[jira] [Commented] (SPARK-26297) improve the doc of Distribution/Partitioning

ASF GitHub Bot (JIRA) Tue, 11 Dec 2018 11:33:56 -0800


    [ https://issues.apache.org/jira/browse/SPARK-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717855#comment-16717855 ]


ASF GitHub Bot commented on SPARK-26297:
----------------------------------------

maryannxue commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240760075
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ##########
 @@ -262,6 +260,15 @@ case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
     super.satisfies0(required) || {
       required match {
         case OrderedDistribution(requiredOrdering) =>
+          // If `ordering` is a prefix of `requiredOrdering`:
+          //   - Let's say `ordering` is [a, b] and `requiredOrdering` is [a, b, c]. If a row is
+          //     larger than another row w.r.t. [a, b], it's also larger w.r.t. [a, b, c]. So
+          //     `RangePartitioning(a, b)` satisfies `OrderedDistribution(a, b, c)`.
+          //
+          // If `requiredOrdering` is a prefix of `ordering`:
+          //   - Let's say `ordering` is [a, b, c] and `requiredOrdering` is [a, b]. If a row is
 
 Review comment:
   "If a row is ... satisfies ..." => According to the RangePartitioning 
definition, any [a1, b1, c1] in a previous partition must be smaller than any
[a2, b2, c2] in the following partition, which means that for any such pair of
rows either 1) [a1, b1] is smaller than [a2, b2]; or 2) [a1, b1] is equal to
[a2, b2] and c1 is smaller than c2. In both cases [a1, b1] is no larger than
[a2, b2], so `RangePartitioning(a, b, c)` satisfies `OrderedDistribution(a, b)`,
which requires any [a1, b1] from a previous partition to be smaller than or
equal to any [a2, b2] from a following partition.
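
The prefix argument above can be checked on a toy model. The following is only a sketch: rows are modelled as plain `Vector[Int]` values rather than Spark rows, and `RangePrefixSketch`, `comparePrefix`, `strictlyOrderedOn`, and `orderedOn` are hypothetical names introduced for illustration, not part of Spark. It assumes range partitioning means "strictly ordered across adjacent partitions on all partitioning columns" and that the ordered-distribution requirement allows equal keys across a partition boundary.

```scala
// Illustrative model only: rows are plain Vector[Int], not Spark rows,
// and none of these names exist in Spark.
object RangePrefixSketch {
  type Row = Vector[Int]

  // Lexicographic comparison of two rows on their first `n` columns.
  def comparePrefix(x: Row, y: Row, n: Int): Int =
    x.take(n).zip(y.take(n))
      .collectFirst { case (l, r) if l != r => Integer.compare(l, r) }
      .getOrElse(0)

  // For every pair of adjacent partitions, each row of the earlier partition
  // is compared (on the first `n` columns) with each row of the later one,
  // and the comparison result must pass `ok`.
  private def adjacentPairs(parts: Seq[Seq[Row]], n: Int)(ok: Int => Boolean): Boolean =
    parts.zip(parts.drop(1)).forall { case (prev, next) =>
      prev.forall(p => next.forall(q => ok(comparePrefix(p, q, n))))
    }

  // Assumed model of RangePartitioning on the first `n` columns.
  def strictlyOrderedOn(parts: Seq[Seq[Row]], n: Int): Boolean =
    adjacentPairs(parts, n)(_ < 0)

  // Assumed model of OrderedDistribution on the first `n` columns.
  def orderedOn(parts: Seq[Seq[Row]], n: Int): Boolean =
    adjacentPairs(parts, n)(_ <= 0)

  // Two partitions of [a, b, c] rows; the boundary rows share [a, b] = [1, 2].
  val partitions: Seq[Seq[Row]] = Seq(
    Seq(Vector(1, 1, 1), Vector(1, 2, 1)),
    Seq(Vector(1, 2, 2), Vector(2, 0, 0))
  )

  def main(args: Array[String]): Unit = {
    println(strictlyOrderedOn(partitions, 3))
    println(orderedOn(partitions, 2))
    println(strictlyOrderedOn(partitions, 2))
  }
}
```

In this sample the partitions are strictly ordered on [a, b, c] and ordered on [a, b], but not strictly ordered on [a, b], because [1, 2] appears on both sides of the boundary. That is case 2) of the argument, and it is why the requirement on the prefix has to permit equality.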

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> improve the doc of Distribution/Partitioning
> --------------------------------------------
>
>                 Key: SPARK-26297
>                 URL: https://issues.apache.org/jira/browse/SPARK-26297
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
