[
https://issues.apache.org/jira/browse/SPARK-28680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuming Wang updated SPARK-28680:
--------------------------------
Description:
This is a suggestion to reduce (what I think is) some code redundancy.
Looking at this line of code in org.apache.spark.Partitioner:
https://github.com/apache/spark/blob/924d794a6f5abb972fa07bf63adbb4ad544ef246/core/src/main/scala/org/apache/spark/Partitioner.scala#L83
The first part of the && in the if condition is true if hasMaxPartitioner is
non-empty, which means that a scan of rdds found at least one RDD with a
partitioner whose number of partitions was > 0, and hasMaxPartitioner is the
Option-wrapped RDD whose partitioner has the greatest number of partitions.
We then pass the RDD inside hasMaxPartitioner to isEligiblePartitioner, which
sets maxPartitions to the largest partition count among all the RDDs in rdds
and then checks whether
log10(maxPartitions) - log10(hasMaxPartitioner.getNumPartitions) < 1
It seems to me that the values inside the two calls to log10 will be equal, so
subtracting them yields 0, which is always < 1.
So... isn't this whole block of code redundant?
It might even be a bug: if the right-hand side of the && condition is always
true, the || short-circuits and we never actually check that
defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions
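To make the arithmetic concrete, here is a minimal Python sketch of the log10 comparison described above (the function name and variable names mirror the Scala code for readability; this is an illustration of the check's math, not the Spark implementation). It shows that when the candidate partitioner's partition count equals maxPartitions, the check is trivially true, and that it only rejects a partitioner at least an order of magnitude smaller than the largest RDD:

```python
import math

def is_eligible_partitioner(candidate_num_partitions, rdd_partition_counts):
    # maxPartitions: the largest partition count among all the RDDs
    # being combined (rdds.map(_.partitions.length).max in the Scala code).
    max_partitions = max(rdd_partition_counts)
    # The candidate is "eligible" when it is within one order of
    # magnitude of the largest RDD's partition count.
    return math.log10(max_partitions) - math.log10(candidate_num_partitions) < 1

# If the candidate's count equals maxPartitions, the difference of the
# two logs is exactly 0, so the check is trivially true:
print(is_eligible_partitioner(100, [100, 10, 4]))   # True

# The check only fails when the candidate has at least 10x fewer
# partitions than the largest RDD (log10(1000) - log10(5) ~ 2.3):
print(is_eligible_partitioner(5, [1000, 10, 4]))    # False
```

So the redundancy claim hinges on whether the two log10 arguments are really always equal: they coincide whenever the RDD carrying the max partitioner is also the RDD with the most partitions, which is the situation described above.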
> redundant code, or possible bug in Partitioner that could mess up check
> against spark.default.parallelism
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-28680
> URL: https://issues.apache.org/jira/browse/SPARK-28680
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Chris Bedford
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]