Chris Bedford created SPARK-28680:
-------------------------------------
Summary: Redundant code, or possible bug, in Partitioner that could
mess up the check against spark.default.parallelism
Key: SPARK-28680
URL: https://issues.apache.org/jira/browse/SPARK-28680
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.4.3
Environment: This is a suggestion to remove (what I think is) some
redundant code.
Looking at this line of code in org.apache.spark.Partitioner:
https://github.com/apache/spark/blob/924d794a6f5abb972fa07bf63adbb4ad544ef246/core/src/main/scala/org/apache/spark/Partitioner.scala#L83
the first operand of the && in the if condition is true when hasMaxPartitioner
is non-empty, which means that a scan of rdds found at least one RDD with a
partitioner whose number of partitions is > 0; hasMaxPartitioner is then the
Option-wrapped RDD whose partitioner has the greatest number of partitions.
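For reference, here is a paraphrased sketch of the condition in question
(adapted from the linked Partitioner.scala; the real code may differ in minor
details):
{code:scala}
// Paraphrased from org.apache.spark.Partitioner.defaultPartitioner (Spark 2.4.x).
// hasMaxPartitioner: Option[RDD[_]] holding the RDD whose partitioner has the
// most partitions, if any RDD has a partitioner with numPartitions > 0.
if (hasMaxPartitioner.nonEmpty && (isEligiblePartitioner(hasMaxPartitioner.get, rdds) ||
    defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions)) {
  hasMaxPartitioner.get.partitioner.get
} else {
  new HashPartitioner(defaultNumPartitions)
}
{code}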
We then pass the RDD inside hasMaxPartitioner to isEligiblePartitioner, where
maxPartitions is set to the largest number of partitions across rdds, and we
then check whether
log10(maxPartitions) - log10(hasMaxPartitioner.getNumPartitions) < 1
It seems to me that the two values passed to log10 will be equal, so
subtracting them will always yield 0, which is always < 1.
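A minimal, self-contained illustration of that arithmetic (n is a hypothetical
partition count, not a value taken from the Spark code):
{code:scala}
import scala.math.log10

// If both log10 calls receive the same partition count n, the check in
// isEligiblePartitioner reduces to log10(n) - log10(n) < 1, i.e. 0 < 1,
// which holds for every n.
val n = 200  // hypothetical partition count
assert(log10(n) - log10(n) < 1)  // always 0.0 < 1.0
{code}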
So... isn't this whole block of code redundant?
It might even be a bug, because the right-hand side of the && (the ||
expression) would then always be true via its first operand, so we never
evaluate the second operand, i.e. we never check that
defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions
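A minimal sketch of the short-circuiting described above (the names are
hypothetical stand-ins, not the actual Spark identifiers):
{code:scala}
object ShortCircuitDemo extends App {
  // If the first operand of || is always true, the second is never evaluated.
  def alwaysEligible: Boolean = true  // stands in for isEligiblePartitioner(...)
  def parallelismCheck: Boolean = {
    println("comparing against defaultNumPartitions")  // never printed
    false
  }
  val useExisting = alwaysEligible || parallelismCheck  // true; right side skipped
  println(useExisting)  // prints: true
}
{code}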
Reporter: Chris Bedford