[GitHub] [hudi] vinothchandar commented on pull request #1721: [WIP] [HUDI-1041] Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

GitBox Wed, 24 Jun 2020 08:18:32 -0700


vinothchandar commented on pull request #1721:
URL: https://github.com/apache/hudi/pull/1721#issuecomment-648885633



   > Regarding sampling, what if some of the partitions are skewed? Will that cause more overhead than flushing the file out?
   
   IIRC the partitionRecordKeyPairRDD would have an even distribution of keys, 
since it comes out of the precombine step, which just does a `reduceByKey`. We 
can always support a config to increase the sampling rate, right? It all depends 
on how much the computed parallelism differs between samplingRate=0.1 and 
samplingRate=1.0.
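
To make the samplingRate comparison concrete, here is a rough standalone sketch in plain Python (not Spark, and not Hudi code). It estimates per-partition record counts from a Bernoulli sample, scales them back up, and derives a parallelism from the total. The names `estimate_counts`, `parallelism_for`, `build_skewed_pairs` and the `records_per_task` knob are all hypothetical, and the skewed data set is made up for illustration only.

```python
import math
import random
from collections import Counter


def estimate_counts(partition_key_pairs, sampling_rate, seed=42):
    """Estimate records per partition path from a Bernoulli sample.

    Hypothetical helper: each (partition_path, record_key) pair is kept with
    probability `sampling_rate`, and sampled counts are scaled by 1/rate.
    """
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    rng = random.Random(seed)
    sampled = Counter()
    for partition_path, _record_key in partition_key_pairs:
        # rng.random() is in [0, 1), so a rate of 1.0 keeps every pair.
        if rng.random() < sampling_rate:
            sampled[partition_path] += 1
    return {path: count / sampling_rate for path, count in sampled.items()}


def parallelism_for(estimated_counts, records_per_task):
    """Derive a parallelism from the estimated total (assumed formula)."""
    total = sum(estimated_counts.values())
    return max(1, math.ceil(total / records_per_task))


def build_skewed_pairs():
    """Made-up input: one hot partition plus ten small ones (100,000 pairs)."""
    pairs = [("2020/06/24", f"key-{i}") for i in range(90_000)]
    for day in range(1, 11):
        pairs += [(f"2020/06/{day:02d}", f"key-{day}-{i}") for i in range(1_000)]
    return pairs


if __name__ == "__main__":
    pairs = build_skewed_pairs()
    for rate in (0.1, 1.0):
        counts = estimate_counts(pairs, rate)
        print(rate, parallelism_for(counts, records_per_task=10_000))
```

In a sketch like this the total estimate stays close to the exact count even with one hot partition, while the small partitions carry more relative error. Whether that holds on a real workload is exactly what comparing the two sampling rates would show.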

