[GitHub] [carbondata] Zhangshunyu commented on issue #3637: [CARBONDATA-3721][CARBONDATA-3590] Optimize Bucket Table

GitBox Sat, 29 Feb 2020 18:31:42 -0800

Zhangshunyu commented on issue #3637: [CARBONDATA-3721][CARBONDATA-3590] 
Optimize Bucket Table
URL: https://github.com/apache/carbondata/pull/3637#issuecomment-593037009
 
 
   > @Zhangshunyu The other way is to let Spark do the bucketing, the same way 
the partitioner is implemented. In fact, we can add the bucketing directly into 
the partition flow. Not many changes are needed in that case.
   
   @ravipesala Is Guava's murmur hash the same as the one Spark uses?
   
   > @Zhangshunyu It was a supported feature earlier, but unfortunately the 
code got removed some time back. Anyway, Spark changed the hashing technique it 
uses when creating buckets, so we cannot rely on our own hashing anymore.
   > I see a lot of code was copied from Spark just to get the hashing. That is 
not recommended, because if they change it in the future it will break again. 
They also follow the industry-standard murmur hash. So please use the Guava 
library to do the murmur hashing. Please don't copy code from Spark 
unnecessarily.
   
   Spark's murmur hash is based on Guava's, but it is not identical to Guava's 
implementation. As for future changes in Spark, if we want to keep the same hash 
codes as Spark, maybe we can depend on the spark-unsafe jar directly, selected 
by Spark version, just as Carbon already depends on different Spark versions.
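
   To illustrate where the two hashes can diverge, here is a minimal Python sketch, not the actual Spark or Guava code. It assumes that the difference lies in how trailing bytes are handled when the input length is not a multiple of 4: standard Murmur3 (x86, 32-bit) packs them into one partial block, while my understanding of Spark's legacy `Murmur3_x86_32.hashUnsafeBytes` is that it mixes each trailing byte as a separate sign-extended int. The function names, the seed default of 42, and the `bucket_id` helper (a positive modulo over the signed hash) are illustrative assumptions.

```python
MASK = 0xFFFFFFFF
C1 = 0xCC9E2D51
C2 = 0x1B873593


def _rotl(x, r):
    # 32-bit rotate left
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    # Final avalanche step, shared by both variants
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _hash_aligned(data, seed):
    # Mix all complete 4-byte little-endian blocks; both variants agree here
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(k1))
    return h1, aligned


def murmur3_standard(data, seed=0):
    # Standard tail: remaining 1-3 bytes form one partial block, XORed into h1
    h1, aligned = _hash_aligned(data, seed)
    tail = data[aligned:]
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail, "little"))
    return _fmix(h1, len(data))


def murmur3_spark_legacy(data, seed=42):
    # Assumed Spark legacy tail: each remaining byte is sign-extended to an
    # int and mixed as if it were a full block
    h1, aligned = _hash_aligned(data, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _fmix(h1, len(data))


def bucket_id(hash_u32, num_buckets):
    # Hypothetical helper: interpret the hash as a signed 32-bit int and take
    # a non-negative modulo, as a pmod-style bucket assignment would
    signed = hash_u32 - (1 << 32) if hash_u32 >= (1 << 31) else hash_u32
    return signed % num_buckets


if __name__ == "__main__":
    for value in (b"abcd", b"abcde"):
        std = murmur3_standard(value, 42)
        legacy = murmur3_spark_legacy(value, 42)
        print(value, hex(std), hex(legacy), bucket_id(legacy, 8))
```

   Under these assumptions the two variants agree for inputs whose length is a multiple of 4 (and therefore for int and long keys) but can disagree for strings of other lengths, so a different bucket may be chosen for the same value. That is the reason for wanting to reuse Spark's own class instead of Guava's.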


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
