[GitHub] [hudi] LeoHsu0802 commented on issue #933: Support for multiple level partitioning in Hudi

GitBox Tue, 13 Oct 2020 03:54:39 -0700


LeoHsu0802 commented on issue #933:
URL: https://github.com/apache/hudi/issues/933#issuecomment-707660374



   > I found a way to do this. For anyone's reference, this can be achieved by:
   > 
   > 1. Use org.apache.hudi.ComplexKeyGenerator as the key generator class instead of SimpleKeyGenerator.
   > 2. Provide the fields that you want to partition on as a comma-separated string in PARTITIONPATH_FIELD_OPT_KEY.
   > 
   > Reference:
   > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java#L42
   
   Hi @afeldman1, I have a question about point 2. I tried to partition by 
year/month/day in PySpark, but it didn't work. Below are my settings:
   
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'id',
     'hoodie.datasource.write.partitionpath.field': {"year","month","day"},
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'insert',
     'hoodie.datasource.write.precombine.field': 'country',
     'hoodie.upsert.shuffle.parallelism': 2, 
     'hoodie.insert.shuffle.parallelism': 2
   }
   
   May I ask why this does not work?
   Thanks
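
A minimal sketch of how points 1 and 2 might translate into PySpark options, assuming the partition fields have to be passed as one comma-separated string rather than a Python set. The helper name `build_hudi_options` and the table name `my_table` are hypothetical. The key generator class name is the one given in the quoted comment; newer Hudi releases appear to place it under `org.apache.hudi.keygen`, so it is left as a parameter.

```python
def build_hudi_options(table_name, record_key, partition_fields, precombine_field,
                       key_generator='org.apache.hudi.ComplexKeyGenerator'):
    """Build Hudi write options for multi-level partitioning (hypothetical helper).

    partition_fields is a list of column names; it is joined into a single
    comma-separated string, as described in point 2 of the quoted comment.
    key_generator defaults to the class named in the quoted comment (assumption:
    adjust the package to match the Hudi version in use).
    """
    return {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': record_key,
        # One comma-separated string, not a set or list
        'hoodie.datasource.write.partitionpath.field': ','.join(partition_fields),
        # Point 1: use the complex key generator instead of the simple one
        'hoodie.datasource.write.keygenerator.class': key_generator,
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.datasource.write.operation': 'insert',
        'hoodie.datasource.write.precombine.field': precombine_field,
        'hoodie.upsert.shuffle.parallelism': 2,
        'hoodie.insert.shuffle.parallelism': 2,
    }


hudi_options = build_hudi_options('my_table', 'id', ['year', 'month', 'day'], 'country')

# Assumed usage, with df and base_path defined elsewhere:
# df.write.format('hudi').options(**hudi_options).mode('append').save(base_path)
```

Passing the fields as a list and joining them in one place keeps the field order explicit, which a Python set would not guarantee.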


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
