LeoHsu0802 commented on issue #933: URL: https://github.com/apache/hudi/issues/933#issuecomment-707660374
> I found the way to do this. For anyone's reference, this can be achieved by:
>
> 1. Use org.apache.hudi.ComplexKeyGenerator as the key generator class instead of SimpleKeyGenerator.
> 2. Provide the fields that you want to partition on as a comma-separated string in PARTITIONPATH_FIELD_OPT_KEY.
>
> Reference:
> https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java#L42

Hi @afeldman1,

I have a question about point 2. I tried to partition by year/month/day in PySpark, but it didn't work. Below are my settings:

    hudi_options = {
        'hoodie.table.name': tableName,
        'hoodie.datasource.write.recordkey.field': 'id',
        'hoodie.datasource.write.partitionpath.field': {"year","month","day"},
        'hoodie.datasource.write.table.name': tableName,
        'hoodie.datasource.write.operation': 'insert',
        'hoodie.datasource.write.precombine.field': 'country',
        'hoodie.upsert.shuffle.parallelism': 2,
        'hoodie.insert.shuffle.parallelism': 2
    }

May I ask why? Thanks.
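For reference, here is a minimal sketch of how the two quoted steps might look in PySpark. Two things differ from the settings above: the partition fields are passed as one comma-separated string rather than a Python set (the set literal is probably the problem, since option values are handed to Spark as strings), and the key generator class is set explicitly. The helper name `build_hudi_options` and the table name `my_table` are hypothetical. The class name is taken from the quoted comment, and its fully qualified name may differ depending on the Hudi version.

```python
def build_hudi_options(table_name, partition_fields):
    """Build a Hudi write-options dict (hypothetical helper, for illustration only).

    partition_fields is a list of column names; it is joined into the single
    comma-separated string that the quoted comment describes.
    """
    return {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': 'id',
        # One comma-separated string, not a set or list.
        'hoodie.datasource.write.partitionpath.field': ','.join(partition_fields),
        # Class name as given in the quoted comment; assumed to match the
        # Hudi version in use (the package may differ between versions).
        'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.ComplexKeyGenerator',
        'hoodie.datasource.write.operation': 'insert',
        'hoodie.datasource.write.precombine.field': 'country',
        'hoodie.upsert.shuffle.parallelism': 2,
        'hoodie.insert.shuffle.parallelism': 2,
    }


hudi_options = build_hudi_options('my_table', ['year', 'month', 'day'])
print(hudi_options['hoodie.datasource.write.partitionpath.field'])  # year,month,day

# Assumed usage with an existing DataFrame `df` and a target path `basePath`
# (neither is defined in this sketch):
# df.write.format('hudi').options(**hudi_options).mode('append').save(basePath)
```

Joining a list keeps the column order explicit, whereas a set has no guaranteed order even when converted to a string.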
