WaterKnight1998 commented on issue #1777:
URL: https://github.com/apache/hudi/issues/1777#issuecomment-652597165
> Ah okay, I think these are default values for the configs. You would need to configure each of them based on the table schema. Here is the config section that explains these configs:
> https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY
> https://hudi.apache.org/docs/configurations.html#RECORDKEY_FIELD_OPT_KEY
> https://hudi.apache.org/docs/configurations.html#PARTITIONPATH_FIELD_OPT_KEY
>
> I can help with these configs. You could choose a combination of `date,store,item` for the record key to ensure uniqueness.
> For the precombine key, you need to choose a field that helps determine which is the latest record among two records with the same record key.
> For the partition path, you need to choose how to group your data. Here it could be just the date, or a combination of date, store, and more. This determines how your table data is partitioned. If you are interested in sales on a daily basis, a date-based partition alone may be good.
>
> Please let me know if you have more questions.

I made it work as follows:

```
# `results` is the Spark DataFrame of forecast metrics computed earlier in the job
tableName = "forecast_evals"
basePath = "gs://hudi-datalake/" + tableName

hudi_options = {
    'hoodie.table.name': tableName,
    # Record key: the composite `key` column built below (one per store/item pair)
    'hoodie.datasource.write.recordkey.field': 'key',
    'hoodie.datasource.write.table.name': tableName,
    'hoodie.datasource.write.operation': 'insert',
    # Precombine field: used to pick the latest of records sharing a key
    'hoodie.datasource.write.precombine.field': 'training_date'
}

# Build the composite record key from store and item, keep the metric columns
results = results.selectExpr(
    "CONCAT('Store=', store, ' Item=', item) as key",
    "store", "item", "mae", "mse", "rmse", "training_date")

(results.write.format("hudi")
    .options(**hudi_options)
    .mode("overwrite")
    .save(basePath))
```

However, it runs very slowly!
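The record-key and precombine advice in the quoted reply can be illustrated without Spark. The sketch below is plain Python, not Hudi's implementation: `make_key` and `precombine` are hypothetical helpers, and the rows are illustrative values, not real data. It models the rule the reply describes (among records sharing a record key, keep the one with the latest precombine value); whether Hudi actually combines records on a given write depends on the write operation and related settings.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def make_key(row: Row, key_fields: Sequence[str]) -> str:
    # Hypothetical helper: build a composite key such as "store=1,item=7"
    return ",".join(f"{field}={row[field]}" for field in key_fields)


def precombine(rows: Iterable[Row], key_fields: Sequence[str],
               ordering_field: str) -> List[Row]:
    """Keep one row per key: the row with the largest ordering_field value."""
    latest: Dict[str, Row] = {}
    for row in rows:
        key = make_key(row, key_fields)
        current = latest.get(key)
        # On ties the first row seen is kept (a simplification of this sketch)
        if current is None or row[ordering_field] > current[ordering_field]:
            latest[key] = row
    return list(latest.values())


# Illustrative rows; ISO date strings compare correctly as plain strings
rows = [
    {"store": 1, "item": 7, "training_date": "2020-06-30", "mae": 3.1},
    {"store": 1, "item": 7, "training_date": "2020-07-01", "mae": 2.8},
    {"store": 2, "item": 7, "training_date": "2020-07-01", "mae": 4.0},
]

# Key on store+item only: the two store=1/item=7 rows collapse to the newer one
by_store_item = precombine(rows, ["store", "item"], "training_date")
# Key on date+store+item (as the reply suggests): every row is unique
by_date_store_item = precombine(rows, ["training_date", "store", "item"], "training_date")
print(len(by_store_item), len(by_date_store_item))  # 2 3
```

With `store,item` as the key, the two store 1 / item 7 rows collapse into the newer one; adding the date to the key keeps all three, which is why the choice of record key decides what counts as "the same record".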
