[GitHub] [hudi] xushiyan commented on issue #3758: [SUPPORT] Issues when writing dataframe to hudi format with hive syncing enabled for AWS Athena and Glue metadata persistence

GitBox Tue, 19 Oct 2021 11:43:59 -0700


xushiyan commented on issue #3758:
URL: https://github.com/apache/hudi/issues/3758#issuecomment-947007102



   > Thank you for sharing this, i will try these out, how does doing 
`coalesce` in spark to reduce the number of partitions affect the hudi 
partitions based on partition key? I am already repartitioning to 1000 when 
writing to hudi.
   
   @absognety @nsivabalan I think this relates to 
https://hudi.apache.org/docs/configurations#hoodiebulkinsertsortmode
   
   By default, the current Hudi config uses the global sort mode for bulk 
insert, which repartitions the data for you. You could either skip calling 
`repartition()` on the dataframe before saving as Hudi, or set the bulk insert 
sort mode (`hoodie.bulkinsert.sort.mode`) to `NONE`.
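
   A minimal sketch of the second option, assuming a PySpark job that already 
has a dataframe to write. The helper names (`bulk_insert_options`, 
`write_bulk_insert`) and the table, key, and path values are hypothetical and 
only for illustration; the option keys are the Hudi config names as I 
understand them, so please check them against the configurations page linked 
above for your Hudi version.

```python
# Sort modes documented for hoodie.bulkinsert.sort.mode.
VALID_SORT_MODES = {"NONE", "GLOBAL_SORT", "PARTITION_SORT"}


def bulk_insert_options(table_name, record_key, partition_field, sort_mode="NONE"):
    """Build Hudi write options for a bulk insert (hypothetical helper).

    With sort_mode="NONE", Hudi does not sort/repartition the input itself,
    so the dataframe's own partitioning (e.g. repartition(1000)) is kept.
    """
    if sort_mode not in VALID_SORT_MODES:
        raise ValueError(f"unknown bulk insert sort mode: {sort_mode}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.bulkinsert.sort.mode": sort_mode,
    }


def write_bulk_insert(df, base_path, options):
    """Write an existing Spark dataframe as a Hudi table (hypothetical helper).

    `df` is assumed to be a pyspark.sql.DataFrame created elsewhere.
    """
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

   Usage would look roughly like 
`write_bulk_insert(df.repartition(1000), "s3://bucket/path", bulk_insert_options("my_table", "id", "dt"))`, 
where the bucket path and field names are placeholders. Alternatively, leave 
the sort mode at its default and drop the explicit `repartition` call.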
   


