[GitHub] [incubator-hudi] Antauri edited a comment on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

GitBox Wed, 08 Apr 2020 08:44:44 -0700

Antauri edited a comment on issue #1394: [HUDI-656][Performance] Return a dummy 
Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394#issuecomment-611032718
 
 
   This is present in 0.5.2-incubating on EMR 6.x, which we're using. We're 
developing a framework that does S3-to-S3 ingestion using Hudi and the 
Spark SQL writers (not RDDs). We have year=x/month=y/day=z/bin=q partitioning. 
For 3 days with 575 paths each, it takes 3+ minutes between repeated "listing 
leaf files and directories" messages. In total, some 9 minutes for just 3 days.
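
   To put those numbers in perspective, here is a rough back-of-the-envelope 
sketch. It assumes the 575 paths are per day and that the roughly 9 minutes is 
spent entirely on listing; neither is confirmed above, and the variable names 
are only illustrative.

```python
# Back-of-the-envelope numbers taken from the figures quoted above.
# Assumptions (not confirmed in the thread): 575 partition paths per day,
# and the ~9 minutes is spent entirely on file listing.
days = 3
paths_per_day = 575
total_listing_seconds = 9 * 60

# Total number of partition paths that get listed across the 3 days.
total_paths = days * paths_per_day

# Average listing cost per partition path under the assumptions above.
seconds_per_path = total_listing_seconds / total_paths

print(f"partition paths listed: {total_paths}")
print(f"approx. listing cost per path: {seconds_per_path:.2f} s")
```

   Under those assumptions the cost scales with the number of partition 
paths, so a finer "bin" granularity would be expected to make it worse.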
   
   Any idea when 0.6.0 will be released? And does adding "Hive" as the 
metastore help reduce this listing, or does it not matter?
   
   Thank you kindly!


