[GitHub] [hudi] bvaradar commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

GitBox Tue, 14 Jul 2020 07:43:20 -0700


bvaradar commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658221173



   @asheeshgarg : I think what you are looking for is clustering (not 
compaction) of files, which is under development (please see 
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance).
 For your setup, a good strategy would be to have a single Hudi writer read one 
or more of these datasets and ingest them into Hudi. Hudi supports file sizing; 
please see 
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoItoavoidcreatingtonsofsmallfiles
 for more details.
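A rough sketch of what "Hudi supports file sizing" can look like for a single writer follows. It assumes a Spark datasource write and uses the sizing configs `hoodie.parquet.max.file.size`, `hoodie.parquet.small.file.limit` and `hoodie.copyonwrite.record.size.estimate`. The table name, record key and precombine field are hypothetical placeholders, and the sizes are illustrative only, not recommendations. The `plan_inserts` helper is a hypothetical, simplified model of the idea behind small-file handling (new inserts top up existing files that are under the small-file limit, and the remainder goes to new files). It is not Hudi's actual partitioner code.

```python
MB = 1024 * 1024


def hudi_sizing_options(table_name, record_key, precombine_field,
                        max_file_mb=120, small_file_mb=100,
                        record_size_estimate=1024):
    """Build a Hudi write-options dict with file-sizing settings.

    table_name, record_key and precombine_field are caller-supplied
    placeholders; the size values are illustrative, not tuned.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        # Upper bound Hudi aims for when sizing each parquet base file.
        "hoodie.parquet.max.file.size": str(max_file_mb * MB),
        # Files smaller than this are candidates to receive new inserts.
        "hoodie.parquet.small.file.limit": str(small_file_mb * MB),
        # Per-record size guess used when no commit history exists yet.
        "hoodie.copyonwrite.record.size.estimate": str(record_size_estimate),
    }


def plan_inserts(existing_file_sizes, num_inserts, avg_record_bytes,
                 max_file_bytes, small_file_bytes):
    """Hypothetical, simplified model of small-file packing.

    Returns a list of (target, record_count) pairs, where target is an
    index into existing_file_sizes or the string "new".
    """
    plan = []
    remaining = num_inserts
    # First, top up existing small files until they reach the max size.
    for idx, size in enumerate(existing_file_sizes):
        if remaining == 0:
            break
        if size >= small_file_bytes:
            continue  # not a small file, leave it untouched
        capacity = (max_file_bytes - size) // avg_record_bytes
        take = min(capacity, remaining)
        if take > 0:
            plan.append((idx, take))
            remaining -= take
    # Then spill any leftover records into new files of max size.
    per_new_file = max(1, max_file_bytes // avg_record_bytes)
    while remaining > 0:
        take = min(per_new_file, remaining)
        plan.append(("new", take))
        remaining -= take
    return plan


if __name__ == "__main__":
    opts = hudi_sizing_options("my_table", "uuid", "ts")
    print(opts["hoodie.parquet.max.file.size"])
    # One 10-unit small file and one 110-unit file, max 120, small limit 100.
    print(plan_inserts([10, 110], 250, 1, 120, 100))
```

With PySpark, the options dict would typically be passed as `df.write.format("org.apache.hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are your own DataFrame and table path. Funnelling all datasets through one writer matters here because each write only sees the file sizes committed so far. Several independent writers cannot coordinate this packing.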

