bvaradar commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-658902047


   @zuyanton : HoodieParquetInputFormat relies on the hadoop-mapreduce 
FileInputFormat listing implementation to list files. The base 
FileInputFormat has a knob for tuning listing parallelism:
   
   "mapreduce.input.fileinputformat.list-status.num-threads"
   
   The above config is set to 1 by default. Can you try increasing it to 
see whether listing gets faster?
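
How the setting is passed depends on the engine running the query. As a rough sketch only: the helpers below are hypothetical (not part of Hudi or Hadoop) and just format the key quoted above for two common cases. It assumes a Hive session accepts the key through a `SET` statement, and that Spark forwards any property prefixed with `spark.hadoop.` into the Hadoop configuration. The value 8 is an arbitrary starting point, not a recommendation.

```python
from typing import List

# Key quoted in the comment above; its default value is 1.
LIST_STATUS_THREADS_KEY = "mapreduce.input.fileinputformat.list-status.num-threads"


def _check(num_threads: int) -> None:
    # A thread count below 1 makes no sense for the listing pool.
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")


def hive_set_statement(num_threads: int) -> str:
    """Hypothetical helper: Hive session statement that sets the listing threads."""
    _check(num_threads)
    return f"SET {LIST_STATUS_THREADS_KEY}={num_threads};"


def spark_conf_args(num_threads: int) -> List[str]:
    """Hypothetical helper: spark-submit arguments using the spark.hadoop.* prefix."""
    _check(num_threads)
    return ["--conf", f"spark.hadoop.{LIST_STATUS_THREADS_KEY}={num_threads}"]


if __name__ == "__main__":
    # 8 is an arbitrary example value; tune it against your own listing times.
    print(hive_set_statement(8))
    print(" ".join(spark_conf_args(8)))
```

Comparing query planning time before and after the change is the simplest way to tell whether listing was the bottleneck.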
   
   @zuyanton : We are also working on RFC-15 
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements
 to holistically eliminate file listing and improve query performance. 
   
   cc @umehrot2  for any other suggestions. 
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

