asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658837587


   @bvaradar Thanks for the quick response, Balaji. To make sure I understand 
correctly, let me quickly walk through an example.
   The data generated for a dataset will be in the range of 1 MB, for each of 
500 datasets. I have set the following properties:
   "hoodie.parquet.small.file.limit": 2*1024*1024,
    "hoodie.parquet.max.file.size": 2*1024*1024*1024,
   So, to understand correctly: on the first write, the 1 MB of data is below 
the 2 MB small file limit, so the first parquet file written will be 1 MB. The 
second write of another 1 MB should merge into that existing parquet file. On 
the third write, the data will again be 1 MB, but the first file has already 
reached the 2 MB limit, so a second parquet file will be created?
   Where will the max file size be used in this process?
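   To check my mental model, here is a minimal sketch of how I understand the 
two settings to interact. This is my assumption of the behavior, not Hudi's 
actual code: files under `hoodie.parquet.small.file.limit` are merge 
candidates, and `hoodie.parquet.max.file.size` caps how large any one file can 
grow.

```python
# Sketch of my assumed small-file handling (not Hudi's real implementation).
SMALL_FILE_LIMIT = 2 * 1024 * 1024        # hoodie.parquet.small.file.limit
MAX_FILE_SIZE = 2 * 1024 * 1024 * 1024    # hoodie.parquet.max.file.size

def assign_write(existing_file_sizes, incoming_bytes):
    """Return the index of the file the incoming data merges into,
    or None if a new file should be created."""
    for i, size in enumerate(existing_file_sizes):
        # Only files still under the small-file limit are merge candidates,
        # and merging must not push the file past the max file size.
        if size < SMALL_FILE_LIMIT and size + incoming_bytes <= MAX_FILE_SIZE:
            return i
    return None

# Walk the three ~1 MB writes from the example above.
files = []
for _ in range(3):
    target = assign_write(files, 1 * 1024 * 1024)
    if target is None:
        files.append(1 * 1024 * 1024)       # new parquet file
    else:
        files[target] += 1 * 1024 * 1024    # merge into existing file
# After three writes: one 2 MB file (writes 1 and 2 merged) and one 1 MB file.
```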
   
   Also, will this happen automatically, or do I need to specify some other 
properties for this to take effect, apart from the two properties I have 
specified?
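   For context, this is roughly how I am passing these options on a PySpark 
datasource write. The table name, record key, precombine field, and paths are 
placeholders, and it assumes the Hudi Spark bundle is on the classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("input_path")  # placeholder input source

hudi_options = {
    "hoodie.table.name": "my_table",                    # placeholder
    "hoodie.datasource.write.recordkey.field": "uuid",  # placeholder key field
    "hoodie.datasource.write.precombine.field": "ts",   # placeholder
    "hoodie.parquet.small.file.limit": str(2 * 1024 * 1024),
    "hoodie.parquet.max.file.size": str(2 * 1024 * 1024 * 1024),
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("base_path"))  # placeholder table base path
```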
   
   Also, if I want to contribute to the development of the clustering feature, 
what would be the process for that?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
