vinothchandar commented on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628910318


   @garyli1019 thinking about it, even today without the bloom filters, the 
parquet file size includes additional stats and metadata stored internally. So 
it's never going to be the pure record size, right? 
   
   > commit1 only wrote 1 record but the parquet file is 20MB

   This feels like a misconfigured bloom filter. A 20MB bloom filter is just too 
much. Like I mentioned, the dynamic bloom filter approach caps the footer size, 
so hopefully something like this does not happen. Also, a future write will 
stop treating that 20MB file as a small file only if you explicitly lowered the 
small file limit from the default (100MB) to less than 20MB, right? Overall, I 
am saying this configuration seems to be asking for trouble :P 
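
   To make the two points above concrete, here is a rough sketch (not Hudi code). 
It uses the textbook sizing formula for a plain bloom filter, m = -n ln(p) / (ln 2)^2 
bits, and assumes a file counts as "small" when its size is strictly below the 
limit. The function names, the entry counts and the false positive rate are 
illustrative assumptions, not values taken from this thread, and the estimate 
ignores any serialization overhead in the parquet footer. 

```python
import math

MB = 1024 * 1024


def bloom_filter_bytes(num_entries: int, fpp: float) -> int:
    """Approximate size of an optimally sized bloom filter, in bytes.

    Textbook formula: m = -n * ln(p) / (ln 2)^2 bits.
    Hypothetical helper; ignores footer serialization overhead.
    """
    bits = -num_entries * math.log(fpp) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def is_small_file(file_size_bytes: int, small_file_limit_bytes: int = 100 * MB) -> bool:
    """Assumed rule: a file is 'small' if it is strictly below the limit."""
    return file_size_bytes < small_file_limit_bytes


if __name__ == "__main__":
    # Illustrative settings only: a modest vs. a very large entry count.
    for entries in (60_000, 4_000_000):
        size_mb = bloom_filter_bytes(entries, 1e-9) / MB
        print(f"{entries} entries -> about {size_mb:.2f} MB of bloom filter")

    twenty_mb = 20 * MB
    print(is_small_file(twenty_mb))            # judged against the 100MB default
    print(is_small_file(twenty_mb, 10 * MB))   # judged against a lowered limit
```

   Under these assumptions, the filter only reaches tens of MB when it is sized 
for millions of entries, and the 20MB file is skipped as a small file only when 
the limit is set below 20MB. 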
   
   cc @nsivabalan who implemented the dynamic bloom filters.. 

