vinothchandar edited a comment on pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628910318
@garyli1019 Thinking about it, even today without the bloom filters, the parquet file size includes additional stats and metadata stored internally. So it's never going to be pure record size, right?

> commit1 only wrote 1 record but the parquet file is 20MB

This feels like a misconfigured bloom filter. A 20MB bloom filter is just too much. Like I mentioned, the dynamic bloom filter approach caps the footer size, so hopefully something like this does not happen.

Also, a future write will overlook the 20MB file as a small file only if you explicitly lowered the default small file limit (100MB) to less than 20MB, right?

Overall, I am saying this configuration seems to be asking for trouble :P

cc @nsivabalan who implemented the dynamic bloom filters.
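To put rough numbers on the two points above, here is a minimal Python sketch. It uses the textbook bloom filter sizing formula, not Hudi's actual implementation, and the serialized footer may be larger because of encoding overhead. The function names are hypothetical, and the defaults used here (60000 entries, 1e-9 false positive rate, strict less-than comparison against a 100MB small file limit) are assumptions for illustration and should be checked against the Hudi version in use.

```python
import math

MB = 1024 * 1024

def bloom_filter_bits(num_entries, fpp):
    """Textbook optimal bit count for a bloom filter: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2))

def entries_for_size(size_bytes, fpp):
    """Invert the formula: how many entries would a filter of this size be sized for?"""
    return int(size_bytes * 8 * (math.log(2) ** 2) / -math.log(fpp))

def is_small_file(file_size_bytes, small_file_limit_bytes=100 * MB):
    """Assumed check: a file is a small-file candidate if it is below the limit."""
    return file_size_bytes < small_file_limit_bytes

if __name__ == "__main__":
    # Assumed defaults: a filter sized for 60000 entries at 1e-9 stays well under 1MB.
    print(bloom_filter_bits(60_000, 1e-9) / 8 / MB, "MB")
    # A 20MB filter at the same false positive rate implies millions of configured entries.
    print(entries_for_size(20 * MB, 1e-9), "entries")
    # With a 100MB limit the 20MB file is still a small-file candidate;
    # it is skipped only if the limit is lowered below 20MB.
    print(is_small_file(20 * MB), is_small_file(20 * MB, 10 * MB))
```

Under these assumptions, a 20MB filter would correspond to a configured entry count in the millions, which is consistent with the misconfiguration suspected above.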
