garyli1019 commented on pull request #1602:
URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-628817124


   @vinothchandar I definitely agree that a statistical table would be a better 
approach, but I believe it will take a while to build. I am happy to contribute 
to that work as well. 
   Are there any other recommendations for a short-term fix for this issue? I 
believe this bug could happen again. When something upstream goes wrong, for 
example when Kafka or HDFS goes down in production for a short period of time, 
Hudi may write an abnormally small commit.
   Regarding the bloom filter size, I think all of the variants (simple, 
dynamic, local, and global) use the number of bloom filter entries and the FP 
rate to calculate the size. Once we switch to the parquet native approach, we 
can change how the size is estimated. I think the calculation could be 
accurate. The HBase index is not covered in this PR, though.
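
   To illustrate the point about entries and FP rate, here is a minimal sketch 
of the textbook bloom filter sizing formula. It is not taken from the Hudi 
code base; the function names `bloom_filter_bits` and `bloom_filter_hashes` 
are hypothetical, and the actual implementation may round or cap the values 
differently.

```python
import math


def bloom_filter_bits(num_entries: int, fp_rate: float) -> int:
    """Bits needed for num_entries keys at the target false-positive rate.

    Textbook formula: m = -n * ln(p) / (ln 2)^2, rounded up.
    """
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be strictly between 0 and 1")
    return math.ceil(-num_entries * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_filter_hashes(num_entries: int, num_bits: int) -> int:
    """Number of hash functions that minimizes the FP rate: k = (m / n) * ln 2."""
    return max(1, round(num_bits / num_entries * math.log(2)))


if __name__ == "__main__":
    # Example values only; they are not Hudi defaults.
    n, p = 1000, 0.01
    bits = bloom_filter_bits(n, p)
    print(f"{n} entries at FP rate {p}: {bits} bits, "
          f"{bloom_filter_hashes(n, bits)} hash functions")
```

   The size grows linearly with the configured number of entries, so an 
estimate derived from an abnormally small commit would undersize the filter for 
the files written afterwards.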


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

