YuweiXiao commented on PR #5771: URL: https://github.com/apache/hudi/pull/5771#issuecomment-1151844365
> Good point @YuweiXiao. But I am not sure we can do that. After a few commits, if the user wishes to upsert records, we may need the bloom filter index. Maybe we can take it up as a separate feature: making the entire table immutable, sort of. In that case, we can avoid generating/populating the bloom filter as well.

Yeah, we would have to modify the parquet writer to make the bloom filter optional. I could create a JIRA for an optional bloom filter if necessary. In my local testing, bloom filter generation could take up ~20% of the cost of the write path.

Actually, in this PR there is no record key (i.e., it is an empty string). If users want to upsert records, they need to reload the full table or use the SIMPLE index.
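
To illustrate what "optional bloom filter" could look like, here is a minimal, language-agnostic sketch in Python. It is not Hudi's parquet writer or its bloom filter implementation; `SimpleBloomFilter` and `SketchFileWriter` are hypothetical names made up for this example. The only point is that the writer skips per-record hashing and filter metadata when no filter is supplied.

```python
import hashlib
from typing import List, Optional


class SimpleBloomFilter:
    """Toy bloom filter; a stand-in for illustration, not Hudi's implementation."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> List[int]:
        # Derive num_hashes positions by salting the key with the hash index.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class SketchFileWriter:
    """Hypothetical writer: populates the filter only if one is given."""

    def __init__(self, bloom_filter: Optional[SimpleBloomFilter] = None) -> None:
        self.bloom_filter = bloom_filter
        self.records: List[dict] = []

    def write(self, record_key: str, record: dict) -> None:
        self.records.append(record)
        # Skip the per-record hashing entirely when no filter is configured.
        if self.bloom_filter is not None:
            self.bloom_filter.add(record_key)

    def footer_metadata(self) -> dict:
        # Only emit filter metadata when a filter was actually built.
        meta = {"num_records": len(self.records)}
        if self.bloom_filter is not None:
            meta["bloom_filter_bits_set"] = sum(self.bloom_filter.bits)
        return meta
```

With `SketchFileWriter()` (no filter), records with an empty key are written without touching any filter. Passing a `SimpleBloomFilter` restores the current behaviour of populating it on every write.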
