YuweiXiao commented on PR #5771:
URL: https://github.com/apache/hudi/pull/5771#issuecomment-1151844365

   > Good point @YuweiXiao. But I am not sure we can do that. After a few
commits, if the user wishes to upsert records, we may need the bloom filter
index. Maybe we can take it up as a separate feature: make the entire table
immutable, sort of. In that case, we can avoid generating/populating the bloom
filter as well.
   
   Yeah, we would have to modify the parquet writer to make the bloom filter
optional. I could create a JIRA for an optional bloom filter if necessary. In
my local testing, bloom filter generation took up to ~20% of the write cost.
   
   Actually, in this PR there will be no record key (i.e., an empty string). If
users want to upsert records, they need to reload the full table or use the
SIMPLE index.
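   For the SIMPLE index route, a minimal sketch of the write option involved,
assuming the standard Hudi `hoodie.index.type` config key. The class and method
names are hypothetical, and this does not settle whether SIMPLE alone is enough
for a table written without record keys:

```java
import java.util.Properties;

public class SimpleIndexConfig {
    // Hypothetical helper: builds write options that select the SIMPLE index
    // instead of a bloom-filter-based index for subsequent upserts.
    static Properties simpleIndexProps() {
        Properties props = new Properties();
        props.setProperty("hoodie.index.type", "SIMPLE");
        return props;
    }

    public static void main(String[] args) {
        Properties props = simpleIndexProps();
        // Print the chosen index type so the option can be checked by eye.
        System.out.println("hoodie.index.type=" + props.getProperty("hoodie.index.type"));
    }
}
```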

