kirkuz commented on issue #2323:
URL: https://github.com/apache/hudi/issues/2323#issuecomment-745166974


   @n3nash 
   
   1. I don't understand the difference between GLOBAL_BLOOM and 
GLOBAL_SIMPLE. Will the latter handle partition updates for me, that is, 
remove the record from its previous partition and add it to the new one?
   In which use cases should I use GLOBAL_SIMPLE?
   2. Do you have any recommendations for performance tuning, such as the 
number of instances, cores, memory, etc.?
   3. Do you use GLOBAL_BLOOM in your use cases at Uber? I learned on the Slack 
channel that you use the HBase index. Does that mean the HBase index does the 
same thing as GLOBAL_BLOOM? I'm also wondering whether my use case (deleting 
from the old partition and inserting into the new one) is so rare that nobody 
has raised this problem before.
   4. Do you think that switching to Kafka and DeltaStreamer (in continuous 
mode) would solve my issue by giving me fewer rows to upsert each time? Or 
would each DeltaStreamer upsert still have to list all partitions?
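
To make the behavior asked about in question 1 concrete, here is a toy sketch in plain Python. It is not Hudi code and uses no Hudi API: the `Table` type, the `global_upsert` function, and the sample keys and partition paths are all hypothetical, chosen only to illustrate "remove from the previous partition, add to the new one". How GLOBAL_BLOOM or GLOBAL_SIMPLE actually implement this is exactly what the question is asking.

```python
from typing import Dict

# Toy model only: plain dicts stand in for a partitioned table.
# Table maps partition path -> {record key -> row}.
Table = Dict[str, Dict[str, dict]]


def global_upsert(table: Table, key: str, partition: str, row: dict) -> None:
    """Upsert `row` so that `key` ends up in exactly one partition."""
    # A global lookup checks every partition for the key, not just the target.
    for existing_partition, records in table.items():
        if key in records and existing_partition != partition:
            # Drop the stale copy from the previous partition.
            del records[key]
    # Insert or overwrite the record in the target partition.
    table.setdefault(partition, {})[key] = row


# Hypothetical sample data: one record whose partition value changes.
table: Table = {"2020/12/01": {"id-1": {"value": "old"}}}
global_upsert(table, "id-1", "2020/12/15", {"value": "new"})
```

After the call, `id-1` exists only under the new partition. The full scan over partitions in this sketch also hints at why the cost of listing all partitions matters for question 4.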


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

