kirkuz commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-745166974
@n3nash
1. I don't really understand the difference between GLOBAL_BLOOM and GLOBAL_SIMPLE. Will the latter solve my problem with updating the partition (i.e. removing the record from its previous partition and inserting it into the new one)? In which use cases should I use GLOBAL_SIMPLE?
2. Do you have any recommendations for performance tuning, such as the number of instances, cores, memory, etc.?
3. Do you use GLOBAL_BLOOM in your use cases at Uber? I learned on the Slack channel that you use the HBASE index. Does that mean the HBASE index does the same thing as GLOBAL_BLOOM? What I'm wondering is whether my use case (deleting a record from its old partition and inserting it into a new one) is so rare that nobody has raised this problem so far.
4. Do you think that switching to Kafka and DeltaStreamer (in continuous mode) will solve my issue, since I will have fewer rows to upsert each time? Or will each DeltaStreamer upsert again have to list all partitions?
