dmenin commented on issue #3975:
URL: https://github.com/apache/hudi/issues/3975#issuecomment-967301078


   Hi xushiyan
   Thanks for your reply. 
   The numbers using GLOBAL_SIMPLE were:
   Around 6 minutes to insert the incoming dataset when I had only one month of 
data (30 partitions). When I backfilled 2021 (adding 10 more months of data, 
300 partitions), the same load job jumped to 26 minutes. 
   
   Regarding your "replicating the logic" comment, yes, you are absolutely right 
- that is the behaviour I need, but global indices don't perform well. If there 
were an option saying "use global indices only on partitions A, B and C", that 
would be perfect. 
   
   Regarding HBASE indexing, I am aware of that option but for reasons I can't 
discuss, I can't pursue it.
   
   Follow-up questions: why do you think this is an index lookup problem? And 
why does it happen only on the delete and not on the upsert operation? 
   
   Thanks 
   Diego
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

