deepakpanda93 commented on issue #17643:
URL: https://github.com/apache/hudi/issues/17643#issuecomment-3681639620

   @pravin1406 It looks like the record-level index (RLI) is being bootstrapped 
during the first write, which causes Hudi to fall back to a GLOBAL_SIMPLE index 
for the initial index lookup. This is expensive because it requires a full 
shuffle of the dataset.
   
   To avoid this, please bootstrap the RLI explicitly beforehand using the 
`CREATE INDEX` command, so that subsequent writes can leverage the index 
directly.
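
For illustration, here is a minimal PySpark-side sketch of issuing that statement. It assumes the `CREATE INDEX <name> ON <table> (<record key>)` form shown in the Hudi SQL DDL docs for the record index; please check the exact syntax against your Hudi version. The helper name, `my_db.my_table`, and `uuid` are hypothetical placeholders, not taken from the issue.

```python
import re

# Plain or dot-qualified SQL identifiers only, so the helper cannot be used
# to inject arbitrary SQL into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_create_record_index_sql(table, record_key, index_name="record_index"):
    """Build a CREATE INDEX statement for a Hudi table.

    Assumption: the `CREATE INDEX <name> ON <table> (<column>)` form is what
    your Hudi version expects for the record index; verify before relying on it.
    """
    for name in (table, record_key, index_name):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return f"CREATE INDEX {index_name} ON {table} ({record_key})"


# Hypothetical usage, run once before the first upsert so that later writes
# find the index already built:
#   spark.sql(build_create_record_index_sql("my_db.my_table", "uuid"))
```

Running this as a separate step ahead of the ingestion job keeps the one-time index build out of the write path.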
   
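   Additionally, please remove the following Spark configuration. It overrides 
Spark's default of 0.6 and leaves far less memory for execution and storage, 
which likely makes the shuffle even more expensive:
   
   `spark.memory.fraction=0.2`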

