deepakpanda93 commented on issue #17643: URL: https://github.com/apache/hudi/issues/17643#issuecomment-3681639620
@pravin1406 It looks like the record-level index (RLI) is being bootstrapped during the first write, which makes Hudi fall back to a GLOBAL_SIMPLE index for the initial index lookup. That lookup is expensive because it requires a full shuffle of the dataset. To avoid this, please bootstrap the RLI explicitly beforehand using the `CREATE INDEX` command, so that subsequent writes can use the index directly.

Additionally, please remove the following Spark configuration: `spark.memory.fraction=0.2`
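
For reference, the two suggestions above could be sketched roughly as follows in PySpark. This is only an illustration, not a verified recipe: the table name `hudi_table`, the record key column `uuid`, and the helper function names are hypothetical, and the `CREATE INDEX <name> ON <table> (<column>)` form is assumed from the Hudi SQL DDL docs, so please check it against the Hudi version in use.

```python
"""Sketch: build the RLI before the first write and drop the memory override.

Assumptions (not from the original comment): the table name, the record key
column, and the helper names are hypothetical; the CREATE INDEX statement form
is assumed from the Hudi Spark SQL DDL documentation.
"""


def build_create_index_sql(table, key_column, index_name="record_index"):
    """Return a Spark SQL statement that creates an index on the record key."""
    return f"CREATE INDEX {index_name} ON {table} ({key_column})"


def drop_memory_fraction_override(conf):
    """Return a copy of the Spark conf without the spark.memory.fraction key."""
    return {k: v for k, v in conf.items() if k != "spark.memory.fraction"}


def bootstrap_rli(spark, table, key_column):
    """Run the CREATE INDEX statement on an existing SparkSession.

    `spark` is expected to be a SparkSession that already has the Hudi
    Spark bundle and SQL extensions configured.
    """
    spark.sql(build_create_index_sql(table, key_column))


if __name__ == "__main__":
    # Hypothetical job configuration that still carries the override.
    job_conf = {"spark.memory.fraction": "0.2", "spark.executor.memory": "8g"}
    print(drop_memory_fraction_override(job_conf))
    print(build_create_index_sql("hudi_table", "uuid"))
```

The idea is to run `bootstrap_rli` once as a separate step before the ingestion job, so the first write no longer has to build the index itself.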
