gunjdesai commented on issue #6610: URL: https://github.com/apache/hudi/issues/6610#issuecomment-1247777136
@xushiyan Yes, this is a Spark Structured Streaming job. We run it on K8S spot instances, where the driver is sometimes evicted, so we can't use the multi-writer approach: an evicted driver can interfere with the locks. And yes, the job does scale up when backfill traffic increases. Our original idea was to keep the real-time pipeline running during the backfill, but I don't think our setup allows that.

After some further reading, I'm considering this instead: stop the real-time pipeline, run a **_bulk_insert_** for the table, and then restart the real-time pipeline in **_upsert_** mode. Would you say this is a good approach?
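A minimal sketch of the two phases, assuming PySpark with the Hudi Spark bundle on the classpath. The table name, field names, `base_path` and `checkpoint_dir` are hypothetical placeholders. The only thing that changes between phases is `hoodie.datasource.write.operation`. The record key, precombine and partition path settings stay identical, so the later upserts can locate the bulk-inserted records.

```python
# Hypothetical table settings; substitute the real table's values.
BASE_OPTIONS = {
    "hoodie.table.name": "events",                                # hypothetical
    "hoodie.datasource.write.recordkey.field": "event_id",        # hypothetical
    "hoodie.datasource.write.precombine.field": "updated_at",     # hypothetical
    "hoodie.datasource.write.partitionpath.field": "event_date",  # hypothetical
}


def hudi_options(operation: str) -> dict:
    """Return Hudi writer options for one phase; only the operation differs."""
    if operation not in ("bulk_insert", "upsert"):
        raise ValueError(f"unsupported operation: {operation}")
    opts = dict(BASE_OPTIONS)  # copy so the shared base is never mutated
    opts["hoodie.datasource.write.operation"] = operation
    return opts


# Phase 1: with the streaming job stopped, run a one-off batch backfill:
#   backfill_df.write.format("hudi").options(**hudi_options("bulk_insert")) \
#       .mode("append").save(base_path)
#
# Phase 2: restart the Structured Streaming job in upsert mode:
#   stream_df.writeStream.format("hudi").options(**hudi_options("upsert")) \
#       .option("checkpointLocation", checkpoint_dir).start(base_path)
```

Keeping the shared settings in one place makes it harder for the backfill job and the streaming job to drift apart on key configuration.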
