JulianJaffePinterest commented on issue #9780: URL: https://github.com/apache/druid/issues/9780#issuecomment-876317064
@520lailai The calling Spark application sets the version of the segments it writes. If a user runs multiple jobs that generate segments for the same data source and time chunk, then once all of the applications have finished, the segments from the application the user assigned the highest version will be the ones that are ultimately available. If you have a specific use case in mind where you will be running concurrent Spark jobs that target the same data sources and intervals, I'd be happy to give you more tailored suggestions.

As for Druid task locks, a Spark application calling this writer is not a Druid task. The application is not triggered by a Druid cluster, and the Druid cluster is unaware of the application. If we instead view Druid task locks as segment locks, I could imagine using the internal API to acquire a lock on write, but it would only be useful in limited circumstances. I don't see value in delaying a Spark job's write: the job will write segments with the specified version regardless, so time-shifting the write doesn't change anything. I can see where delaying a real-time ingestion task may be useful. If there's community demand for integrating with Druid locks, it could be done.

Finally, to your point about security concerns, the writer must provide its own metadata server credentials. The metadata client will only ever attempt to read from and insert into an existing table, so ideally the associated user should have only those permissions. If you're planning to provide credentials independently of users (for example, via environment variables or a credential store running on the Spark cluster nodes themselves), you should not allow anyone to submit Spark applications whom you would not allow to send POST requests to the Overlord.
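
To make the version rule described above concrete, here is a rough Python sketch of how the outcome of two overlapping jobs could be reasoned about. It is a simplified model, not Druid's actual implementation: the `Segment` class, the `written_by` label, and the `visible_segments` helper are all hypothetical names, it only handles segments whose intervals match exactly (real overshadowing also deals with partially overlapping intervals), and it assumes versions are compared as plain strings, which works for the ISO-8601 timestamps commonly used as versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, minimal stand-in for a segment's identifying fields."""
    data_source: str
    interval: str    # e.g. "2021-07-01/2021-07-02"
    version: str     # assumed to be compared as a plain string
    written_by: str  # illustrative label for the Spark job that wrote it


def visible_segments(segments):
    """Pick, per (data source, interval), the segment with the highest version.

    Simplification: only identical intervals are grouped together.
    """
    winners = {}
    for seg in segments:
        key = (seg.data_source, seg.interval)
        current = winners.get(key)
        if current is None or seg.version > current.version:
            winners[key] = seg
    return winners


# Two jobs write the same day; job_b also writes the following day.
segments = [
    Segment("events", "2021-07-01/2021-07-02", "2021-07-08T10:00:00.000Z", "job_a"),
    Segment("events", "2021-07-01/2021-07-02", "2021-07-08T09:00:00.000Z", "job_b"),
    Segment("events", "2021-07-02/2021-07-03", "2021-07-08T09:00:00.000Z", "job_b"),
]

for (data_source, interval), seg in sorted(visible_segments(segments).items()):
    print(data_source, interval, "->", seg.written_by, seg.version)
```

In this model the order in which the jobs finish does not matter: for the shared day, `job_a` wins because it was assigned the higher version, which is why delaying one of the writes would not change the result.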
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
