Thank you for the reply. For our use case, it's okay to not have exactly-once semantics. Given this use case of not needing exactly-once a) Is there any negative implications if one were to use a custom state store provider which asynchronously committed under the hood b) Is there any other option to achieve this without using a custom state store provider ?
Rohit On Mon, Mar 22, 2021 at 4:09 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote: > I see some points making async checkpoint be tricky to add in micro-batch; > one example is "end to end exactly-once", as the commit phase in sink for > the batch N can be run "after" the batch N + 1 has been started and write > for batch N + 1 can happen before committing batch N. state store > checkpoint is tied to task lifecycle instead of checkpoint phase, which is > also tricky to make it be async. > > There may be still some spots to optimize on checkpointing though, one > example is SPARK-34383 [1]. I've figured out it helps to reduce latency on > checkpointing with object stores by 300+ ms per batch. > > Btw, even though S3 is now strongly consistent, it doesn't mean it's HDFS > compatible as default implementation of SS checkpoint requires. Atomic > rename is not supported, as well as rename isn't just a change on metadata > (read from S3 and write to S3 again). Performance would be sub-optimal, and > Spark no longer be able to prevent concurrent streaming queries trying to > update to the same checkpoint which might possibly mess up the checkpoint. > You need to make sure there's only one streaming query running against a > specific checkpoint. > > 1. https://issues.apache.org/jira/browse/SPARK-34383 > > On Tue, Mar 23, 2021 at 1:55 AM Rohit Agrawal <ro...@tecton.ai> wrote: > >> Hi, >> >> I have been experimenting with the Continuous mode and the Micro batch >> mode in Spark Structured Streaming. When enabling checkpoint to S3 instead >> of the local File System we see that Continuous mode has no change in >> latency (expected due to async checkpointing) however the Micro-batch mode >> experiences degradation likely due to sync checkpointing. >> >> Is there any way to get async checkpointing in the micro-batching mode as >> well to improve latency. Could that be done with custom checkpointing logic >> ? Any pointers / experiences in that direction would be helpful. >> >