Friendly reminder: I'll submit the proposed change if no objections are raised
this week.

On Wed, Dec 8, 2021 at 4:16 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi dev,
>
> I'd like to hear opinions on deprecating Trigger.Once and replacing it
> with Trigger.AvailableNow [1] in Structured Streaming.
>
> Rationale:
>
> The expected behavior of Trigger.Once is to read all data that has become
> available since the last trigger and process it. This holds when the last
> run terminated gracefully, but streaming queries are not always terminated
> gracefully. The last run may have written the offset (WAL) for a new batch
> before terminating; in that case, the next run of Trigger.Once only
> processes the data planned for that unfinished batch and does not process
> any new data.
>
> From the user's point of view this behavior is non-deterministic: end
> users can't tell whether the last run wrote the offset unless they inspect
> the query's checkpoint themselves.
>
> Although Trigger.AvailableNow was introduced to solve the scalability
> issue of Trigger.Once, it also ensures that all data available at the time
> it is triggered gets processed, so it consistently provides the behavior
> users expect from Trigger.Once.
>
> Proposed Plan:
>
> - Deprecate Trigger.Once in Apache Spark 3.3
> - Add guidance on migrating to Trigger.AvailableNow to the migration guide
> - Replace all usages of Trigger.Once with Trigger.AvailableNow, except in the
> test cases for Trigger.Once itself
>
> Please review the proposal and share your thoughts.
>
> Thanks!
> Jungtaek Lim
>
> 1. https://issues.apache.org/jira/browse/SPARK-36533
>
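To make the failure mode described above concrete, here is a toy Python model. It is not Spark's code: ToyCheckpoint, trigger_once and trigger_available_now are hypothetical names, the checkpoint is reduced to an offset log (WAL) plus a commit log, and the source position is a single integer offset. It also simplifies AvailableNow to a single batch up to the snapshot taken at trigger time, whereas the real trigger can split that work into several batches.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyCheckpoint:
    """Hypothetical stand-in for a streaming query checkpoint.

    offsets[i] is the end offset planned for batch i (the WAL entry);
    commits holds the ids of batches that completed successfully.
    """
    offsets: List[int] = field(default_factory=list)
    commits: Set[int] = field(default_factory=set)

    def committed_end(self) -> int:
        # End offset of the latest committed batch, or 0 if none.
        return max((self.offsets[i] for i in self.commits), default=0)

    def pending_batch(self) -> Optional[int]:
        # The last batch whose offset was logged but never committed.
        last = len(self.offsets) - 1
        if last >= 0 and last not in self.commits:
            return last
        return None


def _run(cp: ToyCheckpoint, batch_id: int, out: List[Tuple[int, int]]) -> None:
    # "Process" the records in (start, end] for this batch, then commit it.
    start = cp.offsets[batch_id - 1] if batch_id > 0 else 0
    out.append((start, cp.offsets[batch_id]))
    cp.commits.add(batch_id)


def trigger_once(cp: ToyCheckpoint, source_end: int) -> List[Tuple[int, int]]:
    """Toy Trigger.Once: run exactly one batch, then stop."""
    out: List[Tuple[int, int]] = []
    pending = cp.pending_batch()
    if pending is not None:
        # Restart path: only the logged-but-uncommitted batch is re-run.
        _run(cp, pending, out)
    elif source_end > cp.committed_end():
        cp.offsets.append(source_end)  # write the WAL entry first
        _run(cp, len(cp.offsets) - 1, out)
    return out


def trigger_available_now(cp: ToyCheckpoint, source_end: int) -> List[Tuple[int, int]]:
    """Toy Trigger.AvailableNow: finish any pending batch, then continue
    until everything available at trigger time has been processed."""
    out: List[Tuple[int, int]] = []
    target = source_end  # snapshot of what is available right now
    pending = cp.pending_batch()
    if pending is not None:
        _run(cp, pending, out)
    if target > cp.committed_end():
        cp.offsets.append(target)
        _run(cp, len(cp.offsets) - 1, out)
    return out


def crashed_checkpoint() -> ToyCheckpoint:
    # Previous run committed batch 0 (up to 100), logged batch 1 (up to 150),
    # then terminated before committing batch 1.
    return ToyCheckpoint(offsets=[100, 150], commits={0})


if __name__ == "__main__":
    # Meanwhile the source has grown to offset 300.
    print(trigger_once(crashed_checkpoint(), 300))           # [(100, 150)]
    print(trigger_available_now(crashed_checkpoint(), 300))  # [(100, 150), (150, 300)]
```

In this model, Trigger.Once after an ungraceful stop only replays (100, 150] and leaves (150, 300] untouched, while AvailableNow replays the pending batch and then catches up to the offset seen at trigger time.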
