[
https://issues.apache.org/jira/browse/SPARK-31376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077723#comment-17077723
]
Jungtaek Lim commented on SPARK-31376:
--------------------------------------
I'll reflect the question; why do you think not allowing global sorting makes
sense in streaming, though you think allowing non-global sorting makes sense in
streaming?
Structured Streaming tries to provide the "same result" as batch query (that's
the one of major concepts) if the same input rows are provided, regardless of
the physical plan and execution (like how micro-batches are planned and
executed). This makes all kinds of "sort" impossible in structured streaming,
and that's not a kind of limitation as it's also impossible to do the sort in
other streaming frameworks as well.
The thing is that sort is not possible in streaming semantics, as data is
unbounded.
> Non-global sort support for structured streaming
> ------------------------------------------------
>
> Key: SPARK-31376
> URL: https://issues.apache.org/jira/browse/SPARK-31376
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.0
> Reporter: Adam Binford
> Priority: Minor
>
> Currently, all sorting is disallowed with structured streaming queries. Not
> allowing global sorting makes sense, but could non-global sorting (i.e.
> sortWithinPartitions) be allowed? I'm running into this with an external
> source I'm using, but not sure if this would be useful to file sources as
> well. I have to foreachBatch so that I can do a sortWithinPartitions.
> Two main questions:
> * Does a local sort cause issues with any exactly-once guarantees streaming
> queries provides? I can't say I know or understand how these semantics work.
> Or are there other issues I can't think of this would cause?
> * Is the change as simple as changing the unsupported operations check to
> only look for global sorts instead of all sorts?
> I have built a version that simply changes the unsupported check to only
> disallow global sorts and it seems to be working. Anything I'm missing or is
> it this simple?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]