[
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622623#comment-16622623
]
Jungtaek Lim edited comment on SPARK-10816 at 9/20/18 7:58 PM:
---------------------------------------------------------------
I'm aware of the need for more advanced cases (like a dynamic-gap session
window), but for the simple case of a session window we still have a chance to
keep things pretty simple. For the DSL we may want to provide advanced
(complicated) cases, but for SQL, why not support the basic case, which can be
expressed as a SQL statement?
map/flatMapGroupsWithState is reserved for experienced users: end users have
to understand how the typed API works, the limitation of defining the
watermark in column metadata (when a row is serialized to an object the
information is lost; there's a relevant issue in Spark JIRA, but we just
identify it as a limitation and end users are dealing with it), and how to
craft the state function correctly.
Moreover, as I mentioned in the doc, map/flatMapGroupsWithState doesn't handle
multiple sessions for the same key, so it is not enough even for a fixed-gap
session window. Event time and watermark require us to deal with arbitrary
changes of sessions, such as multiple sessions which are not yet targets of
eviction, as well as multiple sessions being merged into one due to a late
event. The current mechanism of map/flatMapGroupsWithState doesn't handle
this, and at the least requires end users to deal with it on their own.
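To make the merging problem concrete, here is a minimal sketch in plain
Python. It is not Spark code and not the design from the attached doc; the
names Session, add_event and evict are hypothetical and exist only for this
illustration. It assumes a fixed gap, integer event times, and a session
represented as the half-open range [start, end) where end is the last event
time plus the gap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Session:
    start: int   # event time of the earliest event in the session
    end: int     # latest event time + gap (exclusive)
    count: int   # number of events folded into the session


def add_event(sessions: List[Session], event_time: int, gap: int) -> List[Session]:
    """Add one event to the sessions of a single key, merging any overlaps."""
    merged = Session(event_time, event_time + gap, 1)
    result = []
    for s in sessions:
        if s.start < merged.end and merged.start < s.end:
            # The event's window touches this session: fold it in.
            merged = Session(min(s.start, merged.start),
                             max(s.end, merged.end),
                             s.count + merged.count)
        else:
            result.append(s)
    result.append(merged)
    result.sort(key=lambda s: s.start)
    return result


def evict(sessions: List[Session], watermark: int) -> Tuple[List[Session], List[Session]]:
    """Split sessions into those closed by the watermark and those still open."""
    closed = [s for s in sessions if s.end <= watermark]
    still_open = [s for s in sessions if s.end > watermark]
    return closed, still_open


# Two separate sessions for one key, then a late event that bridges them.
state: List[Session] = []
state = add_event(state, 0, gap=10)
state = add_event(state, 15, gap=10)
two_sessions = list(state)
state = add_event(state, 8, gap=10)
```

With a gap of 10, the events at 0 and 15 produce two open sessions for the
same key, and the late event at 8 merges them into one. This is the kind of
bookkeeping a state function has to do by hand today.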
was (Author: kabhwan):
I'm aware of the need for more advanced cases (like a dynamic-gap session
window), but for the simple case of a session window we still have a chance to
keep things pretty simple. For the DSL we may want to provide advanced
(complicated) cases, but for SQL, why not support the basic case, which can be
expressed as a SQL statement?
map/flatMapGroupsWithState is reserved for experts: end users have to
understand how the typed API works, the limitation of defining the watermark
in column metadata (when a row is serialized to an object the information is
lost; there's a relevant issue in Spark JIRA, but we just identify it as a
limitation and end users are dealing with it), and how to craft the state
function correctly.
Moreover, as I mentioned in the doc, map/flatMapGroupsWithState doesn't handle
multiple sessions for the same key, so it is not enough even for a fixed-gap
session window. Event time and watermark require us to deal with arbitrary
changes of sessions, such as multiple sessions which are not yet targets of
eviction, as well as multiple sessions being merged into one due to a late
event. The current mechanism of map/flatMapGroupsWithState doesn't handle
this, and at the least requires end users to deal with it on their own.
> EventTime based sessionization
> ------------------------------
>
> Key: SPARK-10816
> URL: https://issues.apache.org/jira/browse/SPARK-10816
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Reporter: Reynold Xin
> Priority: Major
> Attachments: SPARK-10816 Support session window natively.pdf
>
>