[ https://issues.apache.org/jira/browse/SPARK-55795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-55795:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add automatic V1 to V2 offset log upgrade for streaming queries with named 
> sources
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-55795
>                 URL: https://issues.apache.org/jira/browse/SPARK-55795
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Eric Marnadi
>            Priority: Major
>              Labels: pull-request-available
>
> Introduce an automatic offset log upgrade mechanism that allows streaming 
> queries to migrate from V1 (positional) offset tracking to V2 (named) offset 
> tracking when users add {{.name()}} to their streaming sources.
> Currently, when users want to migrate from V1 (index-based) to V2 
> (name-based) offset tracking, they must:
>  # Delete their checkpoint directory (losing all state)
>  # Start fresh
> This is problematic because:
>  * {*}State loss{*}: All stateful operators (aggregations, joins, 
> deduplication) lose their state
>  * {*}Data reprocessing{*}: The query must reprocess all historical data from
> the beginning
>  * {*}Downtime{*}: Requires stopping the query and careful coordination
> With this change, users can safely migrate existing V1 offset logs to the V2
> format by:
>  # Adding {{.name()}} to all streaming sources
>  # Setting {{spark.sql.streaming.offsetLog.formatVersion=2}}
>  # Setting {{spark.sql.streaming.offsetLog.v1ToV2.autoUpgrade.enabled=true}}
>  # Restarting the query
> The upgrade preserves all state and offset positions, enabling a seamless
> transition to the more flexible V2 format, which supports source evolution
> (adding or removing sources by name).
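The following is a minimal Python sketch of what the upgrade conceptually does; it is not Spark's implementation. It assumes a V1 offset entry is a list of serialized offsets ordered by source position and a V2 entry is a mapping keyed by source name. Only the two configuration keys come from this issue; the helpers `should_upgrade` and `upgrade_v1_to_v2` and the sample offset strings are hypothetical.

```python
from typing import Dict, List, Mapping, Optional

# Config keys quoted from this issue; everything else here is hypothetical.
FORMAT_VERSION_KEY = "spark.sql.streaming.offsetLog.formatVersion"
AUTO_UPGRADE_KEY = "spark.sql.streaming.offsetLog.v1ToV2.autoUpgrade.enabled"


def should_upgrade(conf: Mapping[str, str], source_names: List[Optional[str]]) -> bool:
    """Hypothetical check mirroring the preconditions listed in the issue."""
    # Step 1: every source must have a name (None or "" means .name() was not set).
    all_named = bool(source_names) and all(source_names)
    # Steps 2 and 3: V2 format requested and auto-upgrade explicitly enabled.
    wants_v2 = conf.get(FORMAT_VERSION_KEY) == "2"
    upgrade_on = conf.get(AUTO_UPGRADE_KEY, "false").strip().lower() == "true"
    return all_named and wants_v2 and upgrade_on


def upgrade_v1_to_v2(v1_offsets: List[str], source_names: List[str]) -> Dict[str, str]:
    """Hypothetical re-keying of a positional (V1) offset entry by source name (V2).

    Assumes source_names lists the sources in the same order the query defined
    them when the V1 log was written, since V1 identifies sources only by index.
    """
    if len(v1_offsets) != len(source_names):
        raise ValueError("source count changed; positions cannot be mapped to names")
    if len(set(source_names)) != len(source_names):
        raise ValueError("source names must be unique")
    # Offsets are carried over unchanged; only the key changes from index to name.
    return dict(zip(source_names, v1_offsets))


if __name__ == "__main__":
    conf = {FORMAT_VERSION_KEY: "2", AUTO_UPGRADE_KEY: "true"}
    names = ["orders", "clicks"]
    v1_entry = ['{"orders-0": 42}', '{"clicks-0": 7}']
    if should_upgrade(conf, names):
        print(upgrade_v1_to_v2(v1_entry, names))
```

The sketch rejects a changed source count or duplicate names because a positional log offers no other way to tell which offset belongs to which source; how Spark itself handles those cases is not specified in this issue.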



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
