[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593706#comment-15593706 ]

Tyson Condie commented on SPARK-17829:
--------------------------------------

Had a conversation with Michael about how to handle offset serialization. For
deserialization, the following three options seem possible.
1. Ask the source to deserialize the string into an offset (object).
2. Follow a formatting convention, e.g., the first line identifies an Offset
implementation class that has a constructor taking a single string argument,
and the string passed to that constructor comes from the second line.
3. Get rid of the Offset trait entirely and deal only with strings. This seems
reasonable since we do not need to compare two offsets; we only care about the
source's understanding of the offset, which it can interpret from whatever it
embeds in the string (e.g., as in option 2).
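
Option 2 can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Spark's code: `Offset`, `LongOffset`, and `OffsetCodec` are hypothetical names, and the sketch assumes every offset class exposes a public constructor taking its serialized string, which the codec then invokes via reflection.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the Offset trait (not Spark's actual API).
interface Offset {
    // Human-readable form of the offset; this is what gets written to the log.
    String json();
}

// Hypothetical offset for a source whose position is a single long.
final class LongOffset implements Offset {
    final long value;

    // Option 2 assumes a public constructor taking the serialized string.
    public LongOffset(String serialized) {
        this.value = Long.parseLong(serialized.trim());
    }

    @Override
    public String json() {
        return Long.toString(value);
    }
}

// Hypothetical codec implementing the two-line convention of option 2.
final class OffsetCodec {
    private OffsetCodec() {}

    // Line 1: implementation class name. Line 2: the string for its constructor.
    static String serialize(Offset offset) {
        return offset.getClass().getName() + "\n" + offset.json();
    }

    static Offset deserialize(String text) {
        String[] lines = text.split("\n", 2);
        if (lines.length != 2) {
            throw new IllegalArgumentException("expected two lines, got: " + text);
        }
        try {
            // Look up the named class, check that it is an Offset, then call
            // its String constructor via reflection.
            Class<? extends Offset> cls = Class.forName(lines[0]).asSubclass(Offset.class);
            Constructor<? extends Offset> ctor = cls.getConstructor(String.class);
            return ctor.newInstance(lines[1]);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot rebuild offset of class " + lines[0], e);
        }
    }
}
```

One trade-off of this convention: the log records Java class names, so renaming or moving an offset class would make older logs unreadable unless the old name still resolves.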



> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets
> contained in each batch. This has two main issues:
>  - It can break across Spark releases (though this is not the only thing
> preventing us from upgrading a running query).
>  - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and
> use that instead. JSON is probably a good option.
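
A minimal Java sketch of the proposal, under assumptions not taken from the issue: a hypothetical partitioned source whose offset is a map from partition id to position (`PartitionOffsets`), and a hypothetical log layout (`OffsetLogEntry`) consisting of a version line followed by one JSON string per source. The hand-rolled parser only handles the flat shape that `json()` emits; a real implementation would use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical offset for a partitioned source: partition id -> next position.
final class PartitionOffsets {
    final TreeMap<Integer, Long> positions;

    PartitionOffsets(Map<Integer, Long> positions) {
        // Sorted keys make the rendered string deterministic.
        this.positions = new TreeMap<>(positions);
    }

    // Renders e.g. {"0":23,"1":42}; JSON object keys must be strings.
    String json() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<Integer, Long> e : positions.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    // Parses only the flat shape produced by json().
    static PartitionOffsets fromJson(String json) {
        String body = json.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + json);
        }
        body = body.substring(1, body.length() - 1).trim();
        Map<Integer, Long> positions = new TreeMap<>();
        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                String[] kv = pair.split(":", 2);
                String key = kv[0].trim();
                // Strip the surrounding quotes from the key before parsing it.
                positions.put(Integer.parseInt(key.substring(1, key.length() - 1)),
                              Long.parseLong(kv[1].trim()));
            }
        }
        return new PartitionOffsets(positions);
    }
}

// Hypothetical log entry: a version line, then one offset string per source.
final class OffsetLogEntry {
    static final String VERSION = "v1";

    static String write(List<String> sourceOffsetsJson) {
        return VERSION + "\n" + String.join("\n", sourceOffsetsJson);
    }

    static List<String> read(String text) {
        String[] lines = text.split("\n");
        if (!lines[0].equals(VERSION)) {
            throw new IllegalStateException("unsupported log version: " + lines[0]);
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) out.add(lines[i]);
        return out;
    }
}
```

Because each line is plain text, a user can inspect a batch's offsets with any text viewer, and the version line gives a later release a place to detect an older layout.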



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

