[ 
https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606896#comment-15606896
 ] 

Tathagata Das commented on SPARK-17829:
---------------------------------------

Based on [~tcondie]'s PR above, I think it's better that we also change the 
main common log class, HDFSMetadataLog, to use JSON serialization rather than 
Java serialization. 

But this also means that we have to modify FileStreamSourceLog (a subclass of 
HDFSMetadataLog[FileEntry]) to use JSON serialization as well. That is worth 
fixing anyway, since the file stream source log should also have a stable 
on-disk format and not depend on Java serialization.
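
To make the idea concrete, here is a rough sketch of a metadata log whose 
base class leaves the on-disk encoding to overridable serialize/deserialize 
hooks, with a subclass writing each entry as one line of JSON. This is not the 
actual Spark code: TextMetadataLog, Entry, and EntryLog are hypothetical names, 
Entry is a simplified stand-in for a file stream log entry, and the hand-rolled 
encoding only stands in for a real JSON library (it escapes quotes and 
backslashes but not control characters).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical, simplified stand-in for a file stream source log entry.
final case class Entry(path: String, timestamp: Long, batchId: Long)

// Hypothetical base class: subclasses decide how metadata is encoded on disk.
abstract class TextMetadataLog[T] {
  def serialize(metadata: T, out: OutputStream): Unit
  def deserialize(in: InputStream): T
}

// Writes each entry as a single JSON object with a fixed field order.
object EntryLog extends TextMetadataLog[Entry] {
  // Matches exactly the shape produced by serialize below.
  private val Line =
    """\{"path":"((?:[^"\\]|\\.)*)","timestamp":(-?\d+),"batchId":(-?\d+)\}""".r

  // Escape backslashes first, then double quotes.
  private def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Drop the backslash in front of every escaped character.
  private def unescape(s: String): String =
    s.replaceAll("""\\(.)""", "$1")

  override def serialize(metadata: Entry, out: OutputStream): Unit = {
    val json =
      s"""{"path":"${escape(metadata.path)}","timestamp":${metadata.timestamp},"batchId":${metadata.batchId}}"""
    out.write(json.getBytes(UTF_8))
  }

  override def deserialize(in: InputStream): Entry = {
    val text = Source.fromInputStream(in, UTF_8.name()).mkString
    text match {
      case Line(path, ts, id) => Entry(unescape(path), ts.toLong, id.toLong)
      case other => throw new IllegalStateException(s"Unrecognized log entry: $other")
    }
  }
}

object RoundTripDemo {
  def main(args: Array[String]): Unit = {
    val entry = Entry("/data/in/part-0.json", 1477500000000L, 3L)
    val out = new ByteArrayOutputStream()
    EntryLog.serialize(entry, out)
    // The stored bytes are plain text, so the log can be inspected by hand.
    println(new String(out.toByteArray, UTF_8))
    val restored = EntryLog.deserialize(new ByteArrayInputStream(out.toByteArray))
    assert(restored == entry)
  }
}
```

Because the stored form is plain text with named fields, it does not depend 
on JVM class layout or serialVersionUID, which is the property we want from a 
stable on-disk format.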

> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets 
> contained in each batch.  This has two main issues:
>  - It can break across Spark releases (though this is not the only thing 
> preventing us from upgrading a running query)
>  - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and 
> use that instead.  JSON is probably a good option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
