[ https://issues.apache.org/jira/browse/SPARK-17604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626803#comment-15626803 ]

Michael Armbrust commented on SPARK-17604:
------------------------------------------

I think it would be good for us to support data retention policies, but since
this is source-specific and changes what data is stored, I'd like to pull this
out of the general long-running ticket and into its own feature.

> Support purging aged file entry for FileStreamSource metadata log
> -----------------------------------------------------------------
>
>                 Key: SPARK-17604
>                 URL: https://issues.apache.org/jira/browse/SPARK-17604
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Saisai Shao
>            Priority: Minor
>
> Currently, with SPARK-15698, the FileStreamSource metadata log is compacted
> periodically (every 10 batches by default), which means each compacted batch
> file contains every file entry processed so far. Over time, the compacted
> batch file grows into a relatively large file.
> With SPARK-17165, {{FileStreamSource}} no longer tracks aged file entries,
> but the log still keeps the full records. This is unnecessary and makes
> recovery quite time-consuming. So this issue proposes also adding the ability
> to purge aged file entries from the {{FileStreamSource}} metadata log.
> This is pending on SPARK-15698.
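
To make the proposal in the description concrete, here is a minimal sketch of
compaction with and without purging. It is not Spark's implementation: the
FileEntry record, the function names, and the max_file_age_ms parameter are
all hypothetical, and "aged" is assumed to mean "older than the newest entry's
timestamp minus a retention window".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical record shape; the real metadata log format may differ.
    path: str
    timestamp_ms: int
    batch_id: int


def compact(previous: List[FileEntry], new: List[FileEntry]) -> List[FileEntry]:
    """Compaction without purging: the compacted file keeps every entry."""
    return previous + new


def compact_with_purge(
    previous: List[FileEntry],
    new: List[FileEntry],
    max_file_age_ms: int,  # assumed retention window, not a confirmed option name
) -> List[FileEntry]:
    """Compaction that drops entries older than the retention window."""
    merged = previous + new
    if not merged:
        return []
    # Age is measured relative to the newest entry seen so far (an assumption).
    latest = max(entry.timestamp_ms for entry in merged)
    threshold = latest - max_file_age_ms
    return [entry for entry in merged if entry.timestamp_ms >= threshold]


old = [FileEntry("/data/a.json", 1_000, 0), FileEntry("/data/b.json", 5_000, 3)]
new = [FileEntry("/data/c.json", 9_000, 9)]

full = compact(old, new)
purged = compact_with_purge(old, new, max_file_age_ms=5_000)

print(len(full), [entry.path for entry in purged])
```

Without purging, the compacted file grows with every entry ever processed,
which is the recovery cost the description points at. With purging, only
entries inside the retention window are carried forward.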



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
