Sunil Kumar created SPARK-18157:
-----------------------------------
Summary: CLONE - Support purging aged file entry for
FileStreamSource metadata log
Key: SPARK-18157
URL: https://issues.apache.org/jira/browse/SPARK-18157
Project: Spark
Issue Type: Sub-task
Components: SQL, Streaming
Reporter: Sunil Kumar
Priority: Minor
Currently, with SPARK-15698, the FileStreamSource metadata log is compacted
periodically (every 10 batches by default). This means each compacted batch
file contains every file entry processed so far, so over time the compacted
batch file grows into a relatively large file.
With SPARK-17165, {{FileStreamSource}} no longer tracks aged file entries,
but the log still keeps the full records. This is unnecessary and makes
recovery quite time-consuming. So this issue proposes also adding the ability
to purge aged file entries from the {{FileStreamSource}} metadata log.
This depends on SPARK-15698.
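
The proposal can be illustrated with a minimal sketch. This is not Spark's
implementation: the names FileEntry, is_compaction_batch, and
compact_with_purge are hypothetical, and the rule used here (drop entries
older than the newest timestamp minus a maximum file age) is an assumption
about how "aged" might be defined. The compaction interval of 10 comes from
the description above; which batch IDs count as compaction batches is also
an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical metadata log record for one processed file."""
    path: str
    timestamp_ms: int  # assumed: file modification time in milliseconds
    batch_id: int


def is_compaction_batch(batch_id: int, compact_interval: int = 10) -> bool:
    # Assumption: with an interval of 10, batches 9, 19, 29, ... are compacted.
    return (batch_id + 1) % compact_interval == 0


def compact_with_purge(entries: Iterable[FileEntry],
                       max_file_age_ms: int) -> List[FileEntry]:
    """Merge all entries into one compacted list, dropping aged ones.

    An entry is treated as aged when its timestamp is older than the newest
    timestamp seen minus max_file_age_ms (an assumed definition).
    """
    all_entries = list(entries)
    if not all_entries:
        return []
    latest = max(e.timestamp_ms for e in all_entries)
    threshold = latest - max_file_age_ms
    return [e for e in all_entries if e.timestamp_ms >= threshold]


if __name__ == "__main__":
    log = [
        FileEntry("/data/a.json", 1_000, 0),
        FileEntry("/data/b.json", 5_000, 1),
        FileEntry("/data/c.json", 9_000, 2),
    ]
    # Without purging, the compacted file would hold all three entries.
    for entry in compact_with_purge(log, max_file_age_ms=5_000):
        print(entry.path)
```

With purging applied at each compaction, the compacted batch file is bounded
by the number of files inside the age window rather than by every file ever
processed, which is what would shorten recovery.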