[jira] [Commented] (SPARK-24295) Purge Structured streaming FileStreamSinkLog metadata compact file data.

Parsa Shabani (Jira) Thu, 10 Jun 2021 15:30:34 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361245#comment-17361245
 ]


Parsa Shabani commented on SPARK-24295:
---------------------------------------

Hello all,

Any updates on this? [~kabhwan] are you still planning to merge your PR?

We are having severe issues with the growing size of the .compact files. 

The functionality for the purging and truncating records for older syncs is 
badly needed in my opinion.

 

> Purge Structured streaming FileStreamSinkLog metadata compact file data.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-24295
>                 URL: https://issues.apache.org/jira/browse/SPARK-24295
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Iqbal Singh
>            Priority: Major
>         Attachments: spark_metadatalog_compaction_perfbug_repro.tar.gz
>
>
> FileStreamSinkLog metadata logs are concatenated to a single compact file 
> after defined compact interval.
> For long running jobs, compact file size can grow up to 10's of GB's, Causing 
> slowness  while reading the data from FileStreamSinkLog dir as spark is 
> defaulting to the "__spark__metadata" dir for the read.
> We need a functionality to purge the compact file size.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-24295) Purge Structured streaming FileStreamSinkLog metadata compact file data.

Reply via email to