Iqbal Singh created SPARK-24295:
-----------------------------------

Summary: Purge Structured streaming FileStreamSinkLog metadata compact file data.
Key: SPARK-24295
URL: https://issues.apache.org/jira/browse/SPARK-24295
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.3.0
Reporter: Iqbal Singh
FileStreamSinkLog concatenates its metadata log files into a single compact file after every compact interval, and each new compact file carries forward all entries from the previous one. For long-running jobs the compact file can therefore grow to tens of GBs. This slows down reading data from the sink's output directory, because Spark by default reads the file listing from the "_spark_metadata" directory. We need functionality to purge old entries from the compact file so that its size stays bounded.
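The growth described above can be illustrated with a simplified model. This is a sketch, not Spark's code. It assumes that every batch writes the same number of entries, that no entries are ever removed, and that compaction happens on batches where `(batchId + 1) % compactInterval == 0` (modelled on Spark's CompactibleFileStreamLog, whose default interval is assumed here to be 10). The function names are hypothetical.

```python
def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule: batch ids 9, 19, 29, ... are compaction batches for interval 10.
    return (batch_id + 1) % compact_interval == 0


def simulate_sink_log(num_batches: int, files_per_batch: int,
                      compact_interval: int = 10) -> dict:
    """Return {batch_id: number of entries in that batch's metadata file}.

    Hypothetical model: a delta file holds only its own batch's entries, while a
    compact file rewrites the previous compact file plus all deltas since then.
    """
    log = {}
    last_compact_entries = 0  # entries in the most recent compact file
    pending = 0               # entries in delta files since that compact file
    for batch_id in range(num_batches):
        if is_compaction_batch(batch_id, compact_interval):
            entries = last_compact_entries + pending + files_per_batch
            last_compact_entries = entries
            pending = 0
        else:
            entries = files_per_batch
            pending += files_per_batch
        log[batch_id] = entries
    return log


if __name__ == "__main__":
    sizes = simulate_sink_log(num_batches=30, files_per_batch=5)
    for batch_id in (9, 19, 29):
        print(batch_id, sizes[batch_id])
```

With 5 files per batch and an interval of 10, the compact files at batches 9, 19 and 29 hold 50, 100 and 150 entries, while every delta file holds 5. Under these assumptions the compact file grows linearly with the total number of batches the query has ever run, which is why a purge mechanism is needed for long-running jobs.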