[ https://issues.apache.org/jira/browse/SPARK-31599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Kizhakkel Jose updated SPARK-31599:
-----------------------------------------
Description:
I have an S3 bucket to which data is streamed (in Parquet format) by the Spark Structured Streaming framework from Kafka. Periodically I run compaction on this bucket (a separate Spark job) and, on successful compaction, delete the non-compacted (Parquet) files. After that, Spark jobs which read from that bucket fail with:

*Caused by: java.io.FileNotFoundException: No such file or directory: s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet*

How do we run compaction on Structured Streaming S3 buckets? I also need to delete the un-compacted files after successful compaction to save space.

> Reading from S3 (Structured Streaming Bucket) Fails after Compaction
> --------------------------------------------------------------------
>
>                 Key: SPARK-31599
>                 URL: https://issues.apache.org/jira/browse/SPARK-31599
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL, Structured Streaming
>    Affects Versions: 2.4.5
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
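The failure happens because compacting and deleting files in place pulls paths out from under readers that have already listed them. A common workaround is to compact into a *new* prefix and atomically re-point readers (via a manifest or table pointer) before deleting the old files. The sketch below illustrates that pattern on the local filesystem with the standard library only; the `manifest.json` file, `compact`, and `read_all` names are illustrative assumptions, not Spark or S3 APIs.

```python
# Minimal sketch (assumptions: a manifest file names the live prefix;
# readers resolve the manifest on every read instead of caching paths).
import json
import shutil
from pathlib import Path

def compact(base: Path) -> Path:
    """Merge all part files from the prefix named in the manifest into one
    file under a brand-new prefix, then atomically swap the manifest.
    Old files are deleted only after nothing references them anymore."""
    manifest = base / "manifest.json"
    current = base / json.loads(manifest.read_text())["prefix"]
    compacted = base / (current.name + ".compacted")
    compacted.mkdir()
    with open(compacted / "part-00000", "wb") as out:
        for part in sorted(current.glob("part-*")):
            out.write(part.read_bytes())
    # Publish the new prefix via an atomic rename of the manifest.
    tmp = base / "manifest.json.tmp"
    tmp.write_text(json.dumps({"prefix": compacted.name}))
    tmp.replace(manifest)
    shutil.rmtree(current)  # safe now: the manifest no longer points here
    return compacted

def read_all(base: Path) -> bytes:
    """Reader resolves the manifest first, so it never sees deleted paths."""
    prefix = base / json.loads((base / "manifest.json").read_text())["prefix"]
    return b"".join(p.read_bytes() for p in sorted(prefix.glob("part-*")))
```

In Spark terms this corresponds to writing the compacted output to a fresh directory and switching downstream jobs (or a view/table definition) to it before deleting the un-compacted files, rather than rewriting the streaming sink's directory in place.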