[ 
https://issues.apache.org/jira/browse/SPARK-31599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Kizhakkel Jose updated SPARK-31599:
-----------------------------------------
    Description: 
I have an S3 bucket to which Spark Structured Streaming writes data (Parquet 
format) from Kafka. Periodically I run compaction on this bucket (a separate 
Spark job) and, on successful compaction, delete the non-compacted (Parquet) 
files. Afterwards, Spark jobs that read from that bucket fail with the 
following error:
 *Caused by: java.io.FileNotFoundException: No such file or directory: 
s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet*

How do we run *compaction on Structured Streaming S3 buckets*? I also need 
to delete the un-compacted files after successful compaction to save space.
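A common way to avoid readers hitting deleted files is to write the compacted output under a fresh, versioned prefix, repoint readers at that prefix, and only then delete the old files. The sketch below illustrates that pattern on a local filesystem with hypothetical paths; it uses plain byte concatenation as a stand-in for the real rewrite, since Parquet files cannot be concatenated byte-wise (an actual Spark job would do something like `spark.read.parquet(src).repartition(n).write.parquet(newPrefix)`):

```python
import os
import shutil

def compact_to_new_prefix(source_dir, dest_root, version):
    """Compact all part files from source_dir into a single file under a
    fresh, versioned prefix instead of rewriting source_dir in place.

    Readers are repointed to the returned directory before the old
    part files are deleted, so they never see a half-deleted listing.
    """
    dest_dir = os.path.join(dest_root, f"v{version}")
    os.makedirs(dest_dir, exist_ok=True)
    out_path = os.path.join(dest_dir, "part-00000")
    with open(out_path, "wb") as out:
        # Stand-in for the real read-and-rewrite step.
        for name in sorted(os.listdir(source_dir)):
            if name.startswith("part-"):
                with open(os.path.join(source_dir, name), "rb") as part:
                    shutil.copyfileobj(part, out)
    return dest_dir
```

On S3 there is no atomic directory rename, so the "repoint" step is usually a pointer the readers consult (e.g. a table/partition location) rather than a rename; deleting the old prefix only after all readers have switched is what prevents the FileNotFoundException above.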



> Reading from S3 (Structured Streaming Bucket) Fails after Compaction
> --------------------------------------------------------------------
>
>                 Key: SPARK-31599
>                 URL: https://issues.apache.org/jira/browse/SPARK-31599
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL, Structured Streaming
>    Affects Versions: 2.4.5
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> I have an S3 bucket to which Spark Structured Streaming writes data (Parquet 
> format) from Kafka. Periodically I run compaction on this bucket (a separate 
> Spark job) and, on successful compaction, delete the non-compacted (Parquet) 
> files. Afterwards, Spark jobs that read from that bucket fail with the 
> following error:
>  *Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet*
> How do we run *compaction on Structured Streaming S3 buckets*? I also need 
> to delete the un-compacted files after successful compaction to save space.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
