[
https://issues.apache.org/jira/browse/SPARK-31599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095697#comment-17095697
]
Felix Kizhakkel Jose commented on SPARK-31599:
----------------------------------------------
Thank you [~gsomogyi]. But this is not an S3 issue. The issue is that I
compacted the files in the bucket and deleted the non-compacted files, but did
not update the "_spark_metadata" folder. I can see that the write-ahead log
JSON files in that folder still contain the deleted file names. When I use
Spark SQL to read the data, it first reads the write-ahead logs from
"_spark_metadata" and then tries to read the files listed in them. So I am
wondering: how can we update the "_spark_metadata" content (write-ahead logs)?
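
To illustrate the mismatch described above, here is a minimal sketch that
reports which files named in a single "_spark_metadata" log file are no longer
present. It rests on assumptions not stated in this thread: that each log file
starts with a version line (such as "v1") followed by one JSON object per line,
and that each object carries a "path" field. The function name
missing_sink_entries and the pluggable exists callback are hypothetical; for
s3a:// paths the callback would need to query the object store rather than the
local filesystem. It only diagnoses stale entries and does not rewrite the log.

```python
import json
import os


def missing_sink_entries(log_file, exists=os.path.exists):
    """Return the paths listed in one metadata log file that no longer exist.

    `exists` is a hypothetical hook: pass a function that checks the object
    store when the logged paths are s3a:// URIs instead of local paths.
    """
    missing = []
    with open(log_file, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    # Assumed layout: a version header line, then one JSON object per line.
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        path = entry.get("path")
        if path is not None and not exists(path):
            missing.append(path)
    return missing
```

Running this over every file under "_spark_metadata" would give the set of
entries that point at files removed by the compaction job.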
> Reading from S3 (Structured Streaming Bucket) Fails after Compaction
> --------------------------------------------------------------------
>
> Key: SPARK-31599
> URL: https://issues.apache.org/jira/browse/SPARK-31599
> Project: Spark
> Issue Type: Bug
> Components: SQL, Structured Streaming
> Affects Versions: 2.4.5
> Reporter: Felix Kizhakkel Jose
> Priority: Major
>
> I have an S3 bucket to which Spark Structured Streaming writes data (Parquet
> format) from Kafka. Periodically I run compaction on this bucket (a separate
> Spark job) and, on successful compaction, delete the non-compacted Parquet
> files. After that I get the following error in Spark jobs that read from the
> bucket:
> *Caused by: java.io.FileNotFoundException: No such file or directory:
> s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet*
> How do we run *compaction on a Structured Streaming S3 bucket*? I also need
> to delete the un-compacted files after successful compaction to save space.