Xavier HAUSHERR created BEAM-10261:
--------------------------------------

             Summary: [FileIO] Unexpected exception thrown when retrieving a GCS file with a space inside path
                 Key: BEAM-10261
                 URL: https://issues.apache.org/jira/browse/BEAM-10261
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
    Affects Versions: 2.21.0, 2.20.0
         Environment: Google Cloud Dataflow
            Reporter: Xavier HAUSHERR


Hi,

I am using a PTransform class that retrieves Google Cloud Storage files with 
FileIO. It worked very well before version 2.20.0.

I upgraded my Beam library last week, to 2.20.0 and 2.21.0, and now I get an 
unexpected exception when I retrieve files with a space inside the path:
{code:java}
Error message from worker: java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: java.io.FileNotFoundException: Item 
not found: 
'gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<[email protected]
 /body.txt'. If you enabled STRICT generation consistency, it is possible that 
the live version is still available but the intended generation is deleted. 
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
{code}
 

Please note that the following gsutil command works:
{code:bash}
gsutil ls 
"gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<[email protected]
 /body.txt"{code}
 

Here is my code:
{code:java}
public PCollection<KV<String, byte[]>> expand(PBegin begin) {
    PCollection<KV<String, byte[]>> files = begin
        .apply(FileIO.match()
            .filepattern("gs://[MY_BUCKET]/**/body.txt")
            .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
        .apply(FileIO.readMatches())
        .apply("Extract key",
            ParDo.of(
                new DoFn<ReadableFile, KV<String, byte[]>>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws IOException {
                        ReadableFile f = c.element();
                        c.output(KV.of(
                            f.getMetadata().resourceId().toString(),
                            f.readFullyAsBytes()));
                    }
                }
            )
        );

    return files;
}
{code}
 

Maybe I just need to find a way to escape the file path, but I don't know how.
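
For illustration only, one possible reading of "escaping" here is 
percent-encoding the space with the JDK's java.net.URI. The sketch below is 
not a confirmed workaround: it is not established that FileIO accepts or 
needs an encoded path. The class name, bucket name and object path are 
hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Beam: shows how a space in an object path
// looks once percent-encoded, and how to get the raw path back.
public class GcsPathEscapeSketch {

    // Builds a gs:// URI string; the multi-argument URI constructor quotes
    // characters that are illegal in a path, such as the space.
    static String encode(String bucket, String objectPath) {
        try {
            return new URI("gs", bucket, objectPath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the decoded path component of an encoded gs:// URI string.
    static String decode(String encoded) {
        return URI.create(encoded).getPath();
    }

    public static void main(String[] args) {
        String encoded = encode("my-bucket", "/2017/09/12/some dir/body.txt");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Comparing the encoded and decoded forms with the path shown in the 
exception may help tell whether the space is altered somewhere between 
matching and reading.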

I hope you can help me.

Xavier

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
