Xavier HAUSHERR created BEAM-10261:
--------------------------------------
Summary: [FileIO] Unexpected exception thrown when retrieving a
GCS file with a space inside path
Key: BEAM-10261
URL: https://issues.apache.org/jira/browse/BEAM-10261
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Affects Versions: 2.21.0, 2.20.0
Environment: Google Cloud Dataflow
Reporter: Xavier HAUSHERR
Hi,
I am using a PTransform class to retrieve Google Cloud Storage files with
FileIO. It worked very well before version 2.20.0.
I upgraded my Beam library last week to 2.20.0 and then 2.21.0, and now I get an
unexpected exception when I retrieve files with a space inside the path:
{code:java}
Error message from worker: java.lang.RuntimeException:
org.apache.beam.sdk.util.UserCodeException: java.io.FileNotFoundException: Item
not found:
'gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<[email protected]
/body.txt'. If you enabled STRICT generation consistency, it is possible that
the live version is still available but the intended generation is deleted.
org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
{code}
Please note that the following gsutil command works:
{code:bash}
gsutil ls
"gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<[email protected]
/body.txt"{code}
Here is my code:
{code:java}
public PCollection<KV<String, byte[]>> expand(PBegin begin) {
  PCollection<KV<String, byte[]>> files = begin
      .apply(FileIO.match()
          .filepattern("gs://[MY_BUCKET]/**/body.txt")
          .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
      .apply(FileIO.readMatches())
      .apply("Extract key",
          ParDo.of(
              new DoFn<ReadableFile, KV<String, byte[]>>() {
                @ProcessElement
                public void processElement(ProcessContext c) throws IOException {
                  ReadableFile f = c.element();
                  // Key each file's bytes by its full resource id (the gs:// path).
                  c.output(KV.of(f.getMetadata().resourceId().toString(),
                      f.readFullyAsBytes()));
                }
              }
          )
      );
  return files;
}
{code}
Maybe I just need to escape the file path, but I don't know how.
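To make clear what I mean by escaping, here is a minimal sketch that uses only java.net.URI from the JDK. The class GcsPathEscaping, its two helper methods, the bucket my-bucket and the object name are hypothetical and only for illustration. I have not confirmed that FileIO accepts the percent-encoded form, so this shows the encoding itself, not a verified workaround.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of Beam.
final class GcsPathEscaping {

    // Builds a percent-encoded gs:// URI from a bucket and an object name.
    // The multi-argument URI constructor quotes characters that are not
    // legal in a URI path, so a space becomes %20.
    static String escape(String bucket, String objectName) {
        try {
            return new URI("gs", bucket, "/" + objectName, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI for " + objectName, e);
        }
    }

    // Returns the decoded path component of an already escaped URI.
    static String unescapedPath(String escapedUri) {
        return URI.create(escapedUri).getPath();
    }

    public static void main(String[] args) {
        // Assumed example values; the real bucket and path are not shown here.
        String escaped = escape("my-bucket", "2017/09/12/some dir/body.txt");
        System.out.println(escaped);                // prints gs://my-bucket/2017/09/12/some%20dir/body.txt
        System.out.println(unescapedPath(escaped)); // prints /2017/09/12/some dir/body.txt
    }
}
```

If the regression comes from the path being percent-encoded somewhere between matching and reading, comparing the two printed forms against the path in the error message may help narrow it down.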
I hope you can help me.
Xavier