[ https://issues.apache.org/jira/browse/BEAM-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542033#comment-16542033 ]
Samuel Waggoner commented on BEAM-4772:
---------------------------------------
We found a workaround: adding .withHintMatchesManyFiles() causes the
EmptyMatchTreatment setting to be respected. For example:
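{code:java}
// Workaround for BEAM-4772: withHintMatchesManyFiles() makes the read honor EmptyMatchTreatment.
pipeline.apply("First read",
    TextIO.read()
        .from("gs://apache-beam-samples/doesnotexist/*")
        .withHintMatchesManyFiles()
        .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW));
{code}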
> TextIO.read transform does not respect .withEmptyMatchTreatment
> ---------------------------------------------------------------
>
> Key: BEAM-4772
> URL: https://issues.apache.org/jira/browse/BEAM-4772
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.5.0
> Reporter: Samuel Waggoner
> Assignee: Kenneth Knowles
> Priority: Major
>
> I modified the MinimalWordCount example to reproduce the issue. I expect the
> read transform to read 0 lines rather than throw an exception, since I used
> EmptyMatchTreatment.ALLOW. I see the same behavior with ALLOW_IF_WILDCARD.
> The EmptyMatchTreatment value seems to be ignored.
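> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
> public class MinimalWordCount {
>   public static void main(String[] args) {
>     PipelineOptions options = PipelineOptionsFactory.create();
>     Pipeline p = Pipeline.create(options);
>     // Expected to read 0 lines, since empty matches are explicitly allowed.
>     p.apply(TextIO.read()
>             .from("gs://apache-beam-samples/doesnotexist/*")
>             .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
>         .apply(TextIO.write().to("wordcounts"));
>     p.run().waitUntilFinish();
>   }
> }
> {code}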
> {code:java}
> Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>     at org.apache.beam.examples.MinimalWordCount.main(MinimalWordCount.java:124)
> Caused by: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
>     at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
>     at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
>     at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:222)
>     at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
>     at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:91)
>     at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
> {code}
> We see this behavior with both DirectRunner and DataflowRunner.
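To make the expected semantics concrete, here is a minimal plain-Java model of how an empty match result could be checked against each EmptyMatchTreatment value. The enum Treatment, the method adjustEmptyMatch, and the wildcard test (any of `*`, `?`, `{`, `}`) are hypothetical stand-ins chosen for illustration, not Beam's implementation. Under this model, both ALLOW and ALLOW_IF_WILDCARD yield an empty result for the reporter's glob, while DISALLOW throws the same "No files matched spec" message seen in the stack trace.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

public class EmptyMatchModel {
  // Hypothetical stand-in for the three values of Beam's EmptyMatchTreatment enum.
  enum Treatment { ALLOW, DISALLOW, ALLOW_IF_WILDCARD }

  // Assumed set of wildcard characters; Beam's own glob detection may differ.
  static boolean hasWildcard(String spec) {
    return spec.chars().anyMatch(c -> "*?{}".indexOf(c) >= 0);
  }

  // Hypothetical helper: returns the matches unchanged if any were found; otherwise
  // returns an empty list or throws, depending on the requested treatment.
  static List<String> adjustEmptyMatch(String spec, List<String> matches, Treatment treatment)
      throws FileNotFoundException {
    if (!matches.isEmpty()) {
      return matches;
    }
    boolean allowed =
        treatment == Treatment.ALLOW
            || (treatment == Treatment.ALLOW_IF_WILDCARD && hasWildcard(spec));
    if (!allowed) {
      throw new FileNotFoundException("No files matched spec: " + spec);
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    String spec = "gs://apache-beam-samples/doesnotexist/*";
    List<String> noMatches = Collections.emptyList();
    // Try every treatment against a glob that matches nothing.
    for (Treatment t : Treatment.values()) {
      try {
        int n = adjustEmptyMatch(spec, noMatches, t).size();
        System.out.println(t + ": read " + n + " files");
      } catch (FileNotFoundException e) {
        System.out.println(t + ": " + e.getMessage());
      }
    }
  }
}
```

In the trace, the failing check is reached from FileBasedSource.getEstimatedSizeBytes through FileSystems.match, which suggests the treatment configured on TextIO.read() is not forwarded on that path; the model above only shows what forwarding it would be expected to produce.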