[ https://issues.apache.org/jira/browse/BEAM-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Waggoner updated BEAM-4772:
----------------------------------
    Description: 
I modified the MinimalWordCount example to reproduce. I expect the read 
transform to read 0 lines rather than give an exception, since I used 
EmptyMatchTreatment.ALLOW. I see the same behavior with ALLOW_IF_WILDCARD. The 
EmptyMatchTreatment value seems to be ignored.
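{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MinimalWordCount {

 public static void main(String[] args) {

   PipelineOptions options = PipelineOptionsFactory.create();

   Pipeline p = Pipeline.create(options);

   // The glob matches no files; with ALLOW the read is expected to yield 0 lines.
   p.apply(TextIO.read()
     .from("gs://apache-beam-samples/doesnotexist/*")
     .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))

    .apply(TextIO.write().to("wordcounts"));

   // Fails here with FileNotFoundException instead of producing empty output.
   p.run().waitUntilFinish();
 }
}
{code}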
{code:java}
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
 at org.apache.beam.examples.MinimalWordCount.main(MinimalWordCount.java:124)
Caused by: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
 at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
 at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
 at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:222)
 at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
 at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:91)
 at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
{code}
We see this behavior with both DirectRunner and DataflowRunner.

  was:
I modified the MinimalWordCount example to reproduce. I expect the read 
transform to read 0 lines rather than give an exception, since I used 
EmptyMatchTreatment.ALLOW. I see the same behavior with ALLOW_IF_WILDCARD. The 
EmptyMatchTreatment value seems to be ignored.
{code:java}
public class MinimalWordCount {

 public static void main(String[] args) {

   PipelineOptions options = PipelineOptionsFactory.create();

   Pipeline p = Pipeline.create(options);

   p.apply(TextIO.read()
     .from("gs://apache-beam-samples/doesnotexist/*")
     .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))

    .apply(TextIO.write().to("wordcounts"));

   p.run().waitUntilFinish();
 }
}
{code}
{code:java}
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexit/*
 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
 at org.apache.beam.examples.MinimalWordCount.main(MinimalWordCount.java:124)
Caused by: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexit/*
 at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
 at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
 at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:222)
 at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
 at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:91)
 at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
{code}

We see this behavior both when using DirectRunner and DataflowRunner 


> TextIO.read transform does not respect .withEmptyMatchTreatment
> ---------------------------------------------------------------
>
>                 Key: BEAM-4772
>                 URL: https://issues.apache.org/jira/browse/BEAM-4772
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 2.4.0, 2.5.0
>            Reporter: Samuel Waggoner
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> I modified the MinimalWordCount example to reproduce. I expect the read 
> transform to read 0 lines rather than give an exception, since I used 
> EmptyMatchTreatment.ALLOW. I see the same behavior with ALLOW_IF_WILDCARD. 
> The EmptyMatchTreatment value seems to be ignored.
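> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
> public class MinimalWordCount {
>  public static void main(String[] args) {
>    PipelineOptions options = PipelineOptionsFactory.create();
>    Pipeline p = Pipeline.create(options);
>    // The glob matches no files; with ALLOW the read is expected to yield 0 lines.
>    p.apply(TextIO.read()
>      .from("gs://apache-beam-samples/doesnotexist/*")
>      .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
>     .apply(TextIO.write().to("wordcounts"));
>    // Fails here with FileNotFoundException instead of producing empty output.
>    p.run().waitUntilFinish();
>  }
> }
> {code}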
> {code:java}
> Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>  at org.apache.beam.examples.MinimalWordCount.main(MinimalWordCount.java:124)
> Caused by: java.io.FileNotFoundException: No files matched spec: gs://apache-beam-samples/doesnotexist/*
>  at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
>  at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
>  at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:222)
>  at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
>  at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:91)
>  at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
> {code}
> We see this behavior with both DirectRunner and DataflowRunner.
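
For reference, the behavior the report expects from the three treatments can be sketched without any Beam dependency. This is only a rough model, not Beam's code: the class EmptyMatchSketch, the enum Treatment, and the methods hasWildcard and adjustEmptyMatch are hypothetical names, and the set of characters treated as glob wildcards is an assumption. Only the exception type and the "No files matched spec" message are taken from the stack trace above.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

public class EmptyMatchSketch {

  // Hypothetical stand-in for Beam's EmptyMatchTreatment enum.
  public enum Treatment { ALLOW, DISALLOW, ALLOW_IF_WILDCARD }

  // Assumption: a spec counts as a wildcard spec if it contains one of these characters.
  public static boolean hasWildcard(String spec) {
    for (char c : new char[] {'*', '?', '[', '{'}) {
      if (spec.indexOf(c) >= 0) {
        return true;
      }
    }
    return false;
  }

  // Returns the matched files, or decides what an empty match means for the given treatment.
  public static List<String> adjustEmptyMatch(
      String spec, List<String> matched, Treatment treatment) throws FileNotFoundException {
    if (!matched.isEmpty()) {
      return matched;
    }
    boolean emptyAllowed =
        treatment == Treatment.ALLOW
            || (treatment == Treatment.ALLOW_IF_WILDCARD && hasWildcard(spec));
    if (!emptyAllowed) {
      throw new FileNotFoundException("No files matched spec: " + spec);
    }
    // An allowed empty match is an empty result, so a read would produce 0 lines.
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    String spec = "gs://apache-beam-samples/doesnotexist/*";
    List<String> noFiles = Collections.emptyList();
    for (Treatment treatment : Treatment.values()) {
      try {
        List<String> result = adjustEmptyMatch(spec, noFiles, treatment);
        System.out.println(treatment + ": " + result.size() + " files");
      } catch (FileNotFoundException e) {
        System.out.println(treatment + ": " + e.getMessage());
      }
    }
  }
}
```

Under this model, the pipeline in the report (a wildcard spec with ALLOW or ALLOW_IF_WILDCARD) would read 0 lines, and only DISALLOW would raise the FileNotFoundException shown in the trace.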



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
