[ https://issues.apache.org/jira/browse/BEAM-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Lucas updated BEAM-12857:
---------------------------------
Description:
I have a simple batch job, running on Dataflow, that reads from a GCS bucket,
filters the data, windows it, and writes the matching data back to a different
path in the same bucket.
The job seems to succeed in reading and filtering the data, as well as writing
temporary files to GCS, but appears to fail when trying to rename the temporary
files to their final destination.
The IndexOutOfBoundsException is thrown from
[FileSystems.java:429|https://github.com/apache/beam/blob/v2.32.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L429]
(in 2.32.0), when the code calls {{.get(0)}} on the list returned by a call to
{{MatchResult#metadata()}}.
The javadoc for
[{{MatchResult#metadata()}}|https://github.com/apache/beam/blob/v2.32.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java#L75-L80]
says,
{code:java}
/**
 * {@link Metadata} of matched files. Note that if {@link #status()} is {@link Status#NOT_FOUND},
 * this may either throw a {@link java.io.FileNotFoundException} or return an empty list,
 * depending on the {@link EmptyMatchTreatment} used in the {@link FileSystems#match} call.
 */
{code}
So possibly GCS is not returning any metadata for the (missing) destination
object? That seems unlikely, as I would expect many others to have already
run into this, but I don't see how this could be caused by my user code.
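
A minimal sketch of the failure mode described above, using plain Java collections rather than Beam's classes. It assumes only what the javadoc states: that the metadata list can be empty when nothing matched. The class {{EmptyMatchSketch}} and its methods are hypothetical names for illustration and are not part of Beam; the {{List<String>}} stands in for the list returned by {{MatchResult#metadata()}}.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class EmptyMatchSketch {

    // Hypothetical stand-in for calling .get(0) directly on the metadata list.
    // Throws IndexOutOfBoundsException when the list is empty.
    public static String firstUnchecked(List<String> metadata) {
        return metadata.get(0);
    }

    // Guarded variant: an empty list is treated as "no match" instead of failing.
    public static Optional<String> firstChecked(List<String> metadata) {
        return metadata.isEmpty() ? Optional.empty() : Optional.of(metadata.get(0));
    }

    public static void main(String[] args) {
        // Models a match that found nothing and returned an empty list.
        List<String> noMatch = Collections.emptyList();

        try {
            firstUnchecked(noMatch);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked access failed: " + e);
        }

        System.out.println("checked access found a match: "
                + firstChecked(noMatch).isPresent());
    }
}
```

This only illustrates why an empty list plus an unconditional {{.get(0)}} produces the exception; it does not show why the list is empty in the Dataflow job.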
I have tested this on 2.31.0 and 2.32.0 and get the same error on both. It's
worth noting that the logic in FileSystems.java changed a decent amount recently
in [#15301|https://github.com/apache/beam/pull/15301], which may affect this,
but I haven't been able to test that since I'm working in a closed environment
and can only easily use released versions of Beam. Once a version containing
this change is released, I will upgrade and try again.
was:
I have a simple batch job, running on Dataflow, that reads from a GCS bucket,
filters the data, and windows and writes the matching data back to a different
path in the same bucket.
The job seems to succeed in reading and filtering the data, as well as writing
temporary files to GCS, but appears to fail when trying to rename the temporary
files to their final destination.
The IndexOutOfBoundsException is thrown from
[FileSystems.java:429|https://github.com/apache/beam/blob/v2.32.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L429]
(in 2.32.0), when the code calls {{.get(0)}} on the list returned by a call to
{{MatchResult#metadata()}}.
The javadoc for
[{{MatchResult#metadata()}}|https://github.com/apache/beam/blob/v2.32.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/MatchResult.java#L75-L80]
says,
{code:java}
/**
 * {@link Metadata} of matched files. Note that if {@link #status()} is {@link Status#NOT_FOUND},
 * this may either throw a {@link java.io.FileNotFoundException} or return an empty list,
 * depending on the {@link EmptyMatchTreatment} used in the {@link FileSystems#match} call.
 */
{code}
So possibly GCS is not returning any metadata for the (missing) destination
object? That seems unlikely, as I would expect many others would have already
run into this, but I don't see how this could be caused by my user code.
I have tested this on 2.31.0 and 2.32.0 getting the same error, but it's worth
noting that the logic in FileSystems.java changed a decent amount recently in
[#15301|https://github.com/apache/beam/pull/15301], and maybe have an effect on
this, but I haven't been able to test it since I'm working in a closed
environment and can only easily use released versions of Beam. Once a version
containing this change is released, I will upgrade and try again.
> Unable to write to GCS due to IndexOutOfBoundsException in FileSystems
> ----------------------------------------------------------------------
>
> Key: BEAM-12857
> URL: https://issues.apache.org/jira/browse/BEAM-12857
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.31.0, 2.32.0
> Environment: Beam 2.31.0/2.32.0, Java 11, GCP Dataflow
> Reporter: Patrick Lucas
> Priority: P2