Hideyuki Okada created BEAM-3958:
------------------------------------

             Summary: beam-sdks-java-io-amazon-web-services may be global 
pollution.
                 Key: BEAM-3958
                 URL: https://issues.apache.org/jira/browse/BEAM-3958
             Project: Beam
          Issue Type: Bug
          Components: io-java-aws, io-java-gcp, runner-dataflow
    Affects Versions: 2.3.0
         Environment: 
maven-compiler-plugin: 3.6.1
  - source: 1.8
  - target: 1.8
maven-shade-plugin: 3.1.0
exec-maven-plugin: 1.5.0
google-cloud-dataflow-java-sdk-all: 2.3.0
google-cloud-bigquery: 0.26.0-beta
grpc-google-common-protos: 1.0.0

beam-sdks-java-io-amazon-web-services: 2.3.0 and 2.4.0
            Reporter: Hideyuki Okada
            Assignee: Ismaël Mejía
             Fix For: 2.3.0


Note:
 I am sorry if it is difficult to read this report because I am not good at 
English.

Thank you for implementing S3FileSystem.

I implemented a program that performs FileIO against AWS S3 on Dataflow, and it 
works.
However, another Dataflow pipeline, which ran correctly until this SDK was added 
to its dependencies, has stopped working.

Specifically, the following log message is printed after the failing program 
starts executing:
`Info: The AWS S3 Beam extension was included in this build, but the awsRegion 
flag was not specified. If you do not plan to use S3, then ignore this message. 
[Date]`

In practice, the jobs created on Dataflow never finish. They keep running 
without emitting any errors or logs.
If I pass 'awsRegion' as an argument, the job runs successfully, but that is a 
strange workaround.
It means the AWS SDK is demanding connection information from a program that 
does not access S3. Isn't that a form of global pollution?

As far as I have investigated, this log message seems to be emitted from here:
https://github.com/apache/beam/blob/7fa6292a21564744011fe94a7e50f7e074564b71/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L108-L112

Must the region always be passed as an argument, even when S3 is not used?

Please tell me if I am wrong. If this is pollution, I hope the problem will be 
fixed.
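
To make the expectation concrete, here is a minimal sketch of the behavior I 
would hope for: the region is required only when S3 is first accessed, not when 
the file system object is constructed. This is not Beam's actual S3FileSystem 
code. The names LazyRegionSketch, Options, Client and LazyFileSystem are 
hypothetical stand-ins that use only the Java standard library, and I am only 
assuming that the real failure is related to eager initialization.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these types come from Beam or the AWS SDK.
public class LazyRegionSketch {

    /** Hypothetical options holder; awsRegion may be absent when S3 is not used. */
    static final class Options {
        private final String awsRegion;

        Options(String awsRegion) {
            this.awsRegion = awsRegion;
        }

        Optional<String> awsRegion() {
            // Treat both null and the empty string as "not specified".
            return Optional.ofNullable(awsRegion).filter(s -> !s.isEmpty());
        }
    }

    /** Hypothetical client that cannot be built without a region. */
    static final class Client {
        final String region;

        Client(String region) {
            this.region = region;
        }
    }

    /** File system stand-in that defers the region check until first use. */
    static final class LazyFileSystem {
        private final Options options;
        private Client client; // built on first access only

        LazyFileSystem(Options options) {
            // Construction never needs the region, so pipelines that do not
            // touch S3 are unaffected by this class being on the classpath.
            this.options = options;
        }

        synchronized Client client() {
            if (client == null) {
                String region = options.awsRegion().orElseThrow(
                    () -> new IllegalStateException(
                        "awsRegion is required to access S3 paths"));
                client = new Client(region);
            }
            return client;
        }
    }

    public static void main(String[] args) {
        // A pipeline that never uses S3: construction succeeds without a region.
        new LazyFileSystem(new Options(null));
        System.out.println("constructed without region");

        // A pipeline that uses S3 and passes the region explicitly.
        LazyFileSystem used = new LazyFileSystem(new Options("us-east-1"));
        System.out.println("region on first use: " + used.client().region);
    }
}
```

With this shape, a missing region becomes an error only for code paths that 
really access S3, instead of affecting every pipeline that has the dependency.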

The SDK versions I tried:

google-cloud-dataflow-java-sdk-all: 2.3.0
beam-sdks-java-io-amazon-web-services: 2.3.0 and 2.4.0

Thank you for reading.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
