Hi Guys,
I’m new in Apache Beam. In my current project we have follow scenario:
- We run transformations via Apache Beam Pipeline into Amazon AWS (using
Cluster by Spark, Hadoop). We can in future produce big Files.
- The generated a AVRO file that should be stored from AWS Account A to
AWS Account B in "S3://some_store". The process is started in Account A
because here is the required data access layer.
- The first experimentation shown:
- A empty file is created (0 bytes -> MyTransformation.avro) in Account
A
- After process is finished, no file appear in Account B
"S3://sone_store”. File is missing.
The process defined in Account A look like:
---
final SimplePipelineOptions options =
PipelineOptionsFactory.fromArgs(args).as(SimplePipelineOptions.class);
final Pipeline pipeline = Pipeline.create(options);
pipeline.apply(
SomeDataIO.read()
.withZookeeperQuorum(options.getZookeeperQuorum())
.withDataVersion(UUID.fromString(options.getVersionId()),
options.getDataVersion())
.withView(DictionaryModelView
.create(MODEL, new ProcessSomeData())));
Join.innerJoin(someData, someData)
.apply(Values.create())
.apply(ParDo.of(new ExtractSomeData()))
.apply(FileIO.<String, KV<String, String>>writeDynamic()
.by(KV::getKey)
.via(fn, AvroIO.sink(AVFeature.class))
.withNaming("MyTransformation.avro")
.to("S3://sone_store")
.withNumShards(1)
.withDestinationCoder(StringUtf8Coder.of()));
pipeline.run().waitUntilFinish();
---
The class ProcessSomeData is responsible to extract some data from our
persistence layer and process it.
In test, running from Account B, all work fine, we can produce the AVRO
File (34 KB) and store the file into Account B S3 store ->
S3://some_store
But running in the cloud starting the process from Account A, then we
lost the file (MyTransformation.avro, 0 bytes). -> File has not been
copied.
AWS S3 configuration from Account B give full access to Account A.
1. Some idea what goes wrong?
2. Maybe FileIO.Write.to(...) is not able to store data between AWS
cross accounts?
3. Should I create my self a java client to store in Account B?
4. Can FileIO copy 0 bytes file?
Any help is appreciate.
Many thanks in advance !
Carlos