[
https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123047#comment-16123047
]
Guillaume Balaine commented on BEAM-2500:
-----------------------------------------
Thanks, that's fine really. The only trouble was that I had to dig through some
example code to find it out, because no stack traces surface in Beam. It's just that
resolving a ResourceId with such a path from another folder gives you an
incomplete URI, where the base path is dropped, like:
(s3a://mybucket/myfolder/somefilename.fmt).resolve(somefilename-12:30-13:30.fmt)
-> ResourceId{URI{somefilename-12:30-13:30.fmt}}
while
(s3a://mybucket/myfolder/somefilename.fmt).resolve(somefilename-12.30-13.30.fmt)
-> ResourceId{URI{s3a://mybucket/myfolder/somefilename-12.30-13.30.fmt}}
so people need to be aware of their file name policies in Beam.
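For anyone who wants to reproduce this outside a pipeline, here is a minimal sketch using only java.net.URI. It assumes that the Hadoop ResourceId ends up resolving through java.net.URI.resolve, which I have not verified in the Beam source; the class and method names are made up for the illustration. A relative name with a colon before any slash is parsed as an absolute URI whose scheme is the text before the first colon, so resolve() returns it unchanged and the base is lost.

```java
import java.net.URI;

// Hypothetical demo class, not part of Beam.
public class ResolveColonDemo {

    // Resolve a child name against a base location with plain java.net.URI.
    static String resolve(String base, String child) {
        return URI.create(base).resolve(child).toString();
    }

    // The scheme that java.net.URI sees in a bare file name, or null if none.
    static String schemeOf(String name) {
        return URI.create(name).getScheme();
    }

    public static void main(String[] args) {
        String base = "s3a://mybucket/myfolder/somefilename.fmt";
        String[] names = {
            "somefilename-12:30-13:30.fmt", // colon: parsed as "<scheme>:<opaque part>"
            "somefilename-12.30-13.30.fmt"  // no colon: parsed as a relative path
        };
        for (String name : names) {
            System.out.println(name
                + " | scheme=" + schemeOf(name)
                + " | resolved=" + resolve(base, name));
        }
    }
}
```

If that assumption holds, the truncation comes from URI parsing rules rather than from anything S3-specific, so any file name policy that puts a colon in the first path segment would hit it.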
On another note, reads don't work because S3 input streams don't implement
ByteBufferReadable, as you mentioned here:
https://stackoverflow.com/questions/44792884/apache-beam-unable-to-read-text-file-from-s3-using-hadoop-file-system-sdk
So I guess fixing that would be enough to resolve this issue.
> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>
> Key: BEAM-2500
> URL: https://issues.apache.org/jira/browse/BEAM-2500
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Luke Cwik
> Priority: Minor
>
> Note that this is for providing direct integration with S3 as an Apache Beam
> FileSystem.
> There is already support for using the Hadoop S3 connector by depending on
> the Hadoop File System module[1], configuring HadoopFileSystemOptions[2] with
> an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2:
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3