[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351692#comment-15351692
 ] 

Amit Sela commented on BEAM-59:
-------------------------------

I've recently tried to work with IOChannelFactory and while I don't have 
anything smart enough to say towards a solution, I hope that providing my point 
of view will contribute somehow. 
As mentioned here: https://github.com/apache/incubator-beam/pull/539 the 
default validation behaviour prevents me from using TextIO with HDFS unless I 
explicitly call withoutValidation(), because validation only supports File or 
GS. HDFS is extremely important in Apache's Hadoop ecosystem.
When trying to help people run the Spark runner with GS, I found that I needed 
to register GS with IOChannelUtils.
Those problems arise because the Spark runner still doesn't implement the 
primitive Read.from, but even once it does, there is still room for 
higher-level abstractions such as TextIO, JsonIO, ParquetIO, etc., and I think 
they should be as "translatable" as possible, with minimal constraints. Not all 
runner authors can easily influence the runner, and sometimes a runner will 
work better with its own implementation.
I guess what I'm trying to say is that there is a delicate balance between 
having a robust IO abstraction and making runner authors' lives easier ;-)
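
To make the validation and registration points above a bit more concrete, here 
is a minimal Java sketch of a scheme-keyed registry. It is only an illustration 
under stated assumptions: SchemeRegistrySketch, ChannelFactory, register, 
schemeOf and validate are hypothetical names, not the actual IOChannelFactory 
or IOChannelUtils API, and the "hdfs://namenode/..." path is made up. It shows 
why an unregistered scheme fails validation, and how either skipping validation 
or registering a backend lets the same path through.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Beam APIs.
final class SchemeRegistrySketch {
    /** Stand-in for a per-scheme storage backend. */
    interface ChannelFactory {
        boolean exists(String spec);
    }

    // Registry is per instance, so different sources could hold different registries.
    private final Map<String, ChannelFactory> factories = new HashMap<>();

    /** Registers a backend for a URI scheme such as "gs" or "hdfs". */
    void register(String scheme, ChannelFactory factory) {
        factories.put(scheme, factory);
    }

    /** Returns the URI scheme of a path, treating scheme-less paths as local files. */
    static String schemeOf(String spec) {
        String scheme = URI.create(spec).getScheme();
        return scheme == null ? "file" : scheme;
    }

    /** Fails if validation is enabled and no usable backend handles the path. */
    void validate(String spec, boolean validationEnabled) {
        if (!validationEnabled) {
            return; // analogous in spirit to opting out of validation
        }
        String scheme = schemeOf(spec);
        ChannelFactory factory = factories.get(scheme);
        if (factory == null) {
            throw new IllegalStateException("No factory registered for scheme: " + scheme);
        }
        if (!factory.exists(spec)) {
            throw new IllegalStateException("Path does not exist: " + spec);
        }
    }

    public static void main(String[] args) {
        SchemeRegistrySketch registry = new SchemeRegistrySketch();
        registry.register("file", spec -> true); // pretend every local path exists

        String hdfsPath = "hdfs://namenode/data/input.txt";
        try {
            registry.validate(hdfsPath, true);
        } catch (IllegalStateException e) {
            System.out.println("Validation failed: " + e.getMessage());
        }

        registry.validate(hdfsPath, false);      // skipping validation passes
        registry.register("hdfs", spec -> true); // or register a backend
        registry.validate(hdfsPath, true);
        System.out.println("Validation passed after registering hdfs");
    }
}
```

In this sketch the registry is an ordinary object rather than global state, so 
each source or sink could in principle carry its own configuration.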

Hope this helps, and I hope to go into this deeper sometime soon.

> IOChannelFactory rethinking/redesign
> ------------------------------------
>
>                 Key: BEAM-59
>                 URL: https://issues.apache.org/jira/browse/BEAM-59
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core, sdk-java-gcp
>            Reporter: Daniel Halperin
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
