[ 
https://issues.apache.org/jira/browse/BEAM-7613?focusedWorklogId=274671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-274671
 ]

ASF GitHub Bot logged work on BEAM-7613:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jul/19 06:33
            Start Date: 10/Jul/19 06:33
    Worklog Time Spent: 10m 
      Work Description: davidak09 commented on issue #8923: [BEAM-7613] 
HadoopFileSystem can work with more than one cluster.
URL: https://github.com/apache/beam/pull/8923#issuecomment-509928827
 
 
   I can confirm that this solution works in our scenario. :+1: 
   
   What we do:
   - we have a job which writes to 2 HDFS outputs
   - the first main output is on `hdfs://cluster1`
   - the second output is on `hdfs://cluster2`
   - both clusters are defined in a single Hadoop configuration; 
`hdfs://cluster1` is set as the Hadoop property `fs.defaultFS`
   
   Without this fix we get an exception when we try to write to HDFS on 
`cluster2`:
   ```
   org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/some_path, expected: hdfs://cluster1
           at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
           at org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source)
           at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:213)
           at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:175)
           ...
   ```
   With this fix everything works just fine.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 274671)
    Time Spent: 1.5h  (was: 1h 20m)

> HadoopFileSystem can be only used with fs.defaultFS
> ---------------------------------------------------
>
>                 Key: BEAM-7613
>                 URL: https://issues.apache.org/jira/browse/BEAM-7613
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-hadoop-file-system
>    Affects Versions: 2.13.0
>            Reporter: David Moravek
>            Assignee: David Moravek
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> _HadoopFileSystem_ creates the underlying _FileSystem_ instance (the one from 
> org.apache.hadoop) during its construction. A single _FileSystem_ 
> instance is tied to a particular cluster (scheme + authority pair). If 
> we want to talk to another cluster, this fails due to _FileSystem#checkPath_.
>  
> This can be fixed by using _FileSystem#get(java.net.URI, 
> org.apache.hadoop.conf.Configuration)_ instead of 
> _FileSystem#newInstance(org.apache.hadoop.conf.Configuration)_.
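
The mechanism described above can be sketched with a toy model that uses only `java.net.URI`. This is not Hadoop or Beam code: `BoundFs`, `WrongFsSketch`, and both `writeWith...` methods are hypothetical names made up for illustration, and the guard only compares scheme and authority, whereas the real _FileSystem#checkPath_ handles more cases (relative paths and default ports, for example). The sketch assumes that resolving the client from the target URI is what makes the second cluster reachable, which is how the description above characterizes the fix.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy stand-in for a file system client bound to one scheme + authority pair. */
final class BoundFs {
    private final String scheme;
    private final String authority;

    BoundFs(URI root) {
        this.scheme = root.getScheme();
        this.authority = root.getAuthority();
    }

    /** Simplified guard: reject any path whose scheme or authority differs. */
    void checkPath(URI path) {
        boolean sameScheme = Objects.equals(scheme, path.getScheme());
        boolean sameAuthority = Objects.equals(authority, path.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + scheme + "://" + authority);
        }
    }

    /** Pretends to write; only the path check matters for this sketch. */
    String write(URI path) {
        checkPath(path);
        return "wrote " + path;
    }
}

final class WrongFsSketch {
    // Hypothetical per-cluster cache, keyed by "scheme://authority".
    private static final Map<String, BoundFs> CACHE = new HashMap<>();

    /** Before: a client built from the default FS is used for every path. */
    static String writeWithSingleInstance(URI defaultFs, URI path) {
        return new BoundFs(defaultFs).write(path);
    }

    /** After: the client is looked up from the target path's own URI. */
    static String writeWithPerUriLookup(URI path) {
        String key = path.getScheme() + "://" + path.getAuthority();
        BoundFs fs = CACHE.computeIfAbsent(key, k -> new BoundFs(URI.create(k)));
        return fs.write(path);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://cluster1");
        URI target = URI.create("hdfs://cluster2/some_path");
        try {
            writeWithSingleInstance(defaultFs, target);
        } catch (IllegalArgumentException e) {
            System.out.println("single instance failed: " + e.getMessage());
        }
        System.out.println(writeWithPerUriLookup(target));
    }
}
```

In the real API, the analogous change is to obtain the client with _FileSystem#get(java.net.URI, org.apache.hadoop.conf.Configuration)_ for each target path rather than reusing one instance created from `fs.defaultFS`.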



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
