[ https://issues.apache.org/jira/browse/NIFI-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687745#comment-15687745 ]

Bryan Bende commented on NIFI-3068:
-----------------------------------

Using the latest code from master (1.1-SNAPSHOT), I have been unable to 
reproduce a scenario where a PutHDFS processor writes to the wrong cluster.

I did, however, determine that the Hadoop client appears to share some 
security-related state across processor instances. The scenario was the following:

- One PutHDFS processor is writing to a kerberized HDFS cluster
- A second PutHDFS processor is started, writing to a non-secure HDFS 
cluster; it writes successfully
- The first PutHDFS processor now fails with the following error:

{code}
2016-11-21 22:05:43,610 ERROR [Timer-Driven Process Thread-2] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=01581004-7069-19ef-5ec2-87b728465117] Failed to write to HDFS due to org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
2016-11-21 22:05:43,612 ERROR [Timer-Driven Process Thread-2] o.apache.nifi.processors.hadoop.PutHDFS
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_74]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_74]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_74]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_74]
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) ~[hadoop-common-2.7.3.jar:na]
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) ~[hadoop-common-2.7.3.jar:na]
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2110) ~[hadoop-hdfs-2.7.3.jar:na]
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305) ~[hadoop-hdfs-2.7.3.jar:na]
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301) ~[hadoop-hdfs-2.7.3.jar:na]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.jar:na]
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317) ~[hadoop-hdfs-2.7.3.jar:na]
        at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:255) ~[nifi-hdfs-processors-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
{code}

If the failing processor is stopped and started again, it returns to a 
working state, and both processors then run at the same time without errors.
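The pattern above (one processor's security mode bleeding into another's) is characteristic of static, JVM-wide state. As a rough illustration, here is a minimal stdlib-only sketch of that failure mode; the class and field names are hypothetical stand-ins, not the actual Hadoop client code, where the analogous security state lives in static fields (e.g. in UserGroupInformation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Hadoop client's static security state.
// A static field is shared by every instance loaded by the same ClassLoader.
class SharedClientState {
    static final Map<String, String> securityConfig = new HashMap<>();
}

// Hypothetical stand-in for an HDFS processor instance.
class FakeHdfsProcessor {
    private final String authMode;

    FakeHdfsProcessor(String authMode) {
        this.authMode = authMode;
    }

    void start() {
        // Each processor overwrites the single shared setting on startup.
        SharedClientState.securityConfig.put("hadoop.security.authentication", authMode);
    }

    String effectiveAuth() {
        // Reads shared state that another instance may have changed.
        return SharedClientState.securityConfig.get("hadoop.security.authentication");
    }
}

public class SharedStateDemo {
    public static void main(String[] args) {
        FakeHdfsProcessor kerberized = new FakeHdfsProcessor("kerberos");
        FakeHdfsProcessor simple = new FakeHdfsProcessor("simple");

        kerberized.start();
        simple.start(); // the second start clobbers the first processor's setting

        // The kerberized processor now effectively runs with SIMPLE auth,
        // analogous to the AccessControlException in the log above.
        System.out.println(kerberized.effectiveAuth()); // prints "simple"
    }
}
```

This also explains why stopping and restarting the failing processor recovers it: its restart re-writes the shared setting, which then breaks the other processor instead.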

I've tested adding the @RequiresInstanceClassLoading annotation to PutHDFS, 
which guarantees that each instance of the processor has its own ClassLoader 
and therefore cannot share any state with other instances, and this resolves 
the problem.

I will attach a patch adding the annotation.
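With @RequiresInstanceClassLoading, each processor instance loads its bundle's classes in its own ClassLoader, so a formerly JVM-wide static field exists once per instance. The annotation changes class loading, not processor code, but its effect can be modeled with per-instance state (hypothetical names, continuing the sketch above):

```java
import java.util.HashMap;
import java.util.Map;

// Models the effect of per-instance class loading: what was one JVM-wide
// static map becomes one map per processor instance, so configuring one
// processor can no longer clobber another. (Names are hypothetical.)
class IsolatedHdfsProcessor {
    // Instance state stands in for "static state inside a per-instance
    // ClassLoader": each instance sees its own copy.
    private final Map<String, String> securityConfig = new HashMap<>();

    void start(String authMode) {
        securityConfig.put("hadoop.security.authentication", authMode);
    }

    String effectiveAuth() {
        return securityConfig.get("hadoop.security.authentication");
    }
}

public class IsolationDemo {
    public static void main(String[] args) {
        IsolatedHdfsProcessor kerberized = new IsolatedHdfsProcessor();
        IsolatedHdfsProcessor simple = new IsolatedHdfsProcessor();

        kerberized.start("kerberos");
        simple.start("simple");

        // Each processor keeps its own setting.
        System.out.println(kerberized.effectiveAuth()); // prints "kerberos"
        System.out.println(simple.effectiveAuth());     // prints "simple"
    }
}
```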

> NiFi can not reliably support multiple HDFS clusters in the same flow
> ---------------------------------------------------------------------
>
>                 Key: NIFI-3068
>                 URL: https://issues.apache.org/jira/browse/NIFI-3068
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.0.0
>            Reporter: Sam Hjelmfelt
>            Assignee: Bryan Bende
>              Labels: HDFS
>
> The HDFS configurations in PutHDFS are not respected when two (or more) 
> PutHDFS processors exist with different configurations. The second processor 
> to run will use the configurations from the first processor. This can cause 
> data to be written to the wrong cluster.
> This appears to be caused by configuration caching in 
> AbstractHadoopProcessor, which would affect all HDFS processors.
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java#L144



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
