[
https://issues.apache.org/jira/browse/NIFI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420010#comment-15420010
]
ASF GitHub Bot commented on NIFI-2553:
--------------------------------------
Github user bbende commented on a diff in the pull request:
https://github.com/apache/nifi/pull/843#discussion_r74689871
--- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java ---
@@ -286,8 +310,10 @@ HdfsResources resetHDFSResources(String configResources, String dir, ProcessCont
            }
        }
+       final Path workingDir = fs.getWorkingDirectory();
        getLogger().info("Initialized a new HDFS File System with working dir: {} default block size: {} default replication: {} config: {}",
-               new Object[] { fs.getWorkingDirectory(), fs.getDefaultBlockSize(new Path(dir)), fs.getDefaultReplication(new Path(dir)), config.toString() });
+               new Object[]{workingDir, fs.getDefaultBlockSize(workingDir), fs.getDefaultReplication(workingDir), config.toString()});
--- End diff --
The main reason I wanted to go with the working directory is that it's not always possible to know what the value of "Directory" is going to be during an OnScheduled method. The main example is PutHDFS, which will often have "Directory" set to an expression like ${hadoop.dir} that was set as a flow file attribute by an upstream processor, so every flow file could actually target a different directory. ListHDFS and GetHDFS aren't as much of a problem because they are source processors, but since this code is in the abstract base class, it has to account for all of them.
So overall I figured we can use the working directory, or if we believe that could lead to a problem, then I would say we just don't need to log the block size and replication, which are what require a Path instance.
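To illustrate the failure being avoided, here is a minimal sketch that does not require Hadoop on the classpath. The helper `tryParse` is hypothetical (not NiFi or Hadoop code); `java.net.URI` stands in for the URI parse that Hadoop's Path constructor performs, which is what can throw when `new Path(dir)` is built inside the log statement:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DirectoryUriCheck {

    // Hypothetical helper, not NiFi code: returns the parsed URI, or null
    // when the configured directory string is not a valid URI. Hadoop's
    // Path constructor fails in the same situation, which is why building
    // new Path(dir) eagerly in @OnScheduled is risky.
    static URI tryParse(String dir) {
        try {
            return new URI(dir);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Unevaluated expression language contains '{' and '}', which are
        // illegal URI characters, so parsing fails.
        System.out.println(tryParse("data_${hadoop.dir}") == null);
        // A plain directory path parses fine.
        System.out.println(tryParse("/data/input") != null);
    }
}
```

Using the working directory (already a resolved Path from the FileSystem) sidesteps this parse entirely, which is the design choice argued for above.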
> HDFS processors throwing exception from OnSchedule when directory is an invalid URI
> -----------------------------------------------------------------------------------
>
> Key: NIFI-2553
> URL: https://issues.apache.org/jira/browse/NIFI-2553
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.0.0, 0.7.0
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
> Fix For: 1.0.0
>
>
> If you enter a directory string that results in an invalid URI, the HDFS
> processors will throw an unexpected exception from OnScheduled because of a
> logging statement in AbstractHadoopProcessor:
> {code}
> getLogger().info("Initialized a new HDFS File System with working dir: {} default block size: {} default replication: {} config: {}",
>         new Object[] { fs.getWorkingDirectory(), fs.getDefaultBlockSize(new Path(dir)), fs.getDefaultReplication(new Path(dir)), config.toString() });
> {code}
> An example input for the directory that can produce this problem:
> data_${literal('testing'):substring(0,4)}
> In addition to this, FetchHDFS, ListHDFS, GetHDFS, and PutHDFS all create new
> Path instances in their onTrigger methods from the same directory, outside of
> any try/catch that would wrap the failure in a ProcessException (if it got
> past the logging issue above).
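The try/catch gap described in the issue can be sketched without NiFi on the classpath. `ProcessException` below is a hypothetical stand-in for NiFi's class, and `java.net.URI` again stands in for the parse done by Hadoop's Path constructor; the point is that onTrigger-time path construction should surface a ProcessException rather than let a raw parse error escape:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for NiFi's ProcessException, included only so the
// sketch compiles without NiFi on the classpath.
class ProcessException extends RuntimeException {
    ProcessException(String msg, Throwable cause) { super(msg, cause); }
}

public class SafePathResolution {

    // Sketch of the guarded construction the issue asks for: parse the
    // directory inside a try/catch and rethrow as ProcessException, so the
    // framework handles the failure instead of an unchecked error escaping.
    static URI resolveDirectory(String dir) {
        try {
            return new URI(dir);
        } catch (URISyntaxException e) {
            throw new ProcessException("Invalid directory: " + dir, e);
        }
    }

    public static void main(String[] args) {
        // A normal directory resolves without error.
        System.out.println(resolveDirectory("/data/input"));
        // The malformed example from the issue is caught and rethrown.
        try {
            resolveDirectory("data_${literal('testing'):substring(0,4)}");
        } catch (ProcessException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```
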
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)