[ https://issues.apache.org/jira/browse/NIFI-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804699#comment-15804699 ]

ASF GitHub Bot commented on NIFI-2859:
--------------------------------------

Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1383#discussion_r94956675
  
    --- Diff: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java ---
    @@ -176,7 +176,7 @@ private HDFSListing deserialize(final String serializedState) throws JsonParseEx
     
             // Build a sorted map to determine the latest possible entries
             for (final FileStatus status : statuses) {
    -            if (status.getPath().getName().endsWith("_COPYING_")) {
    +            if (status.getPath().getName().endsWith("_COPYING_") || status.getPath().getName().startsWith(".")) {
    --- End diff ---
    
    @bbende Yes, you're right! For comparison, GetHDFS already has the following property:
    ```java
    public static final PropertyDescriptor IGNORE_DOTTED_FILES = new PropertyDescriptor.Builder()
        .name("Ignore Dotted Files")
        .description("If true, files whose names begin with a dot (\".\") will be ignored")
        .required(true)
        .allowableValues("true", "false")
        .defaultValue("true")
        .build();
    ```
    But the filter property approach is much better. I'll update the PR.
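The patched condition in the diff above boils down to a simple name predicate. As a standalone sketch (the class and constant names are illustrative, not ListHDFS internals):

```java
// Sketch of the part-file check from the diff above.
// Class and constant names are illustrative, not ListHDFS internals.
import java.util.function.Predicate;

public class PartFileFilter {
    // Skip in-progress copies ("hdfs dfs -put" writes to <name>_COPYING_)
    // and dotted temp files (e.g. PutHDFS writes a ".<name>" file before
    // renaming it to its final name).
    public static final Predicate<String> IS_PART_FILE =
            name -> name.endsWith("_COPYING_") || name.startsWith(".");

    public static void main(String[] args) {
        System.out.println(IS_PART_FILE.test("data.txt_COPYING_")); // true
        System.out.println(IS_PART_FILE.test(".tmp_abc"));          // true
        System.out.println(IS_PART_FILE.test("data.txt"));          // false
    }
}
```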


> List + Fetch HDFS processors are reading part files from HDFS
> -------------------------------------------------------------
>
>                 Key: NIFI-2859
>                 URL: https://issues.apache.org/jira/browse/NIFI-2859
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Mahesh Nayak
>            Assignee: Pierre Villard
>
> 1. Create the following process groups:
>    GetFile --> PutHDFS --> PutFile
>    ListHDFS --> FetchHDFS --> PutFile
> 2. Now start both process groups.
> 3. Write lots of files into HDFS so that ListHDFS keeps listing and FetchHDFS keeps fetching.
> 4. An exception is thrown because the processor reads the part file from the PutHDFS folder:
> {code:none}
> java.io.FileNotFoundException: File does not exist: /tmp/HDFSProcessorsTest_visjJMcHORUwigw/.ycnVSpBOzEaoTWk_7f37d5af-d4a4-4521-b60d-c3c11ae19669
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
> {code}
> Note that the file is eventually copied to the output successfully, but at the same time some files end up in the failure/comms failure relationship.
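Since the exception above is triggered by a dotted temporary file created by PutHDFS, a regex-based file filter that skips dotted names avoids the race. A minimal sketch (the pattern and class name here are assumptions for illustration, not a quote from ListHDFS):

```java
// Hypothetical demo of a regex file filter that excludes dotted names,
// similar in spirit to the filter-property approach discussed in the thread.
import java.util.regex.Pattern;

public class DottedFileRegexDemo {
    public static void main(String[] args) {
        // Match any file name whose first character is not a dot.
        final Pattern filter = Pattern.compile("[^\\.].*");
        System.out.println(filter.matcher("data.txt").matches());        // true
        System.out.println(filter.matcher(".hiddenTempFile").matches()); // false
    }
}
```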



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
