[ 
https://issues.apache.org/jira/browse/NIFI-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367093#comment-15367093
 ] 

Andrew Hulbert commented on NIFI-1438:
--------------------------------------

I believe I'm seeing a similar issue. I would simply like to aggregate an 
hour's worth of files into a single file to write to disk, but I don't seem to 
be achieving that behavior, or else my processor is incorrectly configured. 
Either way, it's not as obvious how to do this as it perhaps could be. I could 
always implement the exact behavior (i.e. bin/concat files in order for x 
amount of time) in a new processor, but I was hoping there was an easier way.
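
The intended behavior (bin files by hour, then concatenate each bin in 
arrival order) could be sketched roughly as follows. This is plain Python for 
illustration only, not NiFi code: the names hour_bin and merge_by_hour are 
hypothetical, and it assumes each file is available as a (timestamp, content) 
pair.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bin(ts: float) -> str:
    """Return a bin key such as '20160707-13' for a UNIX timestamp (UTC)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d-%H")


def merge_by_hour(files):
    """Group (timestamp, content_bytes) pairs by hour and concatenate them.

    Files are ordered by timestamp first, so each merged result keeps
    the order in which the files arrived.
    """
    bins = defaultdict(list)
    for ts, content in sorted(files, key=lambda f: f[0]):
        bins[hour_bin(ts)].append(content)
    # One merged blob per hour bin.
    return {key: b"".join(parts) for key, parts in bins.items()}
```

Each key of the returned dict would correspond to one merged output file per 
hour.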

> Unexpected results using MergeProcessor 
> ----------------------------------------
>
>                 Key: NIFI-1438
>                 URL: https://issues.apache.org/jira/browse/NIFI-1438
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>         Environment: OSX 10.10.5, Java 8u45
>            Reporter: Josh Harrison
>         Attachments: NIFI-1438-template.xml, nifi-merge-problem.xml, 
> nifi-problem.tgz
>
>
> Hello, I'm opening a ticket in reference to the Stack Overflow question I 
> asked at 
> http://stackoverflow.com/questions/34958347/mergecontent-with-nifi-inconsistent-length
>  
> To summarize, despite Aldrin's help, I have been unable to get the expected 
> merge behavior out of a template like the one attached, ingesting data like 
> that attached. 
> The goal is to ingest all of the zips in /tmp/nifidemo/source and extract the 
> zip files contained therein, each line being a JSON object. With JSON 
> routing, I extract and route for further processing ONLY items where the 
> "tags" item contains the tag "xyz".
> These routed files should be aggregated by MergeContent into a bucket with, 
> at minimum, 1000 lines, or after being starved for 30 seconds, whichever 
> occurs first.
> The behavior observed in my real template is replicated in this example: 
> MergeContent appears to be routing to buckets based on the original file 
> name, and not aggregating 1000 lines at a time as expected. Within a few 
> seconds of the template being run, many files are written with unexpected 
> line counts.
> More confusingly, this isn't a consistent pattern - files may be run 
> repeatedly and do not generate the same number of lines in the result each 
> time.
> The content of the input files was randomly generated so that approximately 
> 10% of the objects would contain the tag "xyz" (with 5000 lines in each input 
> file, there should be approximately 500 matching lines per file). There are 
> result files that contain over 400 lines, but many contain 15-30 lines. There 
> are also a number of files with a "uuid.json" style name, all containing one 
> line. 
> The attachment contains a generic template that replicates the problem. It 
> seems to throw some errors, but they don't appear to be related to the problem 
> I'm working on (and my real template doesn't throw the failures, but still 
> exhibits the same behavior).
> I am running NiFi 0.4.1 on a Mac OS X 10.10.5 system with JRE 8u45.
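
The merge behavior expected in the quoted description (keep only JSON lines 
tagged "xyz", then emit a batch once it holds at least 1000 lines or has been 
starved for 30 seconds) can be sketched as below. This is a plain Python 
illustration, not NiFi code and not a statement of how MergeContent works 
internally. The names has_tag and LineBatcher are hypothetical, and it assumes 
"starved" means that no new matching line has arrived for 30 seconds.

```python
import json


def has_tag(line, tag="xyz"):
    """Return True if the line is a JSON object whose "tags" list has tag."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    tags = obj.get("tags")
    return isinstance(tags, list) and tag in tags


class LineBatcher:
    """Collect lines and release them by size or by idle time.

    Assumption: the idle timer restarts whenever a new line is added.
    """

    def __init__(self, min_lines=1000, max_idle_seconds=30.0):
        self.min_lines = min_lines
        self.max_idle_seconds = max_idle_seconds
        self._lines = []
        self._last_arrival = None

    def add(self, line, now):
        """Add a line at time `now`; return a batch if the size is reached."""
        self._lines.append(line)
        self._last_arrival = now
        if len(self._lines) >= self.min_lines:
            return self._flush()
        return None

    def poll(self, now):
        """Return a pending batch if nothing has arrived for too long."""
        if self._lines and now - self._last_arrival >= self.max_idle_seconds:
            return self._flush()
        return None

    def _flush(self):
        batch, self._lines = self._lines, []
        return batch
```

Under these assumptions, the file a line came from plays no part in the 
batching, so every batch except a starved one would hold exactly min_lines 
lines.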



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
