Dale,

I think an image of the flow would be useful. Or better yet, if you can, a 
template of the flow, so
that we can see all of the configuration being used.

When you said you "get stuck at around 20 MB and then NiFi slows to a crawl,"
I'm not clear on exactly what you mean. After you process 20 MB of the 189 MB
CSV file? After you ingest 20 MB worth of files via the second FetchFile?

Also, which directory has 85,000 files? The first directory being polled via 
ListFile, or the directory
that you are picking up from via the second FetchFile?

Thanks
-Mark


> On May 4, 2016, at 9:01 AM, dale.chang13 <dale.chan...@outlook.com> wrote:
> 
> On May 4, 2016, at 8:56 AM, Joe Witt <joe.witt@...> wrote:
>> 
>> Dale,
>> 
>> Where there is a FetchFile there is usually a ListFile. And while
>> the symptom of the memory issue is showing up in FetchFile, I am
>> curious whether the issue might actually be caused in ListFile. How
>> many files are in the directory being listed?
>> 
>> Mark,
>> 
>> Are we using a stream-friendly API to list files, and do we know
>> whether that API is really doing things in a stream-friendly way on
>> all platforms?
>> 
>> Thanks
>> Joe
> 
> So I will explain my flow first and then answer your question about how
> I am using ListFile and FetchFile.
> 
> To begin my process, I am ingesting a CSV file that contains a list of
> filenames. The first (and only) ListFile starts the flow and passes the
> listing to the first FetchFile, which retrieves the contents of the CSV.
> Afterward, I use expression language (ExtractText) to extract all of the
> file names and set them as attributes on individual FlowFiles. Then I use
> a second FetchFile (this is the processor that has trouble allocating
> memory), with expression language that turns each file name attribute
> into the path of a text document to retrieve.
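> 
> For reference, the second FetchFile's "File to Fetch" property uses
> expression language along these lines (the attribute name here is
> illustrative, not our exact config):
> 
>     File to Fetch: ${doc.path}
> 
> where doc.path is an attribute that ExtractText populated from a regex
> capture group matched against the CSV content.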
> 
> The CSV file (189 MB) contains metadata and path/filenames for over 200,000
> documents, and I am having trouble reading from a directory of about 85,000
> documents (the second FetchFile; each document is usually a few KB). I get
> stuck at around 20 MB and then NiFi slows to a crawl.
> 
> I can give you a picture of our actual flow if you need it.
> 
> 
> Mark Payne wrote:
>> ListFile performs a listing using Java's File.listFiles(). This will
>> provide a list of all files in the directory. I do not believe this to
>> be related, though. Googling indicates that when this error occurs, it
>> is related to being unable to create a native process in order to
>> interact with the file system. I don't think the issue is related to
>> Java heap but rather to the available RAM on the box. How much RAM is
>> actually available on the box? You mentioned IOPS - are you running in
>> a virtual cloud environment? Using remote storage such as Amazon EBS?
> 
> I am running six Linux VMs on a Windows 8 machine. Three of the VMs (one
> NCM, two nodes) run NiFi, and those VMs have 20 GB assigned to them.
> Looking through Ambari and monitoring memory on the nodes, I have a little
> more than 4 GB of free RAM on the nodes. It looks like free memory dipped
> severely during my NiFi flow, but no swap memory was used.
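> 
> On the File.listFiles() point: if the full in-memory listing ever turns
> out to be the bottleneck, I assume a lazy listing via java.nio would keep
> memory flat even for very large directories. A rough, untested sketch of
> what I mean (not NiFi's actual code):
> 
>     import java.io.IOException;
>     import java.nio.file.DirectoryStream;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.nio.file.Paths;
> 
>     public class LazyListing {
>         public static void main(String[] args) throws IOException {
>             Path dir = Paths.get(args[0]);
>             // File.listFiles() materializes the whole directory as a
>             // single array; DirectoryStream hands back entries lazily,
>             // so memory use stays flat even with hundreds of thousands
>             // of files.
>             try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
>                 for (Path entry : stream) {
>                     System.out.println(entry.getFileName());
>                 }
>             }
>         }
>     }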
> 
> 
> 
