Joe/Bryan,

Thanks! I believe the one specific file per concurrent task/connection (and too many threads) is the issue I'm having. We have a lot of small files and are often backed up. I'm going to drop the task count to take advantage of the pooling. Is it possible to have Fetch do batches instead of a single file? Would that improve throughput? Also, is that 10 seconds configurable?
Some background: I'm converting 2 single nodes into a 5-node cluster and trying to figure out the best approach.

Thanks again!

On Tue, Oct 31, 2017 at 2:56 PM, Bryan Bende <[email protected]> wrote:
> Ryan,
>
> Personally I don't have experience running these processors at scale,
> but from a code perspective they are fundamentally different...
>
> GetSFTP is a source processor, meaning it is not being fed by an upstream
> connection, so when it executes it can create a connection and
> retrieve up to max-selects during that one execution.
>
> FetchSFTP is being told to fetch one specific file, typically through
> attributes on incoming flow files, so the concept of max-selects
> doesn't really apply because there is only one thing to select during an
> execution of the processor.
>
> FetchSFTP does employ connection pooling behind the scenes such that
> it will keep open a connection for each concurrent task, as long as
> each connection continues to be used within 10 seconds.
>
> -Bryan
>
> On Tue, Oct 31, 2017 at 11:43 AM, Joe Witt <[email protected]> wrote:
> > Ryan - don't know the code specifics behind FetchSFTP off-hand but I
> > can confirm there are users at that range for it.
> >
> > Thanks
> >
> > On Tue, Oct 31, 2017 at 11:38 AM, Ryan Ward <[email protected]> wrote:
> >> I've found that on a single node GetSFTP is able to pull more files off a
> >> remote server than Fetch in a cluster. I noticed Fetch doesn't have a max
> >> selects so it is requiring way more connections (one per file?) and
> >> concurrent threads to keep up.
> >>
> >> Was wondering if anyone is using List/Fetch at scale? In the multi-TB
> >> a day range?
> >>
> >> Thanks,
> >> Ryan
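
For reference, the pooling behavior Bryan describes in the quoted message (one connection kept open per concurrent task, reused only if it was last used within 10 seconds) could be sketched roughly as below. This is a hypothetical illustration in Python, not NiFi's actual FetchSFTP code: the class name IdleTimeoutPool, the connect callback, the per-task key, and the injectable clock are all assumptions made for the example.

```python
import time


class IdleTimeoutPool:
    """Hypothetical sketch: one cached connection per key (e.g. per concurrent
    task), reused only if it was last used within `idle_seconds`."""

    def __init__(self, connect, idle_seconds=10.0, clock=time.monotonic):
        self._connect = connect      # assumed factory: key -> new connection
        self._idle = idle_seconds    # the 10-second window from the thread
        self._clock = clock          # injectable so the demo needs no sleeping
        self._entries = {}           # key -> (connection, last_used_time)
        self.opened = 0              # number of connections opened so far

    def acquire(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._idle:
            conn = entry[0]          # still warm: reuse the open connection
        else:
            # Idle too long (or never opened): reconnect. A real pool would
            # also close the stale connection here.
            conn = self._connect(key)
            self.opened += 1
        self._entries[key] = (conn, now)
        return conn


if __name__ == "__main__":
    fake_now = [0.0]
    pool = IdleTimeoutPool(connect=lambda key: object(),
                           clock=lambda: fake_now[0])
    # Fetches for the same task at t = 0 s, 3 s, 9 s, then after a 16 s gap.
    for t in (0.0, 3.0, 9.0, 25.0):
        fake_now[0] = t
        pool.acquire("task-1")
    print(pool.opened)  # 2: the 16 s gap forces one reconnect
```

Under these assumptions, a task that receives flow files less than 10 seconds apart keeps reusing one connection, while a task that sits idle longer pays the connection setup cost again on its next fetch.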
