Ryan,

Personally, I don't have experience running these processors at scale, but from a code perspective they are fundamentally different...
GetSFTP is a source processor, meaning it is not fed by an upstream connection, so when it executes it can create a connection and retrieve up to max-selects files during that one execution.

FetchSFTP is told to fetch one specific file, typically through attributes on incoming flow files, so the concept of max-selects doesn't really apply: there is only one thing to select during an execution of the processor.

FetchSFTP does employ connection pooling behind the scenes, such that it will keep a connection open for each concurrent task, as long as each connection continues to be used within 10 seconds.

-Bryan

On Tue, Oct 31, 2017 at 11:43 AM, Joe Witt <[email protected]> wrote:
> Ryan - don't know the code specifics behind FetchSFTP off-hand but I
> can confirm there are users at that range for it.
>
> Thanks
>
> On Tue, Oct 31, 2017 at 11:38 AM, Ryan Ward <[email protected]> wrote:
>> I've found that on a single node GetSFTP is able to pull more files off a
>> remote server than Fetch in a cluster. I noticed Fetch doesn't have a max
>> selects so it is requiring way more connections (one per file?) and
>> concurrent threads to keep up.
>>
>> Was wondering if anyone is using List/Fetch at scale? In the multi TB's a
>> day range?
>>
>> Thanks,
>> Ryan
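
P.S. For anyone trying to picture the pooling behavior mentioned in the top reply, here is a rough sketch of an idle-expiring connection pool. This is not the actual FetchSFTP code: the class name IdleExpiringPool, its methods, and the injected clock are all hypothetical, made up for illustration. The only detail taken from the thread is the idea that a connection is reused as long as it was last used within roughly 10 seconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch only; NOT the real FetchSFTP implementation.
final class IdleExpiringPool<C> {

    // An idle connection plus the time it was last handed back.
    private static final class Entry<C> {
        final C connection;
        final long lastUsedMillis;

        Entry(C connection, long lastUsedMillis) {
            this.connection = connection;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final Deque<Entry<C>> idle = new ArrayDeque<>();
    private final Supplier<C> factory;       // opens a new connection (assumed)
    private final LongSupplier clockMillis;  // injectable clock, for testing
    private final long maxIdleMillis;        // e.g. 10_000 for "10 seconds"
    private int created = 0;

    IdleExpiringPool(Supplier<C> factory, LongSupplier clockMillis, long maxIdleMillis) {
        this.factory = factory;
        this.clockMillis = clockMillis;
        this.maxIdleMillis = maxIdleMillis;
    }

    // Reuse a recently used connection if one exists, otherwise open a new one.
    synchronized C borrow() {
        Entry<C> entry;
        while ((entry = idle.pollFirst()) != null) {
            long idleFor = clockMillis.getAsLong() - entry.lastUsedMillis;
            if (idleFor <= maxIdleMillis) {
                return entry.connection;
            }
            // Idle too long: a real pool would close the connection here.
        }
        created++;
        return factory.get();
    }

    // Hand a connection back so the next fetch can reuse it.
    synchronized void release(C connection) {
        idle.addFirst(new Entry<>(connection, clockMillis.getAsLong()));
    }

    // Number of connections opened so far.
    synchronized int createdCount() {
        return created;
    }
}
```

The point of the sketch is that a steady stream of fetches keeps reusing the same connection per task, while a gap longer than the idle limit forces a fresh connection on the next fetch.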
