Ryan,

Personally I don't have experience running these processors at scale,
but from a code perspective they are fundamentally different...

GetSFTP is a source processor, meaning it is not fed by an upstream
connection, so when it executes it can create a connection and
retrieve up to max-selects files during that one execution.

FetchSFTP is told to fetch one specific file, typically through
attributes on incoming flow files, so the concept of max-selects
doesn't really apply because there is only one thing to select during
an execution of the processor.
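
To put rough numbers on that difference, here is a small sketch of how
many processor executions each approach needs for the same set of
files. It is only an illustration, not NiFi code: the function names
are made up, and the value 100 for max-selects in the example is an
assumed setting. Note that it counts executions, not connections.

```python
import math


def getsftp_executions(num_files, max_selects):
    """Executions needed when each run can pull up to max_selects files."""
    return math.ceil(num_files / max_selects)


def fetchsftp_executions(num_files):
    """Executions needed when each run handles exactly one flow file."""
    return num_files


if __name__ == "__main__":
    files = 1000
    print("GetSFTP executions:  ", getsftp_executions(files, max_selects=100))
    print("FetchSFTP executions:", fetchsftp_executions(files))
```

So Fetch relies on concurrent tasks (and a cluster) for throughput
rather than on batching within a single execution.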

FetchSFTP does employ connection pooling behind the scenes, such that
it keeps a connection open for each concurrent task, as long as each
connection continues to be used within 10 seconds.
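
The pooling behaviour described above can be sketched roughly like
this. This is a toy model under my own assumptions, not the actual
FetchSFTP implementation: FakeConnection and IdleTimeoutPool are
hypothetical names, and the only detail taken from the processor is
the 10-second idle window.

```python
import time

IDLE_TIMEOUT_SECONDS = 10.0  # the 10-second window mentioned above


class FakeConnection:
    """Stand-in for an SFTP session; counts how many were opened."""

    opened = 0

    def __init__(self, host):
        FakeConnection.opened += 1
        self.host = host
        self.closed = False

    def close(self):
        self.closed = True


class IdleTimeoutPool:
    """Toy pool: reuse an idle connection unless it sat unused too long."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.idle = {}  # host -> list of (connection, time it was released)

    def acquire(self, host):
        entries = self.idle.setdefault(host, [])
        now = self.clock()
        while entries:
            conn, last_used = entries.pop()
            if now - last_used <= self.timeout:
                return conn  # still fresh, so reuse it
            conn.close()  # idle for too long, so discard it
        return FakeConnection(host)  # nothing reusable, open a new one

    def release(self, conn):
        self.idle.setdefault(conn.host, []).append((conn, self.clock()))


def fetch_all(pool, host, filenames):
    """One 'execution' per file, each borrowing a pooled connection."""
    for _name in filenames:
        conn = pool.acquire(host)
        # A real processor would download the file here.
        pool.release(conn)
```

In this model, a steady stream of flow files keeps reusing the same
connection, while a gap longer than the timeout forces a new one to be
opened on the next fetch.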

-Bryan


On Tue, Oct 31, 2017 at 11:43 AM, Joe Witt <[email protected]> wrote:
> Ryan - don't know the code specifics behind FetchSFTP off-hand but I
> can confirm there are users at that range for it.
>
> Thanks
>
> On Tue, Oct 31, 2017 at 11:38 AM, Ryan Ward <[email protected]> wrote:
>> I've found that on a single node getSFTP is able to pull more files off a
>> remote server than Fetch in a cluster. I noticed Fetch doesn't have a max
>> selects so it is requiring way more connections (one per file?) and
>> concurrent threads to keep up.
>>
>> Was wondering if anyone is using List/Fetch at scale? In the multi TB's a
>> day range?
>>
>> Thanks,
>> Ryan
