Joe/Bryan, thanks!

I believe the one-connection-per-concurrent-task behavior (and too many
threads) is the issue I'm hitting: we have a lot of small files and often
get backed up. I'm going to drop the task count to take advantage of the
pooling. Is it possible to have Fetch pull batches instead of a single
file per execution? Would that improve throughput? Also, is that 10
seconds configurable?
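For anyone following along, the pooling behavior described in this thread (a connection kept open per concurrent task, closed if unused for 10 seconds) can be sketched roughly as below. This is an illustrative sketch only, not NiFi's actual implementation; the class and method names are made up, and the 10-second idle window is just the figure quoted in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of connection pooling with an idle timeout,
// loosely modeled on the FetchSFTP behavior described in this thread.
// NOT NiFi's actual code; names and the 10s window are assumptions.
public class IdleConnectionPool {

    // Stand-in for a real SFTP connection.
    static class Connection {
        boolean open = true;
        void close() { open = false; }
    }

    private static class IdleWrapper {
        final Connection connection;
        final long lastUsedMillis;
        IdleWrapper(Connection c, long t) { connection = c; lastUsedMillis = t; }
    }

    private final Deque<IdleWrapper> idle = new ArrayDeque<>();
    private final long idleTimeoutMillis;

    IdleConnectionPool(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Reuse an idle connection if it was used recently enough;
    // otherwise close stale ones and open a fresh connection.
    synchronized Connection borrow(long nowMillis) {
        IdleWrapper w;
        while ((w = idle.poll()) != null) {
            if (nowMillis - w.lastUsedMillis <= idleTimeoutMillis) {
                return w.connection;   // still fresh: reuse it
            }
            w.connection.close();      // idle too long: close and keep looking
        }
        return new Connection();       // nothing reusable: open a new one
    }

    // Return a connection to the pool, stamping when it was last used.
    synchronized void release(Connection c, long nowMillis) {
        idle.push(new IdleWrapper(c, nowMillis));
    }

    public static void main(String[] args) {
        IdleConnectionPool pool = new IdleConnectionPool(10_000); // 10s window
        Connection c1 = pool.borrow(0);
        pool.release(c1, 0);
        // Borrowed again within 10 seconds: the same connection is reused.
        Connection c2 = pool.borrow(5_000);
        System.out.println(c1 == c2);               // true
        pool.release(c2, 5_000);
        // Borrowed after the idle window: the stale connection is closed
        // and a fresh one is opened.
        Connection c3 = pool.borrow(20_000);
        System.out.println(c3 != c2 && !c2.open);   // true
    }
}
```

The upshot for tuning is that extra concurrent tasks each hold their own connection, so dropping the task count (while keeping each connection busy within the idle window) reduces connection churn without new opens per file.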

Some background: I'm converting 2 single nodes into a 5 node cluster and
trying to figure out the best approach.

Thanks again!



On Tue, Oct 31, 2017 at 2:56 PM, Bryan Bende <[email protected]> wrote:

> Ryan,
>
> Personally I don't have experience running these processors at scale,
> but from a code perspective they are fundamentally different...
>
> GetSFTP is a source processor, meaning it is not fed by an upstream
> connection, so when it executes it can create a connection and
> retrieve up to max-selects files during that one execution.
>
> FetchSFTP is told to fetch one specific file, typically through
> attributes on an incoming flow file, so the concept of max-selects
> doesn't really apply because there is only one thing to select during
> an execution of the processor.
>
> FetchSFTP does employ connection pooling behind the scenes, such that
> it will keep a connection open for each concurrent task, as long as
> each connection continues to be used within 10 seconds.
>
> -Bryan
>
>
> On Tue, Oct 31, 2017 at 11:43 AM, Joe Witt <[email protected]> wrote:
> > Ryan - don't know the code specifics behind FetchSFTP off-hand, but I
> > can confirm there are users at that range for it.
> >
> > Thanks
> >
> > On Tue, Oct 31, 2017 at 11:38 AM, Ryan Ward <[email protected]> wrote:
> >> I've found that on a single node GetSFTP is able to pull more files off
> >> a remote server than Fetch in a cluster. I noticed Fetch doesn't have a
> >> max-selects property, so it requires far more connections (one per
> >> file?) and concurrent threads to keep up.
> >>
> >> Was wondering if anyone is using List/Fetch at scale? In the multi-TBs
> >> a day range?
> >>
> >> Thanks,
> >> Ryan
>