I have a NiFilosophical question that came up when I had a GetSFTP processor feeding a back-pressured connection.
My GetSFTP is configured with Max Selects = 100, and the files in the remote directory are nearly 1GB each. The downstream queue has a backpressure threshold of 2GB, so I assumed each run of GetSFTP would stop feeding files once it hit backpressure. I was initially puzzled when I started periodically seeing huge backlogs (71GB) in this queue on each worker in the cluster, until I looked at the queued count/bytes stats (very useful tool, btw):

Queued bytes statistics <https://imagebin.ca/v/301KDHEa1lCk>
Queued count statistics <https://imagebin.ca/v/301JqnUcGXLF>

Now it's evident that GetSFTP continues to emit files until it hits Max Selects, regardless of backpressure. I think I understand why backpressure can't necessarily override this behavior (e.g., if a processor needed to emit a query result set in batches, what would you do with the flow files it wanted to emit when you suddenly hit backpressure?).

So my questions are:

- Do you think it's the user's responsibility to be aware of cases where a processor's implementation overrides backpressure? I think this is important to understand, because backpressure is usually in place to prevent a full disk, which is a fairly critical requirement.
- Is there something we can do to document this so it's more universally understood?
- Perhaps the documentation for GetSFTP's Max Selects property could indicate that it overrides backpressure? In that case, are there other processors that would need similar documentation?
- Or do we want a more universal approach, like putting this caveat in the general documentation?

Joe

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. *-Philippians 4:12-13*
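P.S. To check my own understanding of the mechanics, here is a toy sketch in plain Java (not actual NiFi internals; names and numbers are mine) of why a soft backpressure check that happens only before the processor is scheduled lets a single run overshoot the queue limit:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy simulation, NOT NiFi code: the framework checks backpressure only
// before scheduling the processor; once a run starts, the whole batch
// (up to Max Selects) is emitted with no per-file backpressure check.
public class BackpressureOvershoot {
    static final long FILE_BYTES = 1_000_000_000L;         // ~1GB per remote file
    static final long BACKPRESSURE_BYTES = 2_000_000_000L; // 2GB soft limit
    static final int MAX_SELECTS = 100;

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>();
        long queuedBytes = 0;

        // Soft limit is evaluated once, before the run begins.
        if (queuedBytes < BACKPRESSURE_BYTES) {
            // One run: pull up to MAX_SELECTS files in a single session.
            for (int i = 0; i < MAX_SELECTS; i++) {
                queue.add(FILE_BYTES);
                queuedBytes += FILE_BYTES;
            }
        }

        // The queue now holds ~100GB even though the soft limit is 2GB.
        System.out.printf("queued: %d files, %d bytes (limit was %d)%n",
                queue.size(), queuedBytes, BACKPRESSURE_BYTES);
    }
}
```

If that mental model is right, a single scheduled run against an empty (or merely under-threshold) queue can enqueue Max Selects * file-size bytes, which matches the 71GB backlogs I was seeing.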
