[ https://issues.apache.org/jira/browse/NIFI-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Youtsey updated NIFI-6278:
---------------------------------
    Description: 
The processor will hang indefinitely on a blocking read and thus stop ingesting 
data. This is typically caused by a heavily loaded listening node with many 
incoming POST requests. When a POST request times out on the sending node, the 
listening node has no knowledge of the timeout, because the sending side reuses 
its connections and therefore never closes them. As a result, ListenHTTP blocks 
on the read. This has been seen on production systems when using the Max Data 
Rate property, but I cannot verify whether it has occurred without that 
property.
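The title proposes setting an idle timeout. The sketch below only illustrates the failure mode described above; it uses plain `java.net` sockets, not ListenHTTP's actual HTTP server, and `IdleTimeoutDemo` and `readTimesOut` are hypothetical names. A peer that keeps a connection open but sends nothing leaves an unbounded `read()` blocked forever, whereas a read with `setSoTimeout` gives up with a `SocketTimeoutException` once the idle period expires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo (not NiFi code): an idle timeout turns an indefinite
// blocking read into a bounded one.
class IdleTimeoutDemo {

    /** Returns true if the read timed out, false if data or EOF arrived. */
    static boolean readTimesOut(int idleTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             // The "sending node": connects, then goes silent without closing,
             // like a pooled connection whose request already timed out on its side.
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // Without this call, read() below would block indefinitely.
            accepted.setSoTimeout(idleTimeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }
}
```

With a 200 ms timeout, `readTimesOut(200)` returns `true` after roughly 200 ms instead of hanging.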

The LeakyBucketStreamThrottler needs a redesign. Rather than performing the 
socket reads inside the Executor's Runnable (Drain), do the reads on the 
incoming connection's thread before deciding whether to throttle. 
This will accomplish two things:
 # It will eliminate a thread context switch for every buffer read;
 # It will reduce the time needed to decide whether to throttle, and thus give 
a much more accurate rate. Performing the socket read in the timed thread 
introduces a high degree of inaccuracy due to variations in the load on the 
listening server, the load on the client, and network congestion/bandwidth.

In essence, the Runnable should only reset the total bytes read for each 
1-second interval.
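A minimal sketch of the proposed split, assuming a simple fixed-window byte budget shared by all connections. This is not the existing LeakyBucketStreamThrottler or any NiFi code; `IntervalByteThrottler`, `ThrottledInputStream`, `record` and `resetInterval` are hypothetical names. Each connection thread performs its own socket read, then records the byte count and blocks only if the interval's budget is spent; the scheduled task does nothing but reset the counter and wake waiting readers.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-window throttler shared by all connections.
class IntervalByteThrottler implements AutoCloseable {
    private final long maxBytesPerInterval;
    private long bytesThisInterval; // guarded by "this"
    private final ScheduledExecutorService resetter;

    IntervalByteThrottler(long maxBytesPerInterval, long intervalMillis) {
        if (maxBytesPerInterval <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        this.maxBytesPerInterval = maxBytesPerInterval;
        this.resetter = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "throttle-reset");
            t.setDaemon(true);
            return t;
        });
        resetter.scheduleAtFixedRate(this::resetInterval,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // The only work done on the timer thread: start a new interval
    // and wake any readers waiting for budget.
    synchronized void resetInterval() {
        bytesThisInterval = 0;
        notifyAll();
    }

    // Called on the connection's own thread after it has already read n bytes.
    synchronized void record(int n) throws InterruptedException {
        bytesThisInterval += n;
        // Budget for this interval is used up: block until the next reset.
        while (bytesThisInterval >= maxBytesPerInterval) {
            wait();
        }
    }

    synchronized long bytesThisInterval() {
        return bytesThisInterval;
    }

    @Override
    public void close() {
        resetter.shutdownNow();
    }
}

// Reads on the caller's thread first, then asks the throttler whether to pause.
class ThrottledInputStream extends FilterInputStream {
    private final IntervalByteThrottler throttler;

    ThrottledInputStream(InputStream in, IntervalByteThrottler throttler) {
        super(in);
        this.throttler = throttler;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            throttle(n);
        }
        return n;
    }

    private void throttle(int n) throws InterruptedIOException {
        try {
            throttler.record(n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while throttled");
        }
    }
}
```

In this sketch the budget is enforced at buffer granularity: a single read larger than the remaining budget is still accepted, and the reader then waits for the next reset. Because the reset discards any excess rather than carrying it over, the long-run rate can exceed the limit when buffers are large relative to the per-interval budget; a carry-over variant would subtract the budget instead of zeroing the counter.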

Also, I would like to change the "Max Data to Receive per Second" config property 
to "Max Data Rate (5 min)". All the stats are on a 5-minute interval, so this 
avoids one more calculation.
 # Propose changing the existing property to a 'dynamic' property with a new 
description, "Deprecated - use Max Data Rate".
 # Add a getSupportedDynamicProperties method which simply returns the existing 
PropertyDescriptor when the 'propertyDescriptorName' matches, so that the 
existing property is still handled by the framework yet can be removed from 
the processor's configuration.
 # In 'createHttpServer' (the onScheduled method), check for the existing 
property and, if it is set, log a warning message and use its value IF the new 
property is not set.
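The fallback in the last step could look roughly like the following. This is plain Java, not NiFi's PropertyDescriptor/ProcessContext API; `MaxDataRateResolver` and `resolveBytesPerSecond` are hypothetical, property values are assumed to already be plain byte counts (the real properties take data-size strings), and the division by 300 assumes the throttler keeps metering per second. A warning is emitted whenever the deprecated property is set, and its value is used only when the new property is absent.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.Consumer;

// Hypothetical sketch of the deprecated-property fallback.
class MaxDataRateResolver {
    static final String LEGACY_NAME = "Max Data to Receive per Second";
    static final String NEW_NAME = "Max Data Rate (5 min)";
    static final long SECONDS_PER_5_MIN = 5 * 60;

    /** Per-second byte budget for the throttler, or empty if unthrottled. */
    static OptionalLong resolveBytesPerSecond(Map<String, String> props, Consumer<String> warn) {
        String legacy = props.get(LEGACY_NAME);
        String current = props.get(NEW_NAME);
        if (legacy != null) {
            // Always warn so operators know to migrate the setting.
            warn.accept("'" + LEGACY_NAME + "' is deprecated; use '" + NEW_NAME + "'");
        }
        if (current != null) {
            // Assumption: convert the 5-minute figure to a per-second budget,
            // never rounding down to zero.
            return OptionalLong.of(Math.max(1, Long.parseLong(current) / SECONDS_PER_5_MIN));
        }
        if (legacy != null) {
            // Legacy value is already per second.
            return OptionalLong.of(Long.parseLong(legacy));
        }
        return OptionalLong.empty();
    }
}
```

For example, a "Max Data Rate (5 min)" of 3000 bytes resolves to a budget of 10 bytes per second.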

  was:
The processor will hang indefinitely on a blocking read and thus stop ingesting 
data. This is typically caused by a heavily loaded listening node with many 
incoming POST requests. When a POST request times out on the sending node, the 
listening node has no knowledge of the timeout, because the sending side reuses 
its connections and therefore never closes them. As a result, ListenHTTP blocks 
on the read. This has been seen on production systems when using the Max Data 
Rate property, but I cannot verify whether it has occurred without that 
property.

The LeakyBucketStreamThrottler needs a redesign. Rather than performing the 
socket reads inside the Executor's Runnable (Drain), do the reads on the 
incoming connection's thread before deciding whether to throttle. 
This will accomplish two things:
 # It will eliminate a thread context switch for every buffer read;
 # It will reduce the time needed to decide whether to throttle, and thus give 
a much more accurate rate. Performing the socket read in the timed thread 
introduces a high degree of inaccuracy due to variations in the load on the 
listening server, the load on the client, and network congestion/bandwidth.

In essence, the Runnable should only be computing the total bytes read for each 
1-second interval.

Also, I would like to change the "Max Data to Receive per Second" config property 
to "Max Data Rate (5 min)". All the stats are on a 5-minute interval, so this 
avoids one more calculation.
 # Propose changing the existing property to a 'dynamic' property with a new 
description, "Deprecated - use Max Data Rate".
 # Add a getSupportedDynamicProperties method which simply returns the existing 
PropertyDescriptor when the 'propertyDescriptorName' matches, so that the 
existing property is still handled by the framework yet can be removed from 
the processor's configuration.
 # In 'createHttpServer' (the onScheduled method), check for the existing 
property and, if it is set, log a warning message and use its value IF the new 
property is not set.


> ListenHTTP - Improve throttling and set idle timeout
> ----------------------------------------------------
>
>                 Key: NIFI-6278
>                 URL: https://issues.apache.org/jira/browse/NIFI-6278
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Steven Youtsey
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
