[
https://issues.apache.org/jira/browse/NIFI-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-16011:
------------------------------
Description:
We are consistently seeing system test failures. Looking at the logs from
GitHub Actions, it appears that LoadBalanceIT is always the first one to fail,
with the issue then cascading. It seems that the end of the
LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for
each of the 100 expected FlowFiles, and this then gets replicated across the
cluster.
This, in turn, causes connection pool exhaustion, resulting in
{code:java}
IOException: RST_STREAM received
{code}
which comes back as an HTTP 500 error.
That test can be tightened up by producing 20 FlowFiles instead of 100. This
will reduce the number of requests by a factor of 5, giving us much more
breathing room.
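As a rough sketch of the arithmetic, assuming one listing call per FlowFile and
one replicated request per cluster node for each call: the class name, the
method, and the two-node cluster size below are illustrative assumptions, not
taken from the NiFi code or the test itself.

```java
public class ListingRequestEstimate {

    // Hypothetical model: the test issues one queue listing per FlowFile, and
    // each listing is replicated once to every node in the cluster.
    static long replicatedRequests(int flowFileCount, int clusterNodes) {
        return (long) flowFileCount * clusterNodes;
    }

    public static void main(String[] args) {
        int clusterNodes = 2; // assumed cluster size; the issue does not state it

        long before = replicatedRequests(100, clusterNodes);
        long after = replicatedRequests(20, clusterNodes);

        // The ratio is independent of the assumed cluster size.
        System.out.println("before=" + before + " after=" + after
                + " ratio=" + (before / after));
    }
}
```

Under this model the ratio stays at 5 regardless of the node count, since only
the FlowFile count changes.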
was:
We are consistently seeing system test failures. Looking at the logs from
GitHub Actions, it appears that LoadBalanceIT is always the first one to fail,
with the issue then cascading. It seems that the end of the
LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for
each of the 100 expected FlowFiles, and this then gets replicated across the
cluster.
This, in turn, causes connection pool exhaustion, resulting in
{code:java}
IOException: RST_STREAM received
{code}
which comes back as an HTTP 500 error.
That test can be tightened up to perform a listing once and just look at
FlowFile Summaries instead of fetching the full FlowFile each time.
> Repeated system test failures caused by LoadBalanceIT
> -----------------------------------------------------
>
> Key: NIFI-16011
> URL: https://issues.apache.org/jira/browse/NIFI-16011
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major