[
https://issues.apache.org/jira/browse/NIFI-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17942271#comment-17942271
]
Satya Vadapalli commented on NIFI-14453:
----------------------------------------
Hi Pierre - thanks for promptly looking into this. The PutBigQuery processor
has a "Minimum Number of Records" parameter that, as I understand it, combines
the records from incoming flow files into a bin before streaming them to
BigQuery. Isn't that the same thing MergeRecord would do? Does MergeRecord work
differently than this parameter? If the suggestion is to use MergeRecord
instead, what should this parameter be set to? I think the PR you're going to
put in, along with MergeRecord, might work for us. Is there a release schedule
I could follow to find out when your PR goes in?
!image-2025-04-09-11-38-05-266.png!
Also, I wanted to bring up another bug with the PutBigQuery processor: I'm
unable to change the number of concurrent tasks. The UI lets me change the
value and apply it, but when I reopen the processor configuration it is back
to 1.
> PutBigQuery Creates too many active streams
> -------------------------------------------
>
> Key: NIFI-14453
> URL: https://issues.apache.org/jira/browse/NIFI-14453
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 2.3.0
> Reporter: Satya Vadapalli
> Assignee: Pierre Villard
> Priority: Major
> Attachments: image-2025-04-09-11-38-05-266.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *The Issue:*
> We recently migrated from NiFi 1.x to 2.x and have been having an issue with
> the PutBigQuery processor. The older processor (PutBigQueryStreaming) worked
> well, I believe because it used the BigQuery REST API, whereas the new
> processor uses the Storage Write API. I'm running into an issue where the
> processor opens over 10k streams instead of reusing an existing stream.
> Here's the error message.
>
> {code:java}
> PutBigQuery[id=01bc2d0f-0196-1000-0000-0000541e88b5] Processing halted:
> yielding [1 sec]: com.google.api.gax.rpc.FailedPreconditionException:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxx.xxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx - Caused by:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxxx.xxxxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx
> {code}
> *What is Expected:*
> PutBigQuery should be able to stream data into BigQuery with a single
> stream, using the Storage Write API as described in this document:
> [https://cloud.google.com/bigquery/docs/write-api-streaming]
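As an aside for anyone triaging: plugging the numbers from the error message above into a back-of-envelope calculation shows why BigQuery rejects new streams here. The total traffic across all 10k+ open streams is only a few hundred bytes per second, which doesn't justify even a single stream at the reported 300,000 bytes/sec-per-stream threshold. A minimal sketch (the class name is hypothetical; the constants are copied verbatim from the log, and nothing here calls BigQuery):

```java
// Back-of-envelope check on the quota numbers reported in the
// FAILED_PRECONDITION error above. Purely local arithmetic.
public class StreamQuotaCheck {
    // Constants copied verbatim from the error message.
    static final double ACTUAL_PER_STREAM_BYTES_PER_SEC = 0.0333333;
    static final int ACTIVE_STREAM_COUNT = 10139;
    static final double REQUIRED_PER_STREAM_BYTES_PER_SEC = 300000;

    // Total write throughput summed across every open stream.
    static double totalBytesPerSec() {
        return ACTUAL_PER_STREAM_BYTES_PER_SEC * ACTIVE_STREAM_COUNT;
    }

    // How many streams that traffic would justify at BigQuery's threshold.
    static double justifiedStreams() {
        return totalBytesPerSec() / REQUIRED_PER_STREAM_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        System.out.printf("Total traffic: ~%.0f bytes/sec across %d streams%n",
                totalBytesPerSec(), ACTIVE_STREAM_COUNT);
        System.out.printf("Streams justified by that traffic: %.6f%n",
                justifiedStreams());
    }
}
```

In other words, roughly 338 bytes/sec total traffic justifies a small fraction of one stream, which is consistent with the suggestion to batch records upstream (or reuse a single stream) rather than opening a new stream per flow file.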
--
This message was sent by Atlassian Jira
(v8.20.10#820010)