[
https://issues.apache.org/jira/browse/NIFI-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17942275#comment-17942275
]
Pierre Villard commented on NIFI-14453:
---------------------------------------
So the record count in the processor's configuration is really how many records
we group together before each append to the stream. All the records of a given
flowfile are sent to the same stream. So if you have one flowfile of 1000
records, it'd be a single stream, and every 20 records we would append to that
stream (each append makes those records available for querying in BigQuery). If
you don't need the data to be immediately available for query, you can use the
batch transfer type instead.
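To make the batching behavior concrete, here is a minimal illustrative sketch (plain Python, not the processor's actual code; the batch size of 20 and flowfile size of 1000 match the example above):

```python
def append_batches(records, batch_size=20):
    """Simulate how records from one flowfile are grouped before appends:
    all records go to a single stream, and one append call is issued
    for every `batch_size` records."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

# One flowfile of 1000 records -> one stream, 50 append calls of 20 records each.
flowfile = list(range(1000))
batches = append_batches(flowfile, batch_size=20)
print(len(batches))        # 50 appends
print(len(batches[0]))     # 20 records per append
```

So the configured record count controls append granularity (how soon rows become queryable), not how many streams get opened per flowfile.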
Regarding Concurrent Tasks, the processor is annotated as not thread-safe, so
you cannot change the number of concurrent tasks. Multiplexing is possible with
the SDK but would also require code changes in the processor to properly
leverage it. This is worth a follow-up Jira in case someone can get to it.
> PutBigQuery Creates too many active streams
> -------------------------------------------
>
> Key: NIFI-14453
> URL: https://issues.apache.org/jira/browse/NIFI-14453
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 2.3.0
> Reporter: Satya Vadapalli
> Assignee: Pierre Villard
> Priority: Major
> Attachments: image-2025-04-09-11-38-05-266.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *The Issue:*
> We recently migrated from NiFi 1.x to 2.x and have been having an issue with
> the PutBigQuery processor. It used to work well on the older version
> (PutBigQueryStreaming), because I believe it used the BigQuery REST API,
> whereas the new one uses the Storage API. I'm running into an issue where the
> processor is opening over 10k streams instead of reusing the existing
> stream. Here's the error message:
>
> {code:java}
> PutBigQuery[id=01bc2d0f-0196-1000-0000-0000541e88b5] Processing halted:
> yielding [1 sec]: com.google.api.gax.rpc.FailedPreconditionException:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxx.xxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx - Caused by:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxxx.xxxxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx
> {code}
> *What is Expected:*
> PutBigQuery should be able to stream data into BigQuery with a single
> stream, using the Storage Write API as described in the documentation -
> [https://cloud.google.com/bigquery/docs/write-api-streaming]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)