[
https://issues.apache.org/jira/browse/NIFI-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17942275#comment-17942275
]
Pierre Villard commented on NIFI-14453:
---------------------------------------
So the record count in the processor's configuration is really how many records
we group together before each append to the stream. All the records of a given
flowfile are sent to the same stream. So if you have one flowfile of 1000
records, it'd be a single stream, and every 20 records we would append to that
stream (each append makes those records available for querying in BigQuery). If
you don't need the data to be immediately available for query, you can use the
batch transfer type instead.
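To make the batching behavior concrete, here is a minimal illustrative sketch (plain Python, not the processor's actual code; the batch size of 20 and flowfile size of 1000 match the example above):

```python
def append_batches(records, batch_size=20):
    """Simulate how records from one flowfile are grouped before appends:
    all records go to a single stream, and one append call is issued
    for every `batch_size` records."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

# One flowfile of 1000 records -> one stream, 50 append calls of 20 records each.
flowfile = list(range(1000))
batches = append_batches(flowfile, batch_size=20)
print(len(batches))        # 50 appends
print(len(batches[0]))     # 20 records per append
```

So the configured record count controls append granularity (how soon rows become queryable), not how many streams get opened per flowfile.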
Regarding Concurrent Tasks, the processor is annotated as not thread-safe, so
you cannot change the number of concurrent tasks. Multiplexing is possible with
the SDK but would also require code changes in the processor to properly
leverage it. This is worth a follow-up Jira in case someone can get to it.
> PutBigQuery Creates too many active streams
> -------------------------------------------
>
> Key: NIFI-14453
> URL: https://issues.apache.org/jira/browse/NIFI-14453
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 2.3.0
> Reporter: Satya Vadapalli
> Assignee: Pierre Villard
> Priority: Major
> Attachments: image-2025-04-09-11-38-05-266.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *The Issue:*
> We recently migrated from NiFi 1.x to 2.x and have been having an issue with
> the PutBigQuery processor. It used to work well on the older version
> (PutBigQueryStreaming), because I believe it used the BigQuery REST API,
> whereas the new one uses the Storage API. I'm running into an issue where the
> processor is opening over 10k streams instead of reusing the existing
> stream. Here's the error message:
>
> {code:java}
> PutBigQuery[id=01bc2d0f-0196-1000-0000-0000541e88b5] Processing halted:
> yielding [1 sec]: com.google.api.gax.rpc.FailedPreconditionException:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxx.xxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx - Caused by:
> io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Table has too many
> active streams with too little traffic. Please send more traffic through
> existing streams or finalize unused streams.
> Table=1087270590813:xxxxxxx.xxxxxxxxxx. ActiveStreamCount=10139.
> ActualPerStreamBytesPerSec=0.0333333.
> RequiredPerStreamBytesPerSecForMoreStreams=300000. If you have already
> terminated all the traffic, the error will go away in two hours. To avoid
> this problem in the long term, please use less streams for the same amount of
> data Entity: projects/xxx/datasets/xxx/tables/xxx
> {code}
> *What is Expected:*
> PutBigQuery should be able to stream data into BigQuery with a single
> stream, using the Storage Write API as described in the documentation -
> [https://cloud.google.com/bigquery/docs/write-api-streaming]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)