[jira] [Updated] (NIFI-11402) PutBigQuery processor case sensitive and Append Record Count issues

Julien G. (Jira) Fri, 07 Apr 2023 07:40:09 -0700


     [ https://issues.apache.org/jira/browse/NIFI-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Julien G. updated NIFI-11402:
-----------------------------
    Description: 
The {{PutBigQuery}} processor seems to have some issues. I found two issues 
that can be quite blocking.

For the first one, if you set a high value for the {{Append Record Count}} 
property (500 000 in my case) and you have a big FlowFile (in both number of 
records and size; in my case 54 000 records for a size of 74 MB), you will get 
an error because the message to send is too big. That is expected.
{code:java}
PutBigQuery[id=16da3694-c886-3b31-929e-0dc81be51bf7] Stream processing failed: 
java.lang.RuntimeException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: 
MessageSize is too large. Max allow: 10000000 Actual: 13593340
- Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: MessageSize is 
too large. Max allow: 10000000 Actual: 13593340
{code}
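As a rough worked example, assume (this is not stated in the log) that the failing append carried all 54 000 records of the FlowFile, which is plausible because the {{Append Record Count}} of 500 000 exceeds the record count. The numbers above then give about 252 bytes per serialized record, so roughly 39 682 records per append would fit under the 10 000 000-byte limit. The helper below is hypothetical and only reproduces that arithmetic; records vary in size, so a safe setting should be noticeably lower.

```java
// Hypothetical helper: estimates how many records fit in one append, using the
// numbers from the error log. Assumes the failing message held all 54 000
// records of the FlowFile; real records vary in size, so treat it as an upper bound.
class AppendSizeEstimate {

    static final long MAX_MESSAGE_BYTES = 10_000_000L;    // "Max allow" in the log
    static final long ACTUAL_MESSAGE_BYTES = 13_593_340L; // "Actual" in the log
    static final long RECORDS_IN_MESSAGE = 54_000L;       // assumption, see above

    static long maxRecordsPerAppend(long maxBytes, long actualBytes, long records) {
        // Average serialized size per record, rounded up to err on the safe side.
        long avgBytesPerRecord = (actualBytes + records - 1) / records;
        return maxBytes / avgBytesPerRecord;
    }

    public static void main(String[] args) {
        long estimate = maxRecordsPerAppend(MAX_MESSAGE_BYTES, ACTUAL_MESSAGE_BYTES, RECORDS_IN_MESSAGE);
        System.out.println(estimate); // 39682
    }
}
```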

So you replace the value with a smaller one, but the error message remains the 
same. Even if you reduce your FlowFile to a single record, you still get the 
error. The only way to fix this is to delete the processor, re-add it, and 
reduce the value of the property before running it. This looks like a bug.
It would also help to document the message size limit in the processor 
documentation. With the previous {{PutBigQueryStreaming}} and 
{{PutBigQueryBatch}} processors, the limit was straightforward because it was 
tied to the size of the file sent. Now the limit applies to the {{Message}}, 
which does not directly correspond to the size of the FlowFile or the number 
of records in it.
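Since the limit applies to the serialized message rather than to a record count, one possible approach (a hypothetical sketch, not how {{PutBigQuery}} is implemented) is to cut append batches by accumulated byte size, with the record count acting only as an upper cap. Records are modeled as already-serialized {{byte[]}} values and per-request overhead is ignored; both are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT PutBigQuery's implementation: group serialized records
// into append batches so each batch stays under a byte budget, with the record
// count only as a secondary cap. Request overhead is ignored, so a real budget
// should leave headroom below the 10 000 000-byte limit.
class SizeBoundedBatcher {

    static List<List<byte[]>> batchBySize(List<byte[]> records, long maxBatchBytes, int maxRecords) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] record : records) {
            if (record.length > maxBatchBytes) {
                // A single oversized record can never be sent; fail loudly instead of dropping it.
                throw new IllegalArgumentException("record of " + record.length + " bytes exceeds the budget");
            }
            boolean tooBig = currentBytes + record.length > maxBatchBytes;
            boolean tooMany = current.size() >= maxRecords;
            if (!current.isEmpty() && (tooBig || tooMany)) {
                batches.add(current); // close the current batch before it overflows
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add(new byte[300]); // ten 300-byte records
        }
        // 1 000-byte budget; record cap far above the record count (like 500 000 vs 54 000)
        for (List<byte[]> batch : batchBySize(records, 1_000, 500_000)) {
            System.out.println(batch.size());
        }
    }
}
```

With ten 300-byte records and a 1 000-byte budget, the sketch prints batch sizes 3, 3, 3 and 1, even though the record cap of 500 000 alone would have allowed a single batch.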

The second issue occurs if you use upper case in your field names. For 
example, you have a table with the following schema:
{code:java}
timestamp | TIMESTAMP | REQUIRED
original_payload | STRING | NULLABLE
error_message | STRING | REQUIRED
error_type | STRING | REQUIRED
error_subType | STRING | REQUIRED
{code}
and try to put the following event in it:
{code:java}
{
  "original_payload" : "XXXXXXXX",
  "error_message" : "XXXXXX",
  "error_type" : "XXXXXXXXXX",
  "error_subType" : "XXXXXXXXXXX",
  "timestamp" : "2023-04-07T10:31:45Z"
}
{code}
(in my case this event was in Avro)

You will get the following error, telling you that the required field 
{{error_subtype}} is missing:
{code:java}
Cannot convert record to message: 
com.google.protobuf.UninitializedMessageException: Message missing required 
fields: error_subtype
{code}
So to work around it, you need to change your Avro schema to use 
{{error_subtype}} instead of {{error_subType}}.
BigQuery column names are not case sensitive, so a field with upper case should 
be accepted, but it is not. With the previous {{PutBigQueryStreaming}} and 
{{PutBigQueryBatch}} processors, we were able to use upper case in the schema 
fields, so this should still work.
{color:#DE350B}If you get this error, the FlowFile does not go to the failure 
queue; it simply disappears.{color}
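The error reports the missing field as {{error_subtype}} while the record carries {{error_subType}}, which suggests (an assumption, not confirmed in this report) that the generated message uses lower-cased field names and record fields are matched to them case-sensitively. A hypothetical sketch of case-insensitive matching, consistent with BigQuery treating column names as case-insensitive, could look like this; the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT PutBigQuery's code: resolve record field names against
// target (table/message) field names case-insensitively.
class CaseInsensitiveFieldMapper {

    // Returns record field name -> target field name; record fields without a match are omitted.
    static Map<String, String> mapFields(List<String> recordFields, List<String> targetFields) {
        Map<String, String> targetsByKey = new LinkedHashMap<>();
        for (String target : targetFields) {
            // Locale.ROOT avoids locale-specific lower-casing surprises (e.g. Turkish dotless i).
            targetsByKey.put(target.toLowerCase(Locale.ROOT), target);
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String field : recordFields) {
            String target = targetsByKey.get(field.toLowerCase(Locale.ROOT));
            if (target != null) {
                mapping.put(field, target);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Target names as reported in the error message (lower case).
        List<String> target = List.of("timestamp", "original_payload", "error_message", "error_type", "error_subtype");
        // Record field names as sent in the Avro event.
        List<String> record = List.of("original_payload", "error_message", "error_type", "error_subType", "timestamp");
        System.out.println(mapFields(record, target).get("error_subType")); // error_subtype
    }
}
```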

Link to the Slack thread: 
https://apachenifi.slack.com/archives/C0L9VCD47/p1680866688318739

  was:
The {{PutBigQuery}} processor seems to have some issues. I found two issues 
that can be quite blocking.

For the first one, if you set a high value for the {{Append Record Count}} 
property (500 000 in my case) and you have a big FlowFile (in both number of 
records and size; in my case 54 000 records for a size of 74 MB), you will get 
an error because the message to send is too big. That is expected.
{code:java}
PutBigQuery[id=16da3694-c886-3b31-929e-0dc81be51bf7] Stream processing failed: 
java.lang.RuntimeException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: 
MessageSize is too large. Max allow: 10000000 Actual: 13593340
- Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: MessageSize is 
too large. Max allow: 10000000 Actual: 13593340
{code}

So you replace the value with a smaller one, but the error message remains the 
same. Even if you reduce your FlowFile to a single record, you still get the 
error. The only way to fix this is to delete the processor, re-add it, and 
reduce the value of the property before running it. This looks like a bug.
It would also help to document the message size limit in the processor 
documentation. With the previous {{PutBigQueryStreaming}} and 
{{PutBigQueryBatch}} processors, the limit was straightforward because it was 
tied to the size of the file sent. Now the limit applies to the {{Message}}, 
which does not directly correspond to the size of the FlowFile or the number 
of records in it.

The second issue occurs if you use upper case in your field names. For 
example, you have a table with the following schema:
{code:java}
timestamp | TIMESTAMP | REQUIRED
original_payload | STRING | NULLABLE
error_message | STRING | REQUIRED
error_type | STRING | REQUIRED
error_subType | STRING | REQUIRED
{code}
and try to put the following event in it:
{code:java}
{
  "original_payload" : "XXXXXXXX",
  "error_message" : "XXXXXX",
  "error_type" : "XXXXXXXXXX",
  "error_subType" : "XXXXXXXXXXX",
  "timestamp" : "2023-04-07T10:31:45Z"
}
{code}
(in my case this event was in Avro)

You will get the following error, telling you that the required field 
{{error_subtype}} is missing:
{code:java}
Cannot convert record to message: 
com.google.protobuf.UninitializedMessageException: Message missing required 
fields: error_subtype
{code}
So to work around it, you need to change your Avro schema to use 
{{error_subtype}} instead of {{error_subType}}.
BigQuery column names are not case sensitive, so a field with upper case should 
be accepted, but it is not. With the previous {{PutBigQueryStreaming}} and 
{{PutBigQueryBatch}} processors, we were able to use upper case in the schema 
fields, so this should still work.

Link to the Slack thread: 
https://apachenifi.slack.com/archives/C0L9VCD47/p1680866688318739


> PutBigQuery processor case sensitive and Append Record Count issues
> -------------------------------------------------------------------
>
>                 Key: NIFI-11402
>                 URL: https://issues.apache.org/jira/browse/NIFI-11402
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.18.0, 1.20.0
>            Reporter: Julien G.
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
