[
https://issues.apache.org/jira/browse/NIFI-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036916#comment-16036916
]
ASF GitHub Bot commented on NIFI-4008:
--------------------------------------
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/1891
As @joewitt suggested on the JIRA ticket, even if a message contains multiple
records, the message should probably be treated as one. To do so, users
would define a schema that has a compound array object as the top-level object.
I agree with that.
However, if a user is going to split it downstream, they need another
schema to treat each item within the top-level array as an individual record. It
may be better if ConsumeKafkaRecord can do that as it consumes messages.
Ideally, I would prefer ConsumeKafkaRecord to support both ways: parsing a message
as a single unit, or splitting out each record in a message.
I intended to support both modes in this PR. Please let me know if there is a
strong reason not to do this.
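
To make the two modes concrete, here is a rough sketch in Python rather than the actual processor code. The function name `consume_message` and the `split_records` flag are hypothetical and only illustrate the idea; the PR itself may expose this differently.

```python
import json


def consume_message(payload, split_records):
    """Return the outputs produced from one Kafka message payload.

    Hypothetical helper for illustration only, not NiFi code.
    """
    parsed = json.loads(payload)
    if not split_records:
        # Mode 1: treat the whole message as one unit, even if it is an array.
        return [parsed]
    # Mode 2: emit each element of a top-level array as its own record.
    # A bare object counts as a message holding exactly one record,
    # and an empty array simply yields no records instead of failing.
    return parsed if isinstance(parsed, list) else [parsed]
```

With the three-element JSON array from the ticket, the first mode yields a single output holding the whole array, while the second yields three outputs, one per record.
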
By the way, I did try to set up a schema in AvroSchemaRegistry as follows:
```
{
"type": "array",
"items": {
"type": "record", "name": "temp",
"fields": [
{"name": "value", "type": "string"}
]
}
}
```
While this is a valid Avro schema, the current NiFi AvroSchemaRegistry
doesn't allow it because it requires the top-level object to be a record. I got a
"Not a record" validation error when [AvroTypeUtil tried to call
getFields](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L218).
I think this should be addressed by another JIRA.
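
One possible direction for that follow-up, sketched in Python over the raw schema JSON: unwrap a top-level array to reach its element record before asking for fields. This is only an assumption about how it could be handled, not how AvroTypeUtil works today, and `record_field_names` is a made-up name.

```python
import json


def record_field_names(schema_text):
    """Return field names of a record schema, unwrapping a top-level array.

    Hypothetical helper for illustration only, not part of NiFi or Avro.
    """
    schema = json.loads(schema_text)
    # If the top-level schema is an array, look at the schema of its items.
    if isinstance(schema, dict) and schema.get("type") == "array":
        schema = schema["items"]
    # Anything that is still not a record is rejected, as it is today.
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("Not a record")
    return [field["name"] for field in schema["fields"]]
```
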
> ConsumeKafkaRecord_0_10 assumes there is always one Record in a message
> -----------------------------------------------------------------------
>
> Key: NIFI-4008
> URL: https://issues.apache.org/jira/browse/NIFI-4008
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.2.0
> Reporter: Koji Kawamura
> Assignee: Koji Kawamura
>
> ConsumeKafkaRecord_0_10 uses ConsumerLease underneath, and it [assumes there
> is one Record available in a consumed
> message|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kafka-bundle/nifi-kafka-0-10-processors/src/main/java/org/apache/nifi/processors/kafka/pubsub/ConsumerLease.java#L434]
> retrieved from a Kafka topic.
> But in fact, a message can contain 0 or more records in it. For example, with
> a record schema shown below:
> {code}
> {
> "type": "record",
> "name": "temp",
> "fields" : [
> {"name": "value", "type": "string"}
> ]
> }
> {code}
> Multiple records can be sent within a single message, e.g. using JSON:
> {code}
> [{"value": "a"}, {"value": "b"}, {"value": "c"}]
> {code}
> But ConsumeKafkaRecord only outputs the first record:
> {code}
> [{"value": "a"}]
> {code}
> Also, if a message doesn't contain any record in it, the processor fails with
> NullPointerException.