[
https://issues.apache.org/jira/browse/NIFI-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093290#comment-18093290
]
Sönke Liebau commented on NIFI-16067:
-------------------------------------
We opened a PR for this: https://github.com/apache/nifi/pull/11387
> PutIcebergRecord maps fields by ordinal position instead of column name
> -----------------------------------------------------------------------
>
> Key: NIFI-16067
> URL: https://issues.apache.org/jira/browse/NIFI-16067
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.9.0, 2.10.0
> Reporter: Sönke Liebau
> Priority: Major
>
> h3. Description
> PutIcebergRecord (added in NIFI-15062) doesn't match columns by name. It
> writes each
> record's values into the Iceberg table columns by position, and the position
> it uses is
> the one from the *incoming* record schema, not the table schema. So the
> mapping is wrong
> whenever the input field order doesn't line up with the table's column order.
> If the record has fewer fields than the table, the missing columns aren't set
> to NULL
> either - the values just shift left into whatever table column happens to sit
> at that
> position. Sometimes that lands on an incompatible type and you get a
> ClassCastException,
> which is the good outcome because at least it fails loudly. When the types
> happen to line
> up, the row is written with values in the wrong columns and nothing complains.
> This gets worse with schema evolution: as soon as you add a column, the
> producer and the
> table no longer agree on the column set, and existing flows start silently
> writing data
> into the wrong place.
> h3. Steps to reproduce
> Create an Iceberg table whose column order differs from the incoming record
> (or add a
> new column so the record has fewer fields than the table), then push records
> through
> PutIcebergRecord into it. Depending on the type alignment you'll either see a
> ClassCastException or rows written into the wrong columns.
> h3. Root cause
> {{DelegatedRecord.get(int position)}} looks the position up against the input
> record
> schema instead of the Iceberg struct the wrapper already holds:
> nifi-iceberg-processors/.../record/DelegatedRecord.java, lines 78-82:
> {code:java}
> @Override
> public Object get(final int position) {
> final RecordField recordField = record.getSchema().getField(position); //
> input schema
> return record.getValue(recordField);
> }
> {code}
> The Iceberg write path walks the table struct position by position and calls
> `get( i )`
> for column i, but '{{{}get( i )'{}}} hands back input field i, so column i
> ends up with input
> column i's value no matter what it's named. The struct is only exposed through
> {{struct()}} and never used for the actual lookup. `{{{}get(int, Class)`{}}}
> just delegates to `{{{}get(int)`{}}}, so it's wrong too.
> h3. Fix
> Look the field up by name instead of id, something along the lines below:
> {code:java}
> @Override
> public Object get(final int position) {
> final Types.NestedField field = struct.fields().get(position);
> return record.getValue(field.name());
> }
> {code}
> We are happy to open a PR for this if the issue is confirmed.
> h3. Related
> NIFI-14162 and NIFI-14614 are the same ordinal-vs-name problem in QueryRecord
> (both
> fixed for 2.7.0). This is the Iceberg-bundle version of it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)