[
https://issues.apache.org/jira/browse/NIFI-12130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Bathori reassigned NIFI-12130:
-----------------------------------
Assignee: Mark Bathori
> PutIceberg: Ability to configure snapshot properties via dynamic attributes
> ---------------------------------------------------------------------------
>
> Key: NIFI-12130
> URL: https://issues.apache.org/jira/browse/NIFI-12130
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: William Dyson
> Assignee: Mark Bathori
> Priority: Minor
> Labels: iceberg
>
> *Motivation*
> Iceberg's Spark integration lets users attach snapshot properties when
> writing to an Iceberg table by passing write options prefixed with
> "snapshot-property.", like so:
> {{df.write}}
> {{ .option("write-format", "avro")}}
> {{ .option("snapshot-property.key", "value")}}
> {{ .insertInto("catalog.db.table") }}
> [https://iceberg.apache.org/docs/latest/spark-configuration/#write-options]
> These properties can be used to add context to Iceberg snapshots and help
> users locate snapshots in recovery scenarios.
> In fact, Spark automatically adds the application ID as {_}spark.app.id{_}.
> Examples of when these properties might be useful include:
> * Recording the data source used to produce the new records
> * Recording the UUID of the FlowFile used to update the table, so that the
> snapshot can be matched to NiFi provenance events
> These properties can then be queried from the table's snapshots metadata
> table (an Iceberg feature).
> *Feature request*
> It would be great if we could configure PutIceberg to add these properties in
> a similar fashion (e.g. using dynamic properties of the form
> snapshot-property.*). Continuing with the comparison to Spark, it may also be
> worth automatically adding the flowfile UUID as something like
> {_}nifi.flowfile.id{_}.
> *Further details*
> I'm not entirely clued up on the Iceberg API, but it looks like these are set
> on SnapshotUpdate (an interface that AppendFiles extends):
> [https://iceberg.apache.org/javadoc/master/org/apache/iceberg/SnapshotUpdate.html]
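The request above could be sketched roughly as follows. This is a minimal illustration, not the PutIceberg implementation: the class name, the method names, and the plain map standing in for the processor's dynamic properties are all hypothetical. The setter is modelled as a BiConsumer so the example runs without Iceberg on the classpath. In a real processor it would presumably be a call to SnapshotUpdate.set(String, String) on the AppendFiles instance before commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper for illustration; not part of NiFi or Iceberg.
class SnapshotPropertySketch {

    // Prefix borrowed from the Spark write option quoted in the issue.
    static final String PREFIX = "snapshot-property.";

    // Returns the entries whose names start with PREFIX, with the prefix removed.
    // A name that consists of the prefix alone is skipped.
    static Map<String, String> extractSnapshotProperties(Map<String, String> dynamicProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dynamicProperties.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                result.put(name.substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    // Passes each property to the setter. With Iceberg available, the setter
    // could be a method reference such as appendFiles::set (an assumption here).
    static void applyTo(BiConsumer<String, String> setter, Map<String, String> snapshotProperties) {
        snapshotProperties.forEach(setter);
    }

    public static void main(String[] args) {
        // Stand-in for the processor's dynamic properties.
        Map<String, String> dynamic = new LinkedHashMap<>();
        dynamic.put("snapshot-property.source", "orders-feed");
        dynamic.put("unrelated-property", "ignored");

        Map<String, String> snapshotProperties = extractSnapshotProperties(dynamic);
        // Key name suggested in the issue; the UUID value is a placeholder.
        snapshotProperties.put("nifi.flowfile.id", "00000000-0000-0000-0000-000000000000");

        // A plain map stands in for the snapshot summary.
        Map<String, String> summary = new LinkedHashMap<>();
        applyTo(summary::put, snapshotProperties);
        System.out.println(summary);
    }
}
```

Stripping the prefix before setting the property mirrors how the Spark option is described in the issue, where "snapshot-property.key" yields a snapshot property named "key".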
--
This message was sent by Atlassian Jira
(v8.20.10#820010)