[
https://issues.apache.org/jira/browse/NIFI-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032400#comment-18032400
]
ASF subversion and git services commented on NIFI-15062:
--------------------------------------------------------
Commit 3a96a26591e760f8ce6947ed7999ea3f6a5f7245 in nifi's branch
refs/heads/main from David Handermann
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3a96a26591 ]
NIFI-15062 Added PutIcebergRecord Processor and Services
- Added RESTIcebergCatalog implementation of IcebergCatalog
- Added S3FileIOProvider implementation of IcebergFileIOProvider
- Added ParquetIcebergWriter implementation of IcebergWriter
Signed-off-by: Pierre Villard <[email protected]>
This closes #10400.
> Add PutIcebergRecord Processor and Controller Services
> ------------------------------------------------------
>
> Key: NIFI-15062
> URL: https://issues.apache.org/jira/browse/NIFI-15062
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: David Handermann
> Assignee: David Handermann
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> A new PutIcebergRecord Processor should be added that supports a core set of
> Apache Iceberg features and aligns with the extensibility of the Apache
> Iceberg ecosystem.
> Earlier support for writing Iceberg Records had tight coupling to Apache
> Hadoop and Apache Hive Catalogs, along with associated dependencies. The
> PutIcebergRecord Processor should build on the foundation of the iceberg-api
> library without direct links to a particular Catalog, FileIO, or
> serialization format.
> The Java libraries for Apache Iceberg support a decoupled and extensible
> structure, with Catalog, FileIO, and Record abstractions in the iceberg-api.
> Various implementations of these interfaces require specific libraries, which
> should be packaged in separate NAR bundles for isolated class loading.
> Supporting all possible Apache Iceberg integrations should not be the goal of
> the Apache NiFi project, but Controller Service interfaces and bundling
> should allow third-party implementations for various use cases.
> Base modules supporting Apache Iceberg should include the following:
> * nifi-iceberg-shared-nar
> * nifi-iceberg-service-api
> * nifi-iceberg-service-api-nar
> * nifi-iceberg-processors
> * nifi-iceberg-processors-nar
> The shared-nar bundle should include the iceberg-api and iceberg-core
> libraries, along with associated dependencies. This provides general
> compatibility and avoids unnecessary duplication of common libraries. The
> service-api-nar should depend on the shared-nar for proper NAR hierarchical
> loading.
> The processors-nar should depend on the service-api-nar, avoiding any
> dependency on particular Iceberg implementation classes, which will be
> provided through Controller Service implementations.
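The NAR dependency chain described above can be pictured with a small model. This is only a sketch: NarBundle and NarHierarchy are hypothetical names that do not exist in NiFi, the service interfaces are listed by simple name because their packages are not stated here, and the real NAR class loaders are more involved than a set lookup. The point it illustrates is that the processors bundle can see iceberg-api types through its ancestors, while implementation classes packaged in a separate NAR stay out of its view.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a NAR bundle: a name, the classes it packages,
// and the parent NAR it depends on (null when there is none).
final class NarBundle {
    final String name;
    final NarBundle parent;
    final Set<String> packaged;

    NarBundle(String name, NarBundle parent, String... packaged) {
        this.name = name;
        this.parent = parent;
        this.packaged = new HashSet<>(Arrays.asList(packaged));
    }

    // A class is visible when this bundle or any ancestor packages it.
    boolean canSee(String className) {
        for (NarBundle nar = this; nar != null; nar = nar.parent) {
            if (nar.packaged.contains(className)) {
                return true;
            }
        }
        return false;
    }
}

final class NarHierarchy {
    // Builds processors-nar -> service-api-nar -> shared-nar as described in the issue.
    static NarBundle processorsNar() {
        NarBundle shared = new NarBundle("nifi-iceberg-shared-nar", null,
                "org.apache.iceberg.catalog.Catalog",
                "org.apache.iceberg.io.FileIO");
        NarBundle serviceApi = new NarBundle("nifi-iceberg-service-api-nar", shared,
                "IcebergCatalog", "IcebergWriter", "IcebergFileIOProvider");
        return new NarBundle("nifi-iceberg-processors-nar", serviceApi,
                "PutIcebergRecord");
    }
}
```

In this model, a class such as org.apache.iceberg.aws.s3.S3FileIO would be packaged only in the NAR of an S3 FileIO service implementation, so the processors bundle never links against it directly.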
> For initial support, Controller Service interfaces should provide
> abstractions for the following:
> * Iceberg Catalog
> * Iceberg Writer
> * Iceberg FileIO Provider
> Initial Controller Service implementations should include the following:
> * REST Iceberg Catalog
> * Parquet Iceberg Writer
> * S3 Iceberg FileIO Provider
> Supporting REST Iceberg Catalogs should cover a variety of use cases.
> Supporting Apache Parquet as the initial File Format addresses the most
> common use case, and support for other formats could be considered separately.
> S3 is a common FileIO storage solution, with many examples available. Other
> FileIO implementations should be considered to cover major service providers,
> but focusing on S3 narrows the scope of the initial implementation.
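The separation between the Processor and the three Controller Service abstractions might look roughly like the following. This is a sketch under stated assumptions, not the committed code: the interface names come from the commit message, but every method signature, the Stub* classes, PutSketch, and the example bucket path are hypothetical and exist only to show that the processor side depends on interfaces alone.

```java
// Hypothetical single-method versions of the service interfaces.
// The real interfaces in nifi-iceberg-service-api may look quite different.
interface IcebergCatalog {
    String tableLocation(String tableName);
}

interface IcebergFileIOProvider {
    boolean supports(String location);
}

interface IcebergWriter {
    String fileExtension();
}

// Stand-in for a REST catalog: resolves a table name to a storage location.
final class StubRestCatalog implements IcebergCatalog {
    public String tableLocation(String tableName) {
        return "s3://example-bucket/warehouse/" + tableName; // assumed layout
    }
}

// Stand-in for an S3 FileIO provider: only accepts s3:// locations.
final class StubS3FileIOProvider implements IcebergFileIOProvider {
    public boolean supports(String location) {
        return location.startsWith("s3://");
    }
}

// Stand-in for a Parquet writer: reports the data file extension.
final class StubParquetWriter implements IcebergWriter {
    public String fileExtension() {
        return "parquet";
    }
}

// Processor-side logic that references only the interfaces above.
final class PutSketch {
    private final IcebergCatalog catalog;
    private final IcebergFileIOProvider fileIO;
    private final IcebergWriter writer;

    PutSketch(IcebergCatalog catalog, IcebergFileIOProvider fileIO, IcebergWriter writer) {
        this.catalog = catalog;
        this.fileIO = fileIO;
        this.writer = writer;
    }

    // Wires the three stubs that mirror the initial implementations in the issue.
    static PutSketch withInitialImplementations() {
        return new PutSketch(new StubRestCatalog(), new StubS3FileIOProvider(), new StubParquetWriter());
    }

    // Decides where a data file would be written for the given table.
    String planDataFile(String tableName, String fileName) {
        String location = catalog.tableLocation(tableName);
        if (!fileIO.supports(location)) {
            throw new IllegalStateException("FileIO does not support " + location);
        }
        return location + "/data/" + fileName + "." + writer.fileExtension();
    }
}
```

Swapping any stub for another implementation of the same interface leaves PutSketch unchanged, which is the kind of extensibility the issue asks for from third-party Controller Services.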