[ https://issues.apache.org/jira/browse/NIFI-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032400#comment-18032400 ]

ASF subversion and git services commented on NIFI-15062:
--------------------------------------------------------

Commit 3a96a26591e760f8ce6947ed7999ea3f6a5f7245 in nifi's branch 
refs/heads/main from David Handermann
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3a96a26591 ]

NIFI-15062 Added PutIcebergRecord Processor and Services

- Added RESTIcebergCatalog implementation of IcebergCatalog
- Added S3FileIOProvider implementation of IcebergFileIOProvider
- Added ParquetIcebergWriter implementation of IcebergWriter

Signed-off-by: Pierre Villard

This closes #10400.


> Add PutIcebergRecord Processor and Controller Services
> ------------------------------------------------------
>
>                 Key: NIFI-15062
>                 URL: https://issues.apache.org/jira/browse/NIFI-15062
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: David Handermann
>            Assignee: David Handermann
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A new PutIcebergRecord Processor should be added that supports a core set of
> Apache Iceberg features and aligns with the extensibility of the Apache
> Iceberg ecosystem.
>
> Earlier support for writing Iceberg Records was tightly coupled to Apache
> Hadoop and Apache Hive Catalogs, along with their associated dependencies.
> The PutIcebergRecord Processor should build on the foundation of the
> iceberg-api library without directly linking to particular Catalog, FileIO,
> or serialization format implementations.
>
> The Java libraries for Apache Iceberg support a decoupled and extensible
> structure, with Catalog, FileIO, and Record abstractions in the iceberg-api
> library. Various implementations of these interfaces require specific
> libraries, which should be packaged in separate NAR bundles for isolated
> class loading.
>
> Supporting all possible Apache Iceberg integrations should not be the goal of
> the Apache NiFi project, but the Controller Service interfaces and bundling
> should allow third-party implementations for various use cases.
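
The extensibility goal can be illustrated with a small Python sketch (not NiFi code). It assumes a hypothetical `ServiceRegistry` standing in for NiFi's Controller Service discovery, and a stand-in `IcebergCatalog` interface with a hypothetical `catalog_type()` method, since the issue does not define the interface methods. Processor-side code depends only on the interface, so a third-party implementation can be registered and selected without changing it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class IcebergCatalog(ABC):
    """Stand-in for the Catalog Controller Service interface (method is hypothetical)."""

    @abstractmethod
    def catalog_type(self) -> str: ...


class ServiceRegistry:
    """Hypothetical registry mapping an interface to named implementations."""

    def __init__(self) -> None:
        self._impls: Dict[type, Dict[str, type]] = {}

    def register(self, interface: type, name: str, impl: type) -> None:
        # Reject classes that do not implement the requested interface.
        if not issubclass(impl, interface):
            raise TypeError(f"{impl.__name__} does not implement {interface.__name__}")
        self._impls.setdefault(interface, {})[name] = impl

    def names(self, interface: type) -> List[str]:
        return sorted(self._impls.get(interface, {}))

    def create(self, interface: type, name: str):
        # Instantiate the implementation selected by configuration.
        return self._impls[interface][name]()


class RestCatalogStub(IcebergCatalog):
    """Stand-in for an implementation shipped in its own bundle."""

    def catalog_type(self) -> str:
        return "rest"


class ThirdPartyCatalogStub(IcebergCatalog):
    """Stand-in for an implementation contributed by a third party."""

    def catalog_type(self) -> str:
        return "custom"


def describe_catalog(catalog: IcebergCatalog) -> str:
    # Processor-side logic: depends only on the interface, never on a concrete class.
    return f"using {catalog.catalog_type()} catalog"


registry = ServiceRegistry()
registry.register(IcebergCatalog, "RESTIcebergCatalog", RestCatalogStub)
registry.register(IcebergCatalog, "ThirdPartyCatalog", ThirdPartyCatalogStub)

print(registry.names(IcebergCatalog))  # ['RESTIcebergCatalog', 'ThirdPartyCatalog']
print(describe_catalog(registry.create(IcebergCatalog, "ThirdPartyCatalog")))  # using custom catalog
```

`describe_catalog` never names a concrete class, so adding `ThirdPartyCatalogStub` required no change to it.
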
> Base modules supporting Apache Iceberg should include the following:
>  * nifi-iceberg-shared-nar
>  * nifi-iceberg-service-api
>  * nifi-iceberg-service-api-nar
>  * nifi-iceberg-processors
>  * nifi-iceberg-processors-nar
>
> The shared-nar bundle should include the iceberg-api and iceberg-core
> libraries, along with associated dependencies. This provides general
> compatibility and avoids unnecessary duplication of common libraries. The
> service-api-nar should depend on the shared-nar for proper hierarchical NAR
> class loading.
>
> The processors-nar should depend on the service-api-nar, avoiding any
> dependency on particular Iceberg implementation classes, which will be
> provided through Controller Service implementations.
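
The bundle hierarchy described above can be modeled as a chain of parent bundles. The following Python sketch is a simplified model, not NiFi's NAR class loader: it assumes parent-first lookup and tracks library names rather than classes. `example-s3-fileio-nar` is a hypothetical implementation bundle, assumed here to contain the `iceberg-aws` library and to depend on the service-api-nar.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class NarBundle:
    """Simplified NAR model: bundled library names plus an optional parent bundle."""

    name: str
    libraries: Set[str] = field(default_factory=set)
    parent: Optional[NarBundle] = None

    def find(self, library: str) -> Optional[str]:
        """Return the name of the bundle that supplies `library`, or None if not visible."""
        # Assumed parent-first delegation: ancestors are consulted before this bundle.
        if self.parent is not None:
            supplier = self.parent.find(library)
            if supplier is not None:
                return supplier
        return self.name if library in self.libraries else None

    def chain(self) -> List[str]:
        """Names of this bundle and its ancestors, nearest first."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


shared = NarBundle("nifi-iceberg-shared-nar", {"iceberg-api", "iceberg-core"})
service_api = NarBundle("nifi-iceberg-service-api-nar", {"nifi-iceberg-service-api"}, shared)
processors = NarBundle("nifi-iceberg-processors-nar", {"nifi-iceberg-processors"}, service_api)
# Hypothetical implementation bundle; its name, contents, and parent are assumptions.
s3_impl = NarBundle("example-s3-fileio-nar", {"iceberg-aws"}, service_api)

for lib in ("iceberg-api", "nifi-iceberg-service-api", "iceberg-aws"):
    print(f"{lib} -> {processors.find(lib)}")
```

In this model the processors-nar resolves iceberg-api through its ancestors but cannot see a library packaged only in an implementation bundle, which matches the intent that implementation classes are reached only through Controller Service interfaces.
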
> For initial support, Controller Service interfaces should provide 
> abstractions for the following:
>  * Iceberg Catalog
>  * Iceberg Writer
>  * Iceberg FileIO Provider
> Initial Controller Service implementations should include the following:
>  * REST Iceberg Catalog
>  * Parquet Iceberg Writer
>  * S3 Iceberg FileIO Provider
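
A minimal Python sketch of how a processor might compose the three abstractions follows. All method names (`load_table`, `write_file`, `serialize`, `extension`), the `TableHandle` type, and the `put_records` function are hypothetical; the issue names the interfaces but not their methods. In-memory fakes stand in for the REST catalog and S3 FileIO, and a JSON Lines writer stands in for the Parquet writer to keep the example dependency-free.

```python
import json
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List

# All method names below are hypothetical; the issue names the three
# Controller Service abstractions but does not define their methods.


@dataclass
class TableHandle:
    """Hypothetical minimal view of a table: its location and committed data files."""

    location: str
    data_files: List[str] = field(default_factory=list)


class IcebergCatalog(ABC):
    @abstractmethod
    def load_table(self, namespace: str, name: str) -> TableHandle: ...


class IcebergFileIOProvider(ABC):
    @abstractmethod
    def write_file(self, path: str, content: bytes) -> None: ...


class IcebergWriter(ABC):
    @abstractmethod
    def extension(self) -> str: ...

    @abstractmethod
    def serialize(self, records: List[Dict[str, Any]]) -> bytes: ...


class InMemoryCatalog(IcebergCatalog):
    """Stand-in for a REST catalog: tables keyed by 'namespace.name'."""

    def __init__(self, tables: Dict[str, TableHandle]) -> None:
        self.tables = tables

    def load_table(self, namespace: str, name: str) -> TableHandle:
        return self.tables[f"{namespace}.{name}"]  # KeyError if the table is unknown


class InMemoryFileIOProvider(IcebergFileIOProvider):
    """Stand-in for S3 storage: file contents keyed by path."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write_file(self, path: str, content: bytes) -> None:
        self.files[path] = content


class JsonLinesWriter(IcebergWriter):
    """Stand-in for a Parquet writer; JSON Lines keeps the sketch dependency-free."""

    def extension(self) -> str:
        return "jsonl"

    def serialize(self, records: List[Dict[str, Any]]) -> bytes:
        return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def put_records(catalog: IcebergCatalog, writer: IcebergWriter,
                file_io: IcebergFileIOProvider, namespace: str, table: str,
                records: List[Dict[str, Any]]) -> str:
    """Hypothetical processor logic written only against the three abstractions."""
    handle = catalog.load_table(namespace, table)
    path = f"{handle.location}/data/{uuid.uuid4().hex}.{writer.extension()}"
    file_io.write_file(path, writer.serialize(records))
    # A real implementation would commit the data file through the Iceberg
    # table API; this sketch only records the path on the handle.
    handle.data_files.append(path)
    return path


catalog = InMemoryCatalog({"db.events": TableHandle("memory://warehouse/db/events")})
file_io = InMemoryFileIOProvider()
written = put_records(catalog, JsonLinesWriter(), file_io, "db", "events",
                      [{"id": 1}, {"id": 2}])
print(written)
```

Swapping `JsonLinesWriter` for another `IcebergWriter` implementation requires no change to `put_records`, which is the property the separate Controller Service interfaces are meant to provide.
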
>
> Supporting REST Iceberg Catalogs should cover a variety of use cases.
>
> Supporting Apache Parquet as the initial file format addresses the most
> common use case, and support for other formats could be considered separately.
>
> S3 is a common FileIO storage solution with many examples. Other FileIO
> implementations should be considered to cover major service providers, but
> focusing on S3 first narrows the scope of the initial implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
