exceptionfactory opened a new pull request, #10400:
URL: https://github.com/apache/nifi/pull/10400

   # Summary
   
   [NIFI-15062](https://issues.apache.org/jira/browse/NIFI-15062) Adds a 
`PutIcebergRecord` Processor and several Controller Services that provide 
initial integration for storing records in Apache Iceberg tables.
   
   The Apache Iceberg ecosystem supports a wide variety of catalogs, storage 
providers, and file formats. The purpose of this pull request is to provide 
several Controller Service abstractions that enable extensible integration, 
with a specific implementation of each Controller Service. With the number of 
potential integration options, Apache NiFi should not necessarily implement 
support for every possible solution, but should provide extension points that 
enable focused types of integration. The `iceberg-api` library is the 
foundation for this approach.
   
   The `nifi-iceberg-bundle` includes multiple modules that have the following 
dependency hierarchy:
   
   - `nifi-iceberg-shared-nar`
     - `nifi-iceberg-services-api-nar`
       - `nifi-iceberg-processors-nar`
       - `nifi-iceberg-rest-catalog-nar`
       - `nifi-iceberg-aws-nar`
       - `nifi-iceberg-parquet-writer-nar`
   
   The `nifi-iceberg-shared-nar` contains the `iceberg-api` and `iceberg-core` 
libraries along with transitive dependencies.
   
   The `nifi-iceberg-services-api-nar` depends on `iceberg-api` and 
incorporates the Apache NiFi Controller Service interfaces that align with 
`iceberg-api` interfaces.
   
   The `nifi-iceberg-processors-nar` contains the `PutIcebergRecord` Processor, 
which references properties for the following Controller Services:
   
   - `IcebergCatalog`
   - `IcebergWriter`
   - `IcebergFileIOProvider`
   
   These three interfaces define the primary extension points for external 
integration.
   
   The `nifi-iceberg-rest-catalog-nar` contains the `RESTIcebergCatalog` 
implementation of the `IcebergCatalog` Controller Service. This implementation 
configures the `RESTSessionCatalog` from the `iceberg-core` library and 
supports Catalog Authentication using OAuth2 with Client Credentials or Bearer 
Tokens. Building on the `iceberg-core` library provided in 
`nifi-iceberg-shared-nar`, the `nifi-iceberg-rest-catalog-nar` does not have 
any additional dependencies. The `RESTIcebergCatalog` defines a `FileIO 
Provider` property that supports configurable Controller Services for Iceberg 
`FileIO` implementations.
   
   The `nifi-iceberg-aws-nar` contains the `S3IcebergFileIOProvider` which 
configures and returns the Iceberg `S3FileIO` class. Support for S3 requires a 
number of AWS SDK 2 libraries, which is one of the primary reasons for separate 
packaging of `FileIOProvider` implementations. The S3 implementation supports 
configurable authentication using Basic or Session Credentials, as well as 
Vended Credentials, where the REST Catalog is expected to provide the required 
credentials.
   
   The `nifi-iceberg-parquet-writer-nar` contains the `ParquetIcebergWriter` 
Controller Service, supporting Apache Parquet serialization. Apache Parquet has 
a number of transitive dependencies, including a dependency on the 
`hadoop-common` library. The NAR packaging excludes many unnecessary transitive 
dependencies and has an explicit list of dependencies required at runtime for 
Parquet serialization.
   
   This implementation structure and Controller Service design strategy should 
serve as the basis for additional storage provider implementations. With Apache 
Parquet being the predominant format for Apache Iceberg, direct support for 
other file formats may not be necessary. The variety of Iceberg REST Catalog 
implementations may require additional configuration options in the future, but 
the core `IcebergCatalog` Controller Service abstraction provides a decoupled 
strategy for future implementation.
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [X] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [X] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-00000`
   - [X] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [X] Pull Request based on current revision of the `main` branch
   - [X] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [X] Build completed using `./mvnw clean install -P contrib-check`
     - [X] JDK 21
     - [ ] JDK 25
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to