Robert Liszli created NIFI-10556:
------------------------------------

             Summary: Create processor to support DeltaLake tables
                 Key: NIFI-10556
                 URL: https://issues.apache.org/jira/browse/NIFI-10556
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Robert Liszli
            Assignee: Robert Liszli


*Plan for the new processor*

The new processor will use the Delta Standalone library to generate a Delta 
table for a set of Parquet data files stored locally or in cloud storage.

*Processor input:*
 * The path of the Parquet files (a single directory), located on the local 
filesystem or in cloud storage (S3, GCP or Azure).
 * The structure of the Parquet files in JSON format.

*Processor parameters:*
 * Dropdown selector for storage type selection.
 * Credentials for the selected storage type.

*On Trigger:*
 * The processor will compare the files in the data directory with the files 
already added to the Delta table. If new data files exist, it will add them to 
the Delta table.
 * If no Delta table exists yet, the processor will create one.
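
The comparison step described under On Trigger can be sketched with plain 
Java collections. This is only an illustration, not the planned 
implementation: the class and method names (NewDataFileFinder, 
listParquetFiles, findNewFiles) are hypothetical, and it assumes that files 
are matched by name and that the set of files already in the Delta table has 
been read elsewhere (for example from the table's current snapshot through 
Delta Standalone).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper: decides which data files still have to be added to the table.
final class NewDataFileFinder {

    private NewDataFileFinder() {
    }

    // Lists the names of the regular *.parquet files directly inside dataDir (no recursion).
    // Assumption: a local directory; cloud storage would need its own listing call.
    static Set<String> listParquetFiles(Path dataDir) throws IOException {
        try (Stream<Path> entries = Files.list(dataDir)) {
            Set<String> names = new TreeSet<>();
            entries.filter(Files::isRegularFile)
                   .map(p -> p.getFileName().toString())
                   .filter(n -> n.endsWith(".parquet"))
                   .forEach(names::add);
            return names;
        }
    }

    // Returns the files that are in the data directory but not yet in the table.
    // An empty filesInTable models the "no Delta table exists yet" case:
    // every file in the directory is then reported as new.
    static Set<String> findNewFiles(Set<String> filesInDirectory, Set<String> filesInTable) {
        Set<String> newFiles = new TreeSet<>(filesInDirectory);
        newFiles.removeAll(filesInTable);
        return newFiles;
    }
}
```

If findNewFiles returns an empty set, the table is already up to date and the 
processor would have nothing to commit for that trigger.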

*Output of the processor:*
 * An up-to-date Delta table in the chosen storage system.

 

Delta Standalone: [https://github.com/delta-io/connectors#delta-standalone]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
