[
https://issues.apache.org/jira/browse/NIFI-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Liszli updated NIFI-10556:
---------------------------------
Attachment: processor_usages.png
> Create processor to support DeltaLake tables
> --------------------------------------------
>
> Key: NIFI-10556
> URL: https://issues.apache.org/jira/browse/NIFI-10556
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Robert Liszli
> Assignee: Robert Liszli
> Priority: Major
> Attachments: processor_usages.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> *Plan for the new processor*
> The new processor will use the Delta Standalone library to generate the Delta
> table for the Parquet data files. The processor can also take another
> processor's output file and upload it to the data store.
> *Processor inputs:*
> * The path of the Parquet files (a single directory), located on the local
> filesystem or in cloud storage (S3, GCP or Azure).
> * Structure of the Parquet file in JSON format.
> * If the processor should handle another processor's output file, the names
> of the attributes holding that file's path and filename must be set.
> * Partition columns, separated by commas.
> *Processor parameters:*
> * Dropdown selector for storage type selection.
> * Credentials for the selected storage type.
> *On Trigger:*
> * If the processor is handling another processor's output file, it first
> copies the new file to the target data directory.
> * The processor compares the files in the data directory with the files
> already added to the Delta table. If a new data file exists, the processor
> adds it to the Delta table.
> * If no Delta table exists, the processor creates one.
> *Output of the processor:*
> * An up-to-date Delta table in the chosen storage system.
>
> Delta Standalone: [https://github.com/delta-io/connectors#delta-standalone]
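
The "On Trigger" steps in the description can be sketched roughly as below. This is only an illustration of the reconciliation logic, not the processor's code: the function names (find_new_data_files, on_trigger) are hypothetical, and a plain Python set stands in for the list of files the Delta table already tracks. A real implementation would presumably read and commit that list through Delta Standalone instead. The sketch also assumes a single, non-partitioned local directory.

```python
from pathlib import Path
import tempfile


def find_new_data_files(data_dir, tracked_files):
    """Return names of Parquet files in data_dir that are not tracked yet."""
    # Non-recursive on purpose: the description speaks of a single directory.
    present = {p.name for p in Path(data_dir).glob("*.parquet")}
    return sorted(present - set(tracked_files))


def on_trigger(data_dir, table):
    """Run one reconciliation pass.

    `table` is a hypothetical stand-in for the Delta table: either None
    (no table exists yet) or a set of already-added file names.
    Returns the updated table and the list of newly added files.
    """
    # If no table exists, create an empty one.
    tracked = set(table) if table is not None else set()
    # Compare the directory contents with what the table already holds.
    new_files = find_new_data_files(data_dir, tracked)
    # Add only the files that are new.
    tracked.update(new_files)
    return tracked, new_files
```

Calling on_trigger repeatedly on the same directory adds each Parquet file once: the first call with table=None creates the table and registers every file found, and later calls return only files that appeared since the previous pass.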