[
https://issues.apache.org/jira/browse/NIFI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987734#comment-15987734
]
ASF GitHub Bot commented on NIFI-3724:
--------------------------------------
GitHub user bbende opened a pull request:
https://github.com/apache/nifi/pull/1712
NIFI-3724 - Add Put/Fetch Parquet Processors
This PR adds a new nifi-parquet-bundle with PutParquet and FetchParquet
processors. These work similarly to PutHDFS and FetchHDFS, but read and
write Records instead of raw flow file content.
While working on this I needed to reuse portions of the record
reader/writer code, so I refactored some of the project structure, which
caused many files to move around.
Summary of changes:
- Created nifi-parquet-bundle
- Created nifi-commons/nifi-record to hold domain/API related to records
- Created nifi-nar-bundles/nifi-extension-utils as a place for utility code
specific to extensions
- Moved nifi-commons/nifi-processor-utils under nifi-extension-utils
- Moved nifi-commons/nifi-hadoop-utils under nifi-extension-utils
- Created nifi-extension-utils/nifi-record-utils for utility code related
to records
To test the Parquet processors you can create a core-site.xml that points at
the local file system, then read/write Parquet to local directories:
```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bbende/nifi parquet-bundle
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/1712.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1712
----
commit a35e5957f5ff8c47df5352b7b1a5ef494fed8633
Author: Bryan Bende <[email protected]>
Date: 2017-04-12T22:25:31Z
NIFI-3724 - Initial commit of Parquet bundle with PutParquet and
FetchParquet
- Creating nifi-records-utils to share utility code from record services
- Refactoring Parquet tests to use MockRecordParser and MockRecordWriter
- Refactoring AbstractPutHDFSRecord to use schema access strategy
- Adding custom validate to AbstractPutHDFSRecord and adding handling of
UNION types when writing Records as Avro
- Refactoring project structure to get CS API references out of
nifi-commons, introducing nifi-extension-utils under nifi-nar-bundles
----
> Add Put/Fetch Parquet Processors
> --------------------------------
>
> Key: NIFI-3724
> URL: https://issues.apache.org/jira/browse/NIFI-3724
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
>
> Now that we have the record reader/writer services in master, it
> would be nice to have readers and writers for Parquet. Since Parquet's API is
> based on the Hadoop Path object, and not InputStreams/OutputStreams, we can't
> really implement direct conversions to and from Parquet in the middle of a
> flow, but we can perform the conversion by taking any record format
> and writing to a Path as Parquet, or reading Parquet from a Path and writing
> it out as another record format.
> We should add a PutParquet that uses a record reader and writes records to a
> Path as Parquet, and a FetchParquet that reads Parquet from a path and writes
> out records to a flow file using a record writer.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)