[
https://issues.apache.org/jira/browse/APEXMALHAR-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206130#comment-15206130
]
ASF GitHub Bot commented on APEXMALHAR-2014:
--------------------------------------------
GitHub user shubham-pathak22 opened a pull request:
https://github.com/apache/incubator-apex-malhar/pull/219
APEXMALHAR-2014 added parquet reader
Adds **ParquetReaderOperator**, which allows Apex users to read
Parquet files.
**Apache Parquet** is a columnar storage format available to any project in
the Hadoop ecosystem, regardless of the choice of data processing framework,
data model or programming language.
For more information: [Apache Parquet](https://parquet.apache.org/documentation/latest/ "Apache Parquet")
#### Implementation Details
* **AbstractParquetFileReader** extends
**AbstractFileInputOperator** and overrides the *openFile()* and
*readEntity()* methods.
* *openFile()* instantiates a *ParquetReader* (the reader provided by the
parquet-mr project that reads Parquet records from a file) with
*GroupReadSupport*, so records are read as *Group* objects.
* *readEntity()* reads the next record and calls *convertGroup()*.
Derived classes override *convertGroup()* to convert a *Group* into
whatever form downstream operators require.
* **ParquetFilePOJOReader** is a concrete implementation of
**AbstractParquetFileReader** that reads Parquet files and emits records
as POJOs. The user should provide the POJO class name and field mapping;
if the mapping is not provided, reflection is used to determine it.
For now only primitive types (INT32, INT64, BOOLEAN, FLOAT, DOUBLE,
BINARY) are supported.
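
The *readEntity()* / *convertGroup()* split and the reflection fallback described above can be illustrated with a minimal, self-contained sketch. This is not the code in the pull request: every class name below (`SketchRecordReader`, `SketchPojoReader`, `SketchEvent`) is hypothetical, a plain `Map<String, Object>` stands in for Parquet's *Group*, and the mapping rule (record key equals POJO field name) is an assumption about what a reflection-derived mapping might look like.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for the abstract reader: subclasses decide
// what a raw record is converted into.
abstract class SketchRecordReader<T> {
  // Plays the role of readEntity(): hands one raw record to convertGroup().
  T readEntity(Map<String, Object> rawRecord) {
    return convertGroup(rawRecord);
  }

  // Plays the role of convertGroup(); a Map replaces Parquet's Group here.
  protected abstract T convertGroup(Map<String, Object> rawRecord);
}

// Hypothetical stand-in for the POJO reader: with no explicit mapping,
// it copies each record value into the POJO field of the same name.
class SketchPojoReader<T> extends SketchRecordReader<T> {
  private final Class<T> pojoClass;

  SketchPojoReader(Class<T> pojoClass) {
    this.pojoClass = pojoClass;
  }

  @Override
  protected T convertGroup(Map<String, Object> rawRecord) {
    try {
      java.lang.reflect.Constructor<T> ctor = pojoClass.getDeclaredConstructor();
      ctor.setAccessible(true);
      T pojo = ctor.newInstance();
      for (Field field : pojoClass.getDeclaredFields()) {
        // Fields without a matching record key keep their default values.
        if (rawRecord.containsKey(field.getName())) {
          field.setAccessible(true);
          // Field.set unboxes wrappers (Integer, Long, ...) for primitive fields.
          field.set(pojo, rawRecord.get(field.getName()));
        }
      }
      return pojo;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot populate " + pojoClass.getName(), e);
    }
  }
}

// Example POJO whose field types loosely mirror INT32, INT64, BOOLEAN,
// DOUBLE and BINARY (as a String).
class SketchEvent {
  int id;
  long timestamp;
  boolean active;
  double score;
  String name;
}
```

Calling `new SketchPojoReader<SketchEvent>(SketchEvent.class).readEntity(record)` with a map keyed by `id`, `timestamp`, `active`, `score` and `name` returns a populated `SketchEvent`. The real operator works with *Group* accessors rather than a map, so treat the conversion logic as illustrative only.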
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shubham-pathak22/incubator-apex-malhar APEXMALHAR-2014
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-apex-malhar/pull/219.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #219
----
commit d5c9415959f583caf68292cfb3cfac262d49eb99
Author: shubham <[email protected]>
Date: 2016-03-21T10:20:21Z
APEXMALHAR-2014 added parquet reader
----
> ParquetReader operator
> ----------------------
>
> Key: APEXMALHAR-2014
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2014
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: shubham pathak
> Assignee: shubham pathak
>
> Developing a ParquetReaderOperator which would allow apex users to read
> records from parquet files.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)