Alex Levenson created PARQUET-65:
------------------------------------
Summary: Create a jackson integration module for pojo support
Key: PARQUET-65
URL: https://issues.apache.org/jira/browse/PARQUET-65
Project: Parquet
Issue Type: New Feature
Components: parquet-mr
Reporter: Alex Levenson
Priority: Minor
There's currently a PR for pojo support:
https://github.com/apache/incubator-parquet-mr/pull/21
And it occurred to me that one way we could do this without re-inventing the
wheel is to use Jackson. Jackson can essentially take a parse tree, whether it
comes from parsing XML, JSON, or anything else (for example, there's a YAML
plugin), and then lets you do one of three things with it. You can visit the
nodes one at a time (they call this streaming), you can map the tree onto the
data structures built into Java (essentially get a Map<String, Object>), or you
can map the tree onto a user-defined class. The last option lets you work with
a well-typed class, and also lets you use Jackson's annotations to control how
the tree -> POJO mapping works (renaming fields and so on).
We could leverage all of that by creating something that goes from Parquet data
to the Jackson parse tree, and then leave the rest of the work to Jackson.
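
To make the idea concrete, here is a rough sketch of the last step only, assuming
Jackson 2.x (jackson-databind) is on the classpath. Nothing here reads Parquet:
the Map<String, Object> record is a hypothetical stand-in for a decoded Parquet
row, and the names ParquetJacksonSketch, User, toTree, toPojo and toMap are made
up for illustration. It only shows that once we have a Jackson tree, both the
Map binding and the annotated-class binding come for free.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.LinkedHashMap;
import java.util.Map;

public class ParquetJacksonSketch {

    // User-defined class. @JsonProperty maps the "user_name" column onto
    // the differently named Java field "name".
    public static class User {
        @JsonProperty("user_name")
        public String name;
        public int age;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // ASSUMPTION: a flat column-name -> value map stands in for a decoded
    // Parquet record. A real module would build the tree from Parquet data.
    public static ObjectNode toTree(Map<String, Object> record) {
        ObjectNode node = MAPPER.createObjectNode();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            // valueToTree turns a plain Java value into a JsonNode.
            JsonNode value = MAPPER.valueToTree(e.getValue());
            node.set(e.getKey(), value);
        }
        return node;
    }

    // Tree -> user-defined class, honoring Jackson annotations.
    public static User toPojo(JsonNode tree) throws JsonProcessingException {
        return MAPPER.treeToValue(tree, User.class);
    }

    // Tree -> built-in Java data structures.
    public static Map<String, Object> toMap(JsonNode tree) {
        return MAPPER.convertValue(tree, new TypeReference<Map<String, Object>>() {});
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("user_name", "alex");
        record.put("age", 30);

        JsonNode tree = toTree(record);
        User user = toPojo(tree);
        System.out.println(user.name + " " + user.age);
        System.out.println(toMap(tree));
    }
}
```

The only Parquet-specific piece a real module would need is the equivalent of
toTree; everything after that is stock Jackson.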
--
This message was sent by Atlassian JIRA
(v6.2#6252)