[ 
https://issues.apache.org/jira/browse/PARQUET-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816053#comment-16816053
 ] 

Victor commented on PARQUET-65:
-------------------------------

Is this still being considered?

It would be great to be able to generate a Parquet schema from a POJO (in the 
same way as 
https://github.com/FasterXML/jackson-dataformats-binary/tree/master/avro#generating-avro-schema-from-pojo-definition)
and then write the POJO to a Parquet file, but without the overhead of going 
through Avro (which means serializing to bytes and then reading them back as 
an Avro generic record before using the AvroParquetWriter, cf. 
https://github.com/FasterXML/jackson-dataformats-binary/issues/9#issuecomment-325685012).
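
To make the request concrete, here is a rough sketch of what deriving a 
Parquet schema directly from a POJO could look like. It uses plain 
reflection only and prints the schema in Parquet's textual message-type 
syntax. Everything in it is an assumption for illustration: the 
PojoSchemaSketch class, the toParquetSchema helper, the User POJO and the 
type mapping are hypothetical, not an existing parquet-mr or Jackson API. 
Nested types, collections and annotations are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not part of parquet-mr or Jackson.
class PojoSchemaSketch {

    // Assumed mapping from Java field types to Parquet primitive declarations.
    // Java primitives can never be null, so they are marked "required".
    private static final Map<Class<?>, String> TYPES = new HashMap<>();
    static {
        TYPES.put(int.class, "required int32");
        TYPES.put(long.class, "required int64");
        TYPES.put(float.class, "required float");
        TYPES.put(double.class, "required double");
        TYPES.put(boolean.class, "required boolean");
        TYPES.put(String.class, "optional binary");
    }

    // Builds a schema string in Parquet's textual message-type syntax.
    static String toParquetSchema(Class<?> pojo) {
        StringBuilder sb = new StringBuilder("message ")
                .append(pojo.getSimpleName()).append(" {\n");
        for (Field f : pojo.getDeclaredFields()) {
            // Skip compiler-generated and static fields.
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            String decl = TYPES.get(f.getType());
            if (decl == null) {
                throw new IllegalArgumentException("Unsupported field type: " + f);
            }
            sb.append("  ").append(decl).append(' ').append(f.getName());
            if (f.getType() == String.class) {
                sb.append(" (UTF8)"); // annotate binary columns that hold text
            }
            sb.append(";\n");
        }
        return sb.append("}\n").toString();
    }

    // Example POJO, made up for this sketch.
    static class User {
        long id;
        String name;
        boolean active;
    }

    public static void main(String[] args) {
        System.out.print(toParquetSchema(User.class));
    }
}
```

A real integration would presumably reuse Jackson's own introspection 
(so that renaming annotations and the like are honoured) rather than raw 
reflection, and would build the schema object directly instead of a string.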

> Create a jackson integration module for pojo support
> ----------------------------------------------------
>
>                 Key: PARQUET-65
>                 URL: https://issues.apache.org/jira/browse/PARQUET-65
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>
> There's currently a PR for pojo support:
> https://github.com/apache/incubator-parquet-mr/pull/21
> And it occurred to me that one way we could do this without re-inventing the 
> wheel is to use jackson. Jackson can essentially take a parse tree, either 
> the result of parsing XML, or json, or anything (for example there's a yaml 
> plugin), and then, there are 3 things jackson lets you do with that tree. You 
> can either visit the nodes in the tree (they call this streaming), you can 
> map the tree onto the datastructures built into java (essentially get a 
> Map<Object, Object>), or you can map the tree onto a user defined class. The 
> latter lets you work with a well typed class, and also lets you use jackson's 
> annotations for controlling how the tree -> pojo mapping works (renaming 
> fields and so on).
> We could leverage all of that by creating something that goes from parquet 
> data to the jackson parse tree, and then leave the rest of the work to 
> jackson. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
