[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401822#comment-16401822
 ] 

ASF GitHub Bot commented on PARQUET-968:
----------------------------------------

BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-373654480
 
 
   @costimuraru I opened a PR in your repo to add a flag that enables writing 
with the specs-compliant schemas: https://github.com/costimuraru/parquet-mr/pull/2 . 
The flag is off by default in order to keep backward compatibility. 
   I believe this is the safest way forward, as it allows this PR to be merged 
without breaking backward compatibility. The flag could default to true in a 
future major release of Parquet.
   
   You can find a test suite at https://github.com/BenoitHanotte/parquet-968 
that highlights the differences in how Spark interprets the old schemas and the 
new (specs-compliant) ones, and that validates the changes in this PR:
   - default behavior (flag set to false, backward compatible):
   ```
   +-------------+----------------+--------+----------------+
   |emptyRepeated|nonEmptyRepeated|emptyMap|     nonEmptyMap|
   +-------------+----------------+--------+----------------+
   |           []|          [1, 1]|      []|[[1, 1], [2, 2]]|
   +-------------+----------------+--------+----------------+
   ```
   - specs-compliant schemas:
   ```
   +-------------+----------------+--------+----------------+
   |emptyRepeated|nonEmptyRepeated|emptyMap|     nonEmptyMap|
   +-------------+----------------+--------+----------------+
   |         null|          [1, 1]|    null|[1 -> 1, 2 -> 2]|
   +-------------+----------------+--------+----------------+
   ```
   
   You can see that maps are interpreted differently (with the non-compliant 
schemas they are read as lists of key-value tuples), as are the default 
values (empty lists and maps, which are read as null with the specs-compliant 
schemas).
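
   To make the two result tables easier to compare, here is a toy Python model 
of what Spark surfaces in each case. It is only a sketch: `read_legacy` and 
`read_compliant` are hypothetical names, not parquet-mr or Spark APIs, and the 
behavior is copied from the tables above rather than from the actual readers.

```python
# Toy model of the two Spark outputs shown above.
# Both functions are hypothetical illustrations, not parquet-mr or Spark code.

def read_legacy(repeated, map_entries):
    """Flag off: a map comes back as a list of [key, value] pairs,
    and empty collections come back as empty lists."""
    return {
        "repeated": list(repeated),
        "map": [[k, v] for k, v in map_entries],
    }


def read_compliant(repeated, map_entries):
    """Flag on: a map comes back as a real key -> value mapping,
    and empty collections come back as null (None here)."""
    return {
        "repeated": list(repeated) if repeated else None,
        "map": dict(map_entries) if map_entries else None,
    }


if __name__ == "__main__":
    entries = [(1, 1), (2, 2)]
    print(read_legacy([1, 1], entries))
    print(read_compliant([1, 1], entries))
    print(read_legacy([], []))
    print(read_compliant([], []))
```

   The point of the sketch is that the same written data yields a different 
type for maps and a different value for empty fields, which is why the flag 
stays off by default.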
   
   @lukasnalezenec @qinghui-xu feel free to also have a look, as we would like 
to have this PR merged if all the concerns are addressed.
   
   Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
>                 Key: PARQUET-968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-968
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Constantin Muraru
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
