[ 
https://issues.apache.org/jira/browse/PARQUET-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597939#comment-16597939
 ] 

ASF GitHub Bot commented on PARQUET-1408:
-----------------------------------------

rushton opened a new pull request #518: [PARQUET-1408] Make parquet-tools to 
display fields with missing values
URL: https://github.com/apache/parquet-mr/pull/518
 
 
   When parquet-tools reads a Parquet file that contains null values, the 
fields whose value is null are omitted from the output.
   
   Example:
   ```
   scala> case class Foo(a: Int, b: String)
   defined class Foo
   
   scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
   ```
   Pre-patch:
   ```
   ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
   {"a":1}
   {"a":1}
   {"a":1}
   {"a":1}
   {"a":1}
   ```
   Post-patch:
   ```
   ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   {"a":1,"b":null}
   ```
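
For illustration only, the difference between the two outputs above can be sketched as follows. This is not the code in this pull request and not the parquet-tools implementation: `NullFieldSketch`, `toJson`, `render`, and the `keepNulls` flag are hypothetical names, and a plain `Map` stands in for a record.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for a record printer; not parquet-tools' SimpleRecord.
class NullFieldSketch {

    // Renders the fields as a single JSON object.
    // keepNulls == false drops null-valued fields (the pre-patch output shape);
    // keepNulls == true writes them as JSON null (the post-patch output shape).
    static String toJson(Map<String, Object> fields, boolean keepNulls) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Object value = e.getValue();
            if (value == null && !keepNulls) {
                continue; // field is silently omitted
            }
            out.add("\"" + e.getKey() + "\":" + render(value));
        }
        return out.toString();
    }

    // Handles only null, strings (no escaping), and values whose
    // toString() is already valid JSON, such as integers and booleans.
    static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // Mirrors Foo(1, null) from the example above.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("b", null);
        System.out.println(toJson(row, false)); // {"a":1}
        System.out.println(toJson(row, true));  // {"a":1,"b":null}
    }
}
```

Emitting an explicit `null` keeps every row's set of keys the same, so a consumer of the JSON output can tell a null column apart from a column that is absent from the schema.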

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> parquet-tools SimpleRecord does not display null columns
> --------------------------------------------------------
>
>                 Key: PARQUET-1408
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1408
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Nicholas Rushton
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.10.1
>
>
> When parquet-tools reads a Parquet file that contains null values, the 
> fields whose value is null are omitted from the output.
>  
> Example:
> {code:java}
> scala> case class Foo(a: Int, b: String)
> defined class Foo
> scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/"){code}
> Actual:
> {code:java}
> ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}{code}
> Expected:
> {code:java}
> ☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
