[
https://issues.apache.org/jira/browse/PARQUET-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597939#comment-16597939
]
ASF GitHub Bot commented on PARQUET-1408:
-----------------------------------------
rushton opened a new pull request #518: [PARQUET-1408] Make parquet-tools to
display fields with missing values
URL: https://github.com/apache/parquet-mr/pull/518
When parquet-tools reads a Parquet file whose records contain null fields, the
null columns are omitted from the output.
Example:
```
scala> case class Foo(a: Int, b: String)
defined class Foo
scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
```
Pre-patch:
```
☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1}
{"a":1}
{"a":1}
{"a":1}
{"a":1}
```
Post-patch:
```
☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
```
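For illustration only, the difference between the two outputs above can be
sketched as follows. This is not the code from the pull request: the class
`NullFieldSketch`, the method `toJson`, and the `includeNulls` flag are
hypothetical names, and the sketch assumes the schema's field names are
available separately from the values actually stored in a record.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of parquet-tools or the patch in PR #518.
public class NullFieldSketch {

    // Renders one record as a JSON object by walking the schema's field names.
    // When includeNulls is false, fields without a value are skipped (the
    // pre-patch behaviour); when true, they are emitted as explicit nulls.
    // String values are not escaped, to keep the sketch short.
    static String toJson(List<String> schemaFields, Map<String, ?> record, boolean includeNulls) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (String field : schemaFields) {
            Object value = record.get(field);
            if (value == null && !includeNulls) {
                continue; // drop the field entirely
            }
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(field).append("\":");
            if (value == null) {
                sb.append("null");
            } else if (value instanceof Number || value instanceof Boolean) {
                sb.append(value);
            } else {
                sb.append('"').append(value).append('"');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("a", "b");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("a", 1); // "b" has no value, as in Foo(1, null)

        System.out.println(toJson(schema, record, false)); // {"a":1}
        System.out.println(toJson(schema, record, true));  // {"a":1,"b":null}
    }
}
```

Iterating over the schema rather than over the stored values is what lets the
second call report `b` at all, since a null field has no entry in the record.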
> parquet-tools SimpleRecord does not display null columns
> --------------------------------------------------------
>
> Key: PARQUET-1408
> URL: https://issues.apache.org/jira/browse/PARQUET-1408
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Nicholas Rushton
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.10.1
>
>
> When parquet-tools reads a Parquet file whose records contain null fields, the
> null columns are omitted from the output.
>
> Example:
> {code:java}
> scala> case class Foo(a: Int, b: String)
> defined class Foo
> scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/"){code}
> Actual:
> {code:java}
> ☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}
> {"a":1}{code}
> Expected:
> {code:java}
> ☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}
> {"a":1,"b":null}{code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)