When parquet-tools is run against a Parquet file whose records contain null values, the columns that are null are omitted from the output entirely.
Example:
```
scala> case class Foo(a: Int, b: String)
defined class Foo

scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
```
Pre-patch:
```
☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1}
{"a":1}
{"a":1}
{"a":1}
{"a":1}
```
Post-patch:
```
☁ parquet-tools [master] ⚡ java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
```
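The behaviour the patch targets is in parquet-tools' JSON record formatting. The snippet below is not that code, just a minimal Jackson-based sketch (the `NullFieldJsonDemo` class and its `Foo` are hypothetical stand-ins) of the before/after difference shown above: dropping null-valued fields versus writing them out as explicit JSON `null`.

```
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NullFieldJsonDemo {

    // Same shape as the Spark case class above: a is always set, b stays null.
    public static class Foo {
        public int a = 1;
        public String b = null;
    }

    public static void main(String[] args) throws Exception {
        Foo row = new Foo();

        // Pre-patch style: a serializer configured to drop null-valued fields.
        ObjectMapper dropNulls = new ObjectMapper();
        dropNulls.setSerializationInclusion(JsonInclude.Include.NON_NULL);
        System.out.println(dropNulls.writeValueAsString(row));  // {"a":1}

        // Post-patch style: null-valued fields appear explicitly as JSON null.
        ObjectMapper keepNulls = new ObjectMapper();
        System.out.println(keepNulls.writeValueAsString(row));  // {"a":1,"b":null}
    }
}
```

Emitting the explicit `null` keeps every record's key set aligned with the file schema, which is what the post-patch output above shows.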