When running parquet-tools `cat -j` on a parquet file that contains null values, the null columns are omitted from the JSON output.

Example:
```
scala> case class Foo(a: Int, b: String)
defined class Foo

scala> org.apache.spark.sql.SparkSession.builder.getOrCreate.createDataset((0 to 1000).map(x => Foo(1,null))).write.parquet("/tmp/foobar/")
```
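As a quick sanity check (a hypothetical snippet, assuming the same Spark shell session), reading the files back confirms that column `b` exists and is null for every row, so the nulls are present in the data and only the parquet-tools JSON output drops the key:
```
scala> val df = org.apache.spark.sql.SparkSession.builder.getOrCreate.read.parquet("/tmp/foobar/")
scala> df.filter(df("b").isNull).count   // expect 1001: every row has b = null
```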
Pre-patch:
```
☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1}
{"a":1}
{"a":1}
{"a":1}
{"a":1}
```
Post-patch:
```
☁  parquet-tools [master] ⚡  java -jar target/parquet-tools-1.10.1-SNAPSHOT.jar cat -j /tmp/foobar/part-00000-436a4d37-d82a-4771-8e7e-e4d428464675-c000.snappy.parquet | head -n5
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
{"a":1,"b":null}
```
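
For context, a minimal sketch of the general idea behind the change (not the actual parquet-tools code): when converting a record to JSON, write an explicit `null` for fields that carry no value instead of skipping them. The `Seq[(String, Any)]` record shape, the `NullPreservingJson` object, and the use of Jackson here are hypothetical stand-ins for illustration:
```
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

object NullPreservingJson {
  private val mapper = new ObjectMapper()

  // Serialize one record; a field whose value is null appears as "field":null
  // instead of being dropped from the output.
  def toJson(record: Seq[(String, Any)]): String = {
    val node = mapper.createObjectNode()
    record.foreach {
      case (name, null)  => node.putNull(name)  // keep the column as an explicit JSON null
      case (name, value) => node.set(name, mapper.valueToTree[JsonNode](value))
    }
    mapper.writeValueAsString(node)
  }

  def main(args: Array[String]): Unit = {
    println(toJson(Seq("a" -> 1, "b" -> null)))  // prints {"a":1,"b":null}
  }
}
```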

[ Full content available at: https://github.com/apache/parquet-mr/pull/518 ]