Thanks a lot for working on this.

Instead of picking out a few write properties such as `block size` and 
`compression` and adding a dedicated command line option for each, I would 
suggest allowing any of the [parquet 
properties](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L132)
 to be set in a future-proof way.
Some ideas:
- Have a command line parameter whose argument is a single key-value pair; the 
parameter can be used multiple times
- Have a command line parameter whose argument is a list of key-value pairs
- Have a command line parameter whose argument is a file that contains the 
key-value pairs

Combining several of these might also make sense (e.g. setting the key-value 
pairs from the command line as well as from a file).
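
As a rough sketch of how the repeated key-value parameter and the file-based variant could be combined: everything below is hypothetical and not part of the existing tool, including the `WriterProps` class name, the `key=value` syntax, the use of the Java properties file format, and the rule that command line values override file values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; the names and the key=value syntax are assumptions.
final class WriterProps {

    // Parses the arguments of a repeatable option, one "key=value" per use.
    static Map<String, String> parseKeyValues(List<String> args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            props.put(arg.substring(0, eq).trim(), arg.substring(eq + 1).trim());
        }
        return props;
    }

    // Loads key-value pairs written in the standard Java properties format.
    static Map<String, String> loadFromReader(Reader reader) {
        Properties loaded = new Properties();
        try {
            loaded.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (String name : loaded.stringPropertyNames()) {
            props.put(name, loaded.getProperty(name));
        }
        return props;
    }

    // Assumed precedence: values given on the command line win over the file.
    static Map<String, String> merge(Map<String, String> fromFile, Map<String, String> fromCli) {
        Map<String, String> merged = new LinkedHashMap<>(fromFile);
        merged.putAll(fromCli);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fromFile = loadFromReader(new StringReader(
                "parquet.compression=SNAPPY\nparquet.block.size=134217728\n"));
        Map<String, String> fromCli = parseKeyValues(
                Arrays.asList("parquet.compression=GZIP", "parquet.page.size=1048576"));
        System.out.println(merge(fromFile, fromCli));
    }
}
```

The merged map could then be copied entry by entry into whatever configuration object the writer is built from, so new parquet properties would work without adding further options.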
The help should reference the docs or the source code for the up-to-date 
list of available options. It should also list some of the most important 
options, such as `parquet.block.size`, `parquet.compression`, 
`parquet.page.size` and `parquet.writer.max-padding`.

[ Full content available at: https://github.com/apache/parquet-mr/pull/512 ]