Thanks a lot for working on this.

Instead of picking a few write properties such as `block size` and `compression` and adding a dedicated command line option for each, I would suggest allowing any of the [parquet properties](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L132) to be set in a future-proof way. Some ideas:

- A command line parameter whose argument is a single key-value pair; the parameter can be used multiple times
- A command line parameter whose argument is a list of key-value pairs
- A command line parameter whose argument is a file that contains the key-value pairs

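
To make the three ideas concrete, here is a rough sketch of how they could be parsed and combined. It is an illustration only, not code from this PR: the class name `WritePropsSketch`, the option names `--prop`, `--props` and `--props-file`, the comma separator, the `java.util.Properties` file syntax, and the rule that command line values override file values are all assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: collects arbitrary parquet.* properties from the command line.
public class WritePropsSketch {

  // Splits "key=value" at the first '=' and stores the trimmed pair.
  static void putPair(Map<String, String> props, String pair) {
    int eq = pair.indexOf('=');
    if (eq <= 0) {
      throw new IllegalArgumentException("Expected key=value but got: " + pair);
    }
    props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
  }

  // Reads key-value pairs written in java.util.Properties syntax (assumed file format).
  static void loadPairs(Map<String, String> props, Reader in) throws IOException {
    Properties loaded = new Properties();
    loaded.load(in);
    for (String name : loaded.stringPropertyNames()) {
      props.put(name, loaded.getProperty(name));
    }
  }

  // Parses "<option> <value>" pairs; all three option names are made up for this sketch.
  static Map<String, String> parse(String[] args) throws IOException {
    if (args.length % 2 != 0) {
      throw new IllegalArgumentException("Every option needs a value");
    }
    Map<String, String> fromFile = new LinkedHashMap<>();
    Map<String, String> fromCli = new LinkedHashMap<>();
    for (int i = 0; i < args.length; i += 2) {
      String value = args[i + 1];
      switch (args[i]) {
        case "--prop": // idea 1: one pair per occurrence, repeatable
          putPair(fromCli, value);
          break;
        case "--props": // idea 2: comma-separated list of pairs
          for (String pair : value.split(",")) {
            putPair(fromCli, pair);
          }
          break;
        case "--props-file": // idea 3: file containing the pairs
          try (Reader in = Files.newBufferedReader(Paths.get(value))) {
            loadPairs(fromFile, in);
          }
          break;
        default:
          throw new IllegalArgumentException("Unknown option: " + args[i]);
      }
    }
    // Assumed precedence: values given directly on the command line win over the file.
    Map<String, String> merged = new LinkedHashMap<>(fromFile);
    merged.putAll(fromCli);
    return merged;
  }

  public static void main(String[] args) throws IOException {
    parse(args).forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

With these assumed options, `--props-file writer.properties --prop parquet.compression=GZIP` would take everything from the file and override only the compression codec. The resulting map could then presumably be copied into the Hadoop `Configuration` used by the writer, so newly added parquet properties would work without any change to the tool.
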
Combining several of these might also make sense (e.g. setting the key-value pairs from the command line as well as from a file).

The help text should reference the docs or the source code for the up-to-date list of available options. It should also list some of the most important options, such as `parquet.block.size`, `parquet.compression`, `parquet.page.size` or `parquet.writer.max-padding`.

[ Full content available at: https://github.com/apache/parquet-mr/pull/512 ]
