manojkarthick opened a new pull request #9306:
URL: https://github.com/apache/arrow/pull/9306


   Add an option to the `parquet-read` binary to print output in JSON format. 
JSON output allows for easy analysis using tools like 
[jq](https://stedolan.github.io/jq/). This PR builds on the changes implemented 
in https://github.com/apache/arrow/pull/8686 and incorporates the suggestions 
in that PR.
   
   **Changelog**
   
   * Update all three binaries (`parquet-schema`, `parquet-rowcount` and 
`parquet-read`) to use [clap](https://github.com/clap-rs/clap) for argument 
parsing
   * Add `to_json_value()` method to get `serde_json::Value` from `Row` and 
`Field` structs (Thanks to @jhorstmann for these changes!)
   * parquet-schema:
      * Convert verbose argument into `-v/--verbose` flag
   * parquet-read:
      * Add a new flag `-j/--json` that prints the file contents in JSON Lines 
format (one JSON object per row, one row per line)
      * The feature is gated under the `json_output` Cargo feature
   * Update documentation and README with instructions for running
   * The binaries now use version and author information as defined in 
Cargo.toml
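
As a rough illustration of the idea behind `to_json_value()` and the JSON Lines output, here is a minimal, std-only sketch. It does not use the PR's actual code: the `Field` enum below is a hypothetical, simplified stand-in for the parquet crate's `Field` type, and the `to_json`, `escape_json` and `to_json_lines` names are assumptions made for this example. The real change returns a `serde_json::Value` rather than building a string by hand.

```rust
use std::fmt::Write;

/// Hypothetical, simplified stand-in for the parquet crate's `Field` type.
/// Only a handful of variants are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub enum Field {
    Null,
    Bool(bool),
    Long(i64),
    Str(String),
    List(Vec<Field>),
    /// A nested group (or a whole row): ordered (name, value) pairs.
    Group(Vec<(String, Field)>),
}

/// Append `s` to `out` as a quoted, escaped JSON string literal.
fn escape_json(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Writing to a String cannot fail, so the result is ignored.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

impl Field {
    /// Render this field as compact JSON (no extra whitespace).
    pub fn to_json(&self) -> String {
        let mut out = String::new();
        self.write_json(&mut out);
        out
    }

    /// Recursively append the JSON form of this field to `out`.
    fn write_json(&self, out: &mut String) {
        match self {
            Field::Null => out.push_str("null"),
            Field::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
            Field::Long(n) => out.push_str(&n.to_string()),
            Field::Str(s) => escape_json(s, out),
            Field::List(items) => {
                out.push('[');
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    item.write_json(out);
                }
                out.push(']');
            }
            Field::Group(fields) => {
                out.push('{');
                for (i, (name, value)) in fields.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    escape_json(name, out);
                    out.push(':');
                    value.write_json(out);
                }
                out.push('}');
            }
        }
    }
}

/// Render rows as JSON Lines: one compact JSON value per line.
pub fn to_json_lines(rows: &[Field]) -> String {
    let mut out = String::new();
    for row in rows {
        row.write_json(&mut out);
        out.push('\n');
    }
    out
}
```

In this sketch a row is just a top-level `Field::Group`, so printing a file amounts to calling `to_json_lines` on its rows (or `println!("{}", row.to_json())` per row). Keeping each row on a single line is what lets line-oriented tools such as jq process the output one record at a time.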
   
   Example output:
   
   ```
   ❯ parquet-read cities.parquet 3 --json
   {"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
   {"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
   {"continent":"North America","country":{"name":"Canada","city":["Toronto","Vancouver","St. John's","Saint John","Montreal","Halifax","Winnipeg","Calgary","Saskatoon","Ottawa","Yellowknife"]}}
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

