[GitHub] [arrow] FawnD2 opened a new pull request #9649: ARROW-10514: [C++][Parquet] Make the column name the same for both output formats of parquet reader

GitBox Sat, 06 Mar 2021 14:03:57 -0800


FawnD2 opened a new pull request #9649:
URL: https://github.com/apache/arrow/pull/9649



   In parquet-reader there are two ways to output the schema of a Parquet 
file: DebugPrint and JSONPrint. In JSON output, each Column is identified by 
its short name rather than its fully qualified name. For example, for schema 
(1) there will be two Columns with `"Name": "key"`, which is confusing.
   
   This PR makes JSONPrint use the fully qualified name for each Column 
instead of the short name, matching DebugPrint.
   
   (1):
   ```
   required group field_id=0 spark_schema {
     optional group field_id=1 a (Map) {
       repeated group field_id=2 key_value {
         required binary field_id=3 key (String);
         optional group field_id=4 value (Map) {
           repeated group field_id=5 key_value {
             required int32 field_id=6 key;
             required boolean field_id=7 value;
           }
         }
       }
     }
   }
   ```
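
   To illustrate why short names collide for schema (1), here is a minimal, 
self-contained sketch. It is not the Parquet C++ API: `Node`, 
`CollectLeafPaths`, and `ExampleSchema` are hypothetical names, and it assumes 
the fully qualified name is the dot-joined path from just below the root down 
to the leaf.

   ```cpp
   #include <string>
   #include <vector>

   // Toy stand-in for a schema node (hypothetical, not a Parquet type).
   struct Node {
     std::string name;
     std::vector<Node> children;  // empty means this node is a leaf column
   };

   // Appends the dot-joined path of every leaf below `node` to `out`.
   // The name of `node` itself is not included, so the root is skipped.
   void CollectLeafPaths(const Node& node, const std::string& prefix,
                         std::vector<std::string>* out) {
     for (const Node& child : node.children) {
       const std::string path =
           prefix.empty() ? child.name : prefix + "." + child.name;
       if (child.children.empty()) {
         out->push_back(path);
       } else {
         CollectLeafPaths(child, path, out);
       }
     }
   }

   // Builds the nesting of schema (1); types and field ids are omitted.
   Node ExampleSchema() {
     Node inner_kv{"key_value", {Node{"key", {}}, Node{"value", {}}}};
     Node value{"value", {inner_kv}};
     Node outer_kv{"key_value", {Node{"key", {}}, value}};
     Node a{"a", {outer_kv}};
     return Node{"spark_schema", {a}};
   }
   ```

   Calling `CollectLeafPaths(ExampleSchema(), "", &paths)` yields three leaf 
paths: `a.key_value.key`, `a.key_value.value.key_value.key`, and 
`a.key_value.value.key_value.value`. The first two share the short name `key`, 
and only the full path tells them apart.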


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
