[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

ASF GitHub Bot (Jira) Fri, 23 Jul 2021 09:35:04 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386360#comment-17386360
 ]


ASF GitHub Bot commented on PARQUET-968:
----------------------------------------

ccpstephanie commented on pull request #411:
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-885759396


   Although this is closed, I'm a bit confused: why do I always get the old 
schema version? I set `parquet.proto.writeSpecsCompliant=false` and use 
ParquetWriter directly. I'm on the latest version, currently 1.12.0.
   
   I'd highly appreciate it if someone could point out something stupid in my 
code! Or is it the same issue you are experiencing? 
   
   My goal is to be able to query the data via Athena/Presto or the Hive 
Metastore, so I need the new Parquet schema version.
   
   
   **Method 1:**
   
           // Doesn't work!
           Configuration conf = new Configuration();
           // If set to true, the old schema style will be used (without wrappers).
           ProtoWriteSupport.setWriteSpecsCompliant(conf, false);
   
           ParquetWriter<MessageOrBuilder> writer =
               ProtoParquetWriter.<MessageOrBuilder>builder(file).withMessage(cls).withConf(conf).build();
   
           for (MessageOrBuilder record : records) {
               writer.write(record);
           }
   
           writer.close();
           System.err.println(writer.getFooter());
   
   **Method 2:**
   
           // Doesn't work!
           Configuration conf = new Configuration();
           // If set to true, the old schema style will be used (without wrappers).
           ProtoWriteSupport.setWriteSpecsCompliant(conf, false);
   
           try (ParquetWriter writer = new ParquetWriter(
                   file,
                   new ProtoWriteSupport<AddressBook>(AddressBook.class),
                   CompressionCodecName.GZIP,
                   128 * 1024 * 1024,                   // PARQUET_BLOCK_SIZE
                   ParquetProperties.DEFAULT_PAGE_SIZE, // page size
                   ParquetProperties.DEFAULT_PAGE_SIZE, // dictionary page size
                   true,                                // enable dictionary
                   false,                               // validating
                   ParquetProperties.DEFAULT_WRITER_VERSION,
                   conf)) {
               for (Object record : messages) {
                   writer.write(record);
               }
               // Close explicitly so the footer is available before printing it.
               writer.close();
               System.err.println(writer.getFooter());
           }
   
   **Parquet output Metadata:**
   ```
   ParquetMetaData{FileMetaData{schema: message AddressBookProtos.AddressBook {
     repeated group people = 1 {
       optional binary name (STRING) = 1;
       optional int32 id = 2;
       optional binary email (STRING) = 3;
       repeated group phones = 4 {
         optional binary number (STRING) = 1;
         optional binary type (ENUM) = 2;
       }
     }
   }
   , metadata: {parquet.proto.descriptor=name: "AddressBook"
   field {
     name: "people"
     number: 1
     label: LABEL_REPEATED
     type: TYPE_MESSAGE
     type_name: ".AddressBookProtos.Person"}
   , parquet.proto.writeSpecsCompliant=false,
   ...}
   ```
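   
   For comparison, the Parquet `LIST` logical type wraps each repeated field 
in a three-level group (`LIST` group, repeated `list` group, `element`) instead 
of emitting a bare `repeated group`. Below is a rough sketch of how the schema 
above might look in that style. The repetition shown for the outer groups and 
for `element` is an assumption, and the exact output of parquet-protobuf may 
differ:
   
   ```
   message AddressBookProtos.AddressBook {
     optional group people (LIST) = 1 {
       repeated group list {
         optional group element {
           optional binary name (STRING) = 1;
           optional int32 id = 2;
           optional binary email (STRING) = 3;
           optional group phones (LIST) = 4 {
             repeated group list {
               optional group element {
                 optional binary number (STRING) = 1;
                 optional binary type (ENUM) = 2;
               }
             }
           }
         }
       }
     }
   }
   ```
   
   The output above has no `(LIST)` annotations and no `list`/`element` 
levels, which is what marks it as the old, unwrapped style.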
   
   **Protobuf Message:**
   
   ```
   syntax = "proto3";
   
   package AddressBookProtos;
   
   option java_multiple_files = true;
   option java_package = "com.mycompany.app";
   option java_outer_classname = "AddressBookProtos";
   
   message Person {
     string name = 1;
     int32 id = 2;
     string email = 3;
   
     enum PhoneType {
       MOBILE = 0;
       HOME = 1;
       WORK = 2;
     }
   
     message PhoneNumber {
       string number = 1;
       PhoneType type = 2;
     }
   
     repeated PhoneNumber phones = 4;
   }
   
   message AddressBook {
     repeated Person people = 1;
   }
   ```


> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
>                 Key: PARQUET-968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-968
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Constantin Muraru
>            Assignee: Constantin Muraru
>            Priority: Major
>             Fix For: 1.11.0
>
>



