[
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386360#comment-17386360
]
ASF GitHub Bot commented on PARQUET-968:
----------------------------------------
ccpstephanie commented on pull request #411:
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-885759396
Although this PR is closed, I'm a bit confused: why do I always get the old
schema version? I set `parquet.proto.writeSpecsCompliant=false` and use
ParquetWriter directly. I'm on 1.12.0, currently the latest version.
I'd highly appreciate it if someone could point out something stupid in my
code! Or is it the same issue you are experiencing?
My goal is to be able to query the data via Athena/Presto or the Hive
Metastore, so I need the new Parquet schema version.
**Method 1:**
// Doesn't work!
Configuration conf = new Configuration();
// If set to true, the old schema style will be used (without wrappers).
ProtoWriteSupport.setWriteSpecsCompliant(conf, false);
ParquetWriter<MessageOrBuilder> writer =
    ProtoParquetWriter.<MessageOrBuilder>builder(file).withMessage(cls).withConf(conf).build();
for (MessageOrBuilder record : records) {
    writer.write(record);
}
writer.close();
System.err.println(writer.getFooter());
**Method 2:**
// Doesn't work!
Configuration conf = new Configuration();
// If set to true, the old schema style will be used (without wrappers).
ProtoWriteSupport.setWriteSpecsCompliant(conf, false);
try (ParquetWriter writer = new ParquetWriter(
        file,
        new ProtoWriteSupport<AddressBook>(AddressBook.class),
        CompressionCodecName.GZIP,
        128 * 1024 * 1024, // PARQUET_BLOCK_SIZE
        ParquetProperties.DEFAULT_PAGE_SIZE,
        ParquetProperties.DEFAULT_PAGE_SIZE,
        true,
        false,
        ParquetProperties.DEFAULT_WRITER_VERSION,
        conf)) {
    for (Object record : messages) {
        writer.write(record);
    }
    writer.close();
    System.err.println(writer.getFooter());
}
**Parquet output Metadata:**
`
_ParquetMetaData{FileMetaData{schema: message AddressBookProtos.AddressBook {
  repeated group people = 1 {
    optional binary name (STRING) = 1;
    optional int32 id = 2;
    optional binary email (STRING) = 3;
    repeated group phones = 4 {
      optional binary number (STRING) = 1;
      optional binary type (ENUM) = 2;
    }
  }
}
, metadata: {parquet.proto.descriptor=name: "AddressBook"
field {
  name: "people"
  number: 1
  label: LABEL_REPEATED
  type: TYPE_MESSAGE
  type_name: ".AddressBookProtos.Person"
}
, parquet.proto.writeSpecsCompliant=false,
...}
`
**Protobuf Message:**
`
syntax = "proto3";
package AddressBookProtos;
option java_multiple_files = true;
option java_package = "com.mycompany.app";
option java_outer_classname = "AddressBookProtos";
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }
  repeated PhoneNumber phones = 4;
}
message AddressBook {
  repeated Person people = 1;
}
`
> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
> Issue Type: Task
> Reporter: Constantin Muraru
> Assignee: Constantin Muraru
> Priority: Major
> Fix For: 1.11.0
>
>