[ https://issues.apache.org/jira/browse/PARQUET-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Kizhakkel Jose updated PARQUET-1679:
------------------------------------------
Description:
Hi,
I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a
schema with an empty group: optional group id {} when I include a UUID field
in my POJO. Without the UUID field, everything works fine. I have seen that
Parquet supports UUID as part of [#PR-71] in the 2.4 release, but I still get
InvalidSchemaException on UUID. Is there anything I am missing, or is this a
known issue?
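For reference, the Parquet format specification defines the UUID logical type as an annotation on a FIXED_LEN_BYTE_ARRAY of length 16, stored big-endian (so 00112233-4455-6677-8899-aabbccddeeff becomes the bytes 00 11 22 ... ff). The JDK-only sketch below shows that byte layout for a java.util.UUID. It is not part of parquet-avro, the class name UuidBytes is hypothetical, and on its own it does not make AvroParquetWriter emit the UUID logical type.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (not a parquet-avro API): encode a UUID as the 16
// big-endian bytes used by the Parquet UUID logical type, and decode it back.
public class UuidBytes {

    // Most significant 64 bits first, then least significant. ByteBuffer
    // defaults to big-endian order, which matches the Parquet spec.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes; rejects anything that is not exactly 16 bytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("UUID needs exactly 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        byte[] raw = toBytes(id);
        System.out.println(raw.length);                // 16
        System.out.println(fromBytes(raw).equals(id)); // true
    }
}
```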
*My setup details:*
*Gradle dependencies:*
dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-starter'
    compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
    compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
    compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
    compile group: 'joda-time', name: 'joda-time'
    compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
    compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
}
*Model used:*
@Data
public class Employee {
private UUID id;
private String name;
private int age;
private Address address;
}
@Data
public class Address {
private String streetName;
private String city;
private Zip zip;
}
@Data
public class Zip {
private int zip;
private int ext;
}
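One plausible cause, assumed here rather than confirmed against the Avro version that parquet-avro 1.10.1 depends on: Avro's reflection-based schema generation does not collect fields from classes in java.* packages, so java.util.UUID becomes a record with no fields, which Parquet then rejects as an empty group. The JDK-only sketch below mimics that assumed rule on a copy of the Zip class; ReflectFieldsSketch and recordFieldNames are hypothetical names, not Avro APIs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration (NOT Avro's code): list the fields a reflection-based
// schema generator would turn into record fields, ASSUMING it skips static and
// transient fields and stops at any class whose package starts with "java.".
public class ReflectFieldsSketch {

    // Copy of the Zip model class from the report, as a static nested class
    // so no synthetic outer-instance field is generated.
    static class Zip {
        private int zip;
        private int ext;
    }

    static List<String> recordFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            // Assumed rule: JDK classes contribute no fields. This also ends the
            // walk at java.lang.Object, and makes java.util.UUID yield nothing.
            if (c.getName().startsWith("java.")) {
                break;
            }
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if ((mods & (Modifier.STATIC | Modifier.TRANSIENT)) == 0) {
                    names.add(f.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Two fields (zip, ext); reflection does not guarantee their order.
        System.out.println(recordFieldNames(Zip.class));
        // Empty list: under the assumed rule this maps to "optional group id {}".
        System.out.println(recordFieldNames(UUID.class));
    }
}
```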
+*My Serializer Code:*+
public void serialize(List<D> inputDataToSerialize, CompressionCodecName compressionCodecName) throws IOException {
    Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
    Class<?> clazz = inputDataToSerialize.get(0).getClass();
    try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
            .withSchema(ReflectData.AllowNull.get().getSchema(clazz)) // generate nullable fields
            .withDataModel(ReflectData.get())
            .withConf(parquetConfiguration)
            .withCompressionCodec(compressionCodecName)
            .withWriteMode(OVERWRITE)
            .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
            .build()) {
        for (D input : inputDataToSerialize) {
            writer.write(input);
        }
    }
}
_Where the generic type D is Employee._
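A possible workaround, not verified against this exact setup: hand AvroParquetWriter a row type whose id is a String, so ReflectData sees java.lang.String (which it maps to an Avro string) instead of java.util.UUID. EmployeeRow is a hypothetical class; the address field is omitted to keep the sketch short, and it stores the canonical 36-character UUID text rather than the 16-byte Parquet UUID logical type.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical row type for writing (illustrative name only): the UUID is kept
// as a String so reflection-based schema generation never sees java.util.UUID.
public class EmployeeRow {
    private String id;   // canonical UUID text, e.g. "00112233-4455-..."
    private String name;
    private int age;

    // No-arg constructor, needed if the rows are read back via reflection.
    public EmployeeRow() { }

    public EmployeeRow(UUID id, String name, int age) {
        this.id = (id == null) ? null : id.toString();
        this.name = name;
        this.age = age;
    }

    // Convert back to a UUID when reading; null-safe for nullable ids.
    public UUID getUuid() {
        return (id == null) ? null : UUID.fromString(id);
    }

    public String getId() { return id; }
    public String getName() { return name; }
    public int getAge() { return age; }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        EmployeeRow row = new EmployeeRow(original, "Jane", 42);
        System.out.println(row.getId().length());                    // 36
        System.out.println(Objects.equals(row.getUuid(), original)); // true
    }
}
```

With this approach the serializer would map each Employee to an EmployeeRow and build the schema from EmployeeRow.class instead of the Employee class.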
was:
Hi,
I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a
schema with an empty group: optional group id {} when I include a UUID field
in my POJO. Without the UUID field, everything works fine. I have seen that
Parquet supports UUID as part of [#PR-71] in the 2.4 release, but I still get
InvalidSchemaException on UUID. Is there anything I am missing, or is this a
known issue?
*My setup details:*
*Gradle dependencies:*
dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-starter'
    compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
    compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
    compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
    compile group: 'joda-time', name: 'joda-time'
    compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
    compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
}
*Model used:*
@Data
public class Employee {
private UUID id;
private String name;
private int age;
private Address address;
}
+*My Serializer Code:*+
public void serialize(List<D> inputDataToSerialize, CompressionCodecName compressionCodecName) throws IOException {
    Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
    Class<?> clazz = inputDataToSerialize.get(0).getClass();
    try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
            .withSchema(ReflectData.AllowNull.get().getSchema(clazz)) // generate nullable fields
            .withDataModel(ReflectData.get())
            .withConf(parquetConfiguration)
            .withCompressionCodec(compressionCodecName)
            .withWriteMode(OVERWRITE)
            .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
            .build()) {
        for (D input : inputDataToSerialize) {
            writer.write(input);
        }
    }
}
_Where the generic type D is Employee._
> Invalid SchemaException for UUID while using AvroParquetWriter
> --------------------------------------------------------------
>
> Key: PARQUET-1679
> URL: https://issues.apache.org/jira/browse/PARQUET-1679
> Project: Parquet
> Issue Type: Bug
> Components: parquet-avro
> Affects Versions: 1.10.1
> Reporter: Felix Kizhakkel Jose
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)