[jira] [Updated] (PARQUET-1679) Invalid SchemaException for UUID while using AvroParquetWriter

Felix Kizhakkel Jose (Jira) Wed, 16 Oct 2019 12:23:10 -0700


     [ 
https://issues.apache.org/jira/browse/PARQUET-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Felix Kizhakkel Jose updated PARQUET-1679:
------------------------------------------
    Description: 
Hi,

I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a 
schema with an empty group: optional group id {} when I include a UUID field 
in my POJO. Without the UUID field everything works fine. I have seen that 
Parquet supports UUID as part of [#PR-71] in the 2.4 release, but I am still 
getting InvalidSchemaException on UUID. Is there anything that I am missing, 
or is this a known issue?
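
For context, the Parquet format specification describes the UUID logical type as an annotation on a 16-byte fixed-length binary value, encoded big-endian. The following is a minimal JDK-only sketch of that byte layout. The names UuidBytes, toBytes and fromBytes are hypothetical and are not part of parquet-avro; the sketch says nothing about whether a given parquet-avro release maps java.util.UUID to this type.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not a parquet-avro API.
public class UuidBytes {

    // Pack a UUID into 16 bytes, most significant bits first.
    // ByteBuffer writes big-endian by default.
    public static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from the same 16-byte layout.
    public static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] raw = toBytes(id);
        System.out.println(raw.length + " bytes, round trip ok: " + id.equals(fromBytes(raw)));
    }
}
```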

*My setup details:*

*gradle dependency :*

dependencies {
 compile group: 'org.springframework.boot', name: 'spring-boot-starter'
 compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
 compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
 compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
 compile group: 'joda-time', name: 'joda-time'
 compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
 compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
}

*Model used:*

@Data
public class Employee {
 private UUID id;
 private String name;
 private int age;
 private Address address;
}



@Data
public class Address {
 private String streetName;
 private String city;
 private Zip zip;
}



@Data
public class Zip {
 private int zip;
 private int ext;
}
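
One possible way to sidestep the empty id group in the model above while the question is open is to keep java.util.UUID out of the reflected class and carry the value as a String, since the String field (name) is reported to work. This is only a sketch: it assumes the canonical string form is an acceptable stored representation, the names UuidAsString, EmployeeRow, toRow and idOf are hypothetical, the nested Address field is omitted for brevity, and it has not been run against parquet-avro 1.10.1.

```java
import java.util.UUID;

// Hypothetical workaround sketch, JDK only.
public class UuidAsString {

    // Write-side model: same scalar fields as Employee, but id is a String.
    public static class EmployeeRow {
        public String id;
        public String name;
        public int age;
    }

    // Convert the UUID to its canonical 36-character string before writing.
    public static EmployeeRow toRow(UUID id, String name, int age) {
        EmployeeRow row = new EmployeeRow();
        row.id = id.toString();
        row.name = name;
        row.age = age;
        return row;
    }

    // Recover the UUID after reading the row back.
    public static UUID idOf(EmployeeRow row) {
        return UUID.fromString(row.id);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        EmployeeRow row = toRow(id, "example", 30);
        System.out.println("round trip ok: " + id.equals(idOf(row)));
    }
}
```

The trade-off is size: the string form takes 36 characters per value instead of 16 bytes.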

 

+*My Serializer Code:*+

public void serialize(List<D> inputDataToSerialize,
                      CompressionCodecName compressionCodecName) throws IOException {

    Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
    Class<?> clazz = inputDataToSerialize.get(0).getClass();

    try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
            // generate nullable fields
            .withSchema(ReflectData.AllowNull.get().getSchema(clazz))
            .withDataModel(ReflectData.get())
            .withConf(parquetConfiguration)
            .withCompressionCodec(compressionCodecName)
            .withWriteMode(OVERWRITE)
            .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
            .build()) {

        for (D input : inputDataToSerialize) {
            writer.write(input);
        }
    }
}

_Where generic type D is Employee._

  was:
Hi,

I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a 
schema with an empty group: optional group id {} when I include a UUID field 
in my POJO. Without the UUID field everything works fine. I have seen that 
Parquet supports UUID as part of [#PR-71] in the 2.4 release, but I am still 
getting InvalidSchemaException on UUID. Is there anything that I am missing, 
or is this a known issue?

*My setup details:*

*gradle dependency :*

dependencies {
 compile group: 'org.springframework.boot', name: 'spring-boot-starter'
 compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
 compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
 compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
 compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
 compile group: 'joda-time', name: 'joda-time'
 compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
 compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
}

*Model used:*

@Data
public class Employee {
 private UUID id;
 private String name;
 private int age;
 private Address address;
}

+*My Serializer Code:*+



public void serialize(List<D> inputDataToSerialize,
                      CompressionCodecName compressionCodecName) throws IOException {

    Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
    Class<?> clazz = inputDataToSerialize.get(0).getClass();

    try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
            // generate nullable fields
            .withSchema(ReflectData.AllowNull.get().getSchema(clazz))
            .withDataModel(ReflectData.get())
            .withConf(parquetConfiguration)
            .withCompressionCodec(compressionCodecName)
            .withWriteMode(OVERWRITE)
            .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
            .build()) {

        for (D input : inputDataToSerialize) {
            writer.write(input);
        }
    }
}

_Where generic type D is Employee._


> Invalid SchemaException for UUID while using AvroParquetWriter
> --------------------------------------------------------------
>
>                 Key: PARQUET-1679
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1679
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.10.1
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
