Hi,

I've run into a problem using AvroParquetInputFormat in my MapReduce job.
The input files were written with two different schema versions: one field
is "int" in v1 but "long" in v2. The exception:

Exception in thread "main" parquet.schema.IncompatibleSchemaModificationException: can not merge type optional int32 a into optional int64 a
	at parquet.schema.PrimitiveType.union(PrimitiveType.java:513)
	at parquet.schema.GroupType.mergeFields(GroupType.java:359)
	at parquet.schema.GroupType.union(GroupType.java:341)
	at parquet.schema.GroupType.mergeFields(GroupType.java:359)
	at parquet.schema.MessageType.union(MessageType.java:138)
	at parquet.hadoop.ParquetFileWriter.mergeInto(ParquetFileWriter.java:497)
	at parquet.hadoop.ParquetFileWriter.mergeInto(ParquetFileWriter.java:470)
	at parquet.hadoop.ParquetFileWriter.getGlobalMetaData(ParquetFileWriter.java:446)
	at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:429)
	at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:412)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:589)

I'm using Parquet 1.5, and it looks like "int" cannot be merged with
"long". I also tried 1.6.rc1 and set "parquet.strict.typing", but that did
not help either.

So I want to ask: is there any way to solve this problem, such as
automatically converting "int" to "long", instead of rewriting all the
data with the same schema version?
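For reference, here is a simplified sketch of the job setup that hits the exception (the class name, mapper wiring, and input path are placeholders, not my actual code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import parquet.avro.AvroParquetInputFormat;

public class MergeJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "read-mixed-schema-parquet");
        job.setJarByClass(MergeJob.class);

        // The input directory contains both v1 (int) and v2 (long) files.
        job.setInputFormatClass(AvroParquetInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Job submission calls ParquetInputFormat.getSplits(), which merges
        // the file footers into a global schema and throws the
        // IncompatibleSchemaModificationException shown above.
        job.waitForCompletion(true);
    }
}
```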

thanks,
Wei
