To follow up, I think the problem here was that we were merging two
Parquet schemas. We don't really have rules for merging schemas and we
don't really need them. 1.6.0 works because we resolve the expected
schema with each file schema individually.
This will still be a problem if you use client-side metadata instead of
task-side.
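One way to drive that per-file resolution explicitly is to hand the input format an Avro read schema, so each file's schema is resolved against it (int widening to long) rather than all file schemas being merged into one global Parquet schema. A minimal job-configuration sketch, assuming the pre-`org.apache` 1.6.0 package names (`parquet.avro.*`); the record name `Record` is made up, and the field name `a` comes from the stack trace below:

```java
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import parquet.avro.AvroParquetInputFormat;

public class ReadMixedSchemas {
  public static void main(String[] args) throws Exception {
    // Use the widest (v2) schema as the expected read schema:
    // "a" is declared long, so files where it is int32 are
    // widened by Avro schema resolution on a per-file basis.
    Schema readSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Record\","
        + "\"fields\":[{\"name\":\"a\",\"type\":[\"null\",\"long\"],"
        + "\"default\":null}]}");

    Job job = Job.getInstance(new Configuration(), "read-mixed-schemas");
    job.setInputFormatClass(AvroParquetInputFormat.class);
    AvroParquetInputFormat.setAvroReadSchema(job, readSchema);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // ... set mapper, reducer, and output format as usual ...
  }
}
```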
rb
On 05/06/2015 08:43 PM, Alex Levenson wrote:
Glad that worked!
On Wed, May 6, 2015 at 6:42 PM, Wei Yan <[email protected]> wrote:
Thanks, Alex.
The new version solves the issue.
-Wei
On Tue, May 5, 2015 at 8:20 PM, Alex Levenson <[email protected]> wrote:
1.6.0rc1 is pretty old; have you tried with 1.6.0?
On Tue, May 5, 2015 at 9:31 AM, Wei Yan <[email protected]> wrote:
Hi,
I've run into a problem using AvroParquetInputFormat in my MapReduce
job. The input files were written with two different versions of the
schema: one field is "int" in v1 but "long" in v2. The exception:
Exception in thread "main" parquet.schema.IncompatibleSchemaModificationException: can not merge type optional int32 a into optional int64 a
    at parquet.schema.PrimitiveType.union(PrimitiveType.java:513)
    at parquet.schema.GroupType.mergeFields(GroupType.java:359)
    at parquet.schema.GroupType.union(GroupType.java:341)
    at parquet.schema.GroupType.mergeFields(GroupType.java:359)
    at parquet.schema.MessageType.union(MessageType.java:138)
    at parquet.hadoop.ParquetFileWriter.mergeInto(ParquetFileWriter.java:497)
    at parquet.hadoop.ParquetFileWriter.mergeInto(ParquetFileWriter.java:470)
    at parquet.hadoop.ParquetFileWriter.getGlobalMetaData(ParquetFileWriter.java:446)
    at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:429)
    at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:412)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:589)
I'm using Parquet 1.5, and it looks like "int" cannot be merged with
"long". I tried 1.6.0rc1 and set "parquet.strict.typing", but that
didn't help. Is there any way to solve this problem, such as
automatically converting "int" to "long", instead of rewriting all the
data with the same schema version?
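For reference, the two versions of the field look like this (the field name `a` is from the exception; the record name `Record` is just illustrative):

```json
// v1: field "a" is an int
{"type": "record", "name": "Record",
 "fields": [{"name": "a", "type": "int"}]}

// v2: field "a" was widened to a long
{"type": "record", "name": "Record",
 "fields": [{"name": "a", "type": "long"}]}
```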
thanks,
Wei
--
Alex Levenson
@THISWILLWORK
--
Ryan Blue
Software Engineer
Cloudera, Inc.