[ https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099343#comment-14099343 ]
David Chen commented on HIVE-4329:
----------------------------------

Hi Sushanth,

Thank you for taking a look at this ticket. I agree that it would be ideal to get Hive to a point where a unified StorageHandler interface can replace the current use of HiveOutputFormat and FileSinkOperator.RecordWriter (which should really be named HiveRecordWriter). However, that is a larger, longer-term undertaking, whereas this ticket fixes the fact that it is currently not possible to write via HCatalog to storage formats whose (Hive)OutputFormats implement only getHiveRecordWriter and not getRecordWriter.

The new tests I added as part of HIVE-7286 demonstrate that solving only the type-compatibility issue mentioned earlier in this ticket is not sufficient. The type error for AvroContainerOutputFormat masks the real issue: AvroContainerOutputFormat's getRecordWriter (like ParquetHiveOutputFormat's) does nothing but throw an exception saying that "this method should not be called." That is why my fix takes this approach, which is based on the approach taken by core Hive. To my understanding, Hive accepts both MR OutputFormats and HiveOutputFormats but ends up calling getHiveRecordWriter in both cases. When given an MR OutputFormat, Hive detects that it is not a HiveOutputFormat and wraps it in HivePassThroughOutputFormat.

My understanding is that your main concern is that this patch may turn HCatOutputFormat into a HiveOutputFormat. That is not the case. This patch does not change the HCatalog interface; it changes the way HCatOutputFormat wraps the underlying OutputFormat so that it can properly handle HiveOutputFormats, which is required to make writing Avro and Parquet through HCatalog possible.
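To illustrate the selection logic described above, here is a minimal sketch. It is not the actual patch, and the class and method names (WriterSelector, getWriterFor) are hypothetical. It shows a writer container preferring getHiveRecordWriter when the underlying OutputFormat is a HiveOutputFormat, and otherwise adapting a plain MR RecordWriter, analogous to what HivePassThroughOutputFormat does in core Hive:

{code}
// Hypothetical sketch, NOT the HIVE-4329 patch itself.
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

public class WriterSelector {
  public static FileSinkOperator.RecordWriter getWriterFor(
      OutputFormat<?, ?> of, JobConf jc, Path outPath,
      Class<? extends Writable> valueClass, boolean isCompressed,
      Properties tableProps, Progressable progress) throws IOException {
    if (of instanceof HiveOutputFormat) {
      // Avro and Parquet only support this path; their plain
      // getRecordWriter implementations just throw.
      return ((HiveOutputFormat<?, ?>) of).getHiveRecordWriter(
          jc, outPath, valueClass, isCompressed, tableProps, progress);
    }
    // Plain MR OutputFormat: adapt its RecordWriter, analogous to
    // core Hive's HivePassThroughOutputFormat.
    @SuppressWarnings("unchecked")
    final RecordWriter<WritableComparable<?>, Writable> mrWriter =
        ((OutputFormat<WritableComparable<?>, Writable>) of)
            .getRecordWriter(null, jc, outPath.getName(), progress);
    return new FileSinkOperator.RecordWriter() {
      public void write(Writable value) throws IOException {
        // HCat supplies a NullWritable key, which is the very
        // mismatch this ticket's stack trace shows.
        mrWriter.write(NullWritable.get(), value);
      }
      public void close(boolean abort) throws IOException {
        mrWriter.close(Reporter.NULL);
      }
    };
  }
}
{code}

The first branch is the one that Avro and Parquet need; the second preserves today's behavior for ordinary MR OutputFormats.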
> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>
>                 Key: HIVE-4329
>                 URL: https://issues.apache.org/jira/browse/HIVE-4329
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.14.0
>         Environment: discovered in Pig, but it looks like the root cause impacts all non-Hive users
>            Reporter: Sean Busbey
>            Assignee: David Chen
>         Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to an HCatalog-defined table backed by the AvroSerde fails with the following stack trace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
> 	at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
> 	at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
> 	at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
> 	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that AvroContainerOutputFormat's signature mandates a LongWritable key while HCat's FileRecordWriterContainer forces a NullWritable. I'm not sure of a general fix, other than redefining HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting a WritableComparable is what the other Hive OutputFormats do, and there's no reason AvroContainerOutputFormat couldn't be changed as well, since it ignores the key. That way, fixing things so that FileRecordWriterContainer can always use NullWritable could be spun off into a separate issue.
> The underlying cause of the failure to write to AvroSerde tables is that AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so fixing the above will just push the failure into the placeholder RecordWriter.

--
This message was sent by Atlassian JIRA
(v6.2#6252)