[ https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099374#comment-14099374 ]

Sushanth Sowmyan commented on HIVE-4329:
----------------------------------------

Hi David,

Your patch uses HiveFileFormatUtils.getOutputFormatSubstitute to determine the 
HiveOutputFormat substitute for the underlying OutputFormat, which is the route 
core Hive takes to accept both MR OutputFormats and HiveOutputFormats. 
Unfortunately, this will not work, because that method simply fetches a 
substitute HiveOutputFormat from a map of substitutes, and that map contains 
substitutes for only IgnoreKeyTextOutputFormat and SequenceFileOutputFormat.

Although Hive's interface appears to allow any OutputFormat, in reality it 
accepts only those two in addition to classes that are themselves 
HiveOutputFormats. Thus, your call to that function will simply return null 
for mapreduce OutputFormats that are neither HiveOutputFormats nor one of the 
two formats above, which breaks runtime backward compatibility even though 
compile-time backward compatibility is preserved.
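
To make that concrete, here is a rough paraphrase of what that lookup amounts 
to (simplified from memory, not the actual Hive source):

{code}
// Rough paraphrase of HiveFileFormatUtils.getOutputFormatSubstitute -- not the
// actual Hive source. The substitute map is keyed on exactly two MR formats,
// so any other plain mapreduce OutputFormat falls through to null.
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat;
import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
import org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat;
import org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class SubstituteLookupSketch {
  private static final Map<Class<?>, Class<?>> SUBSTITUTES =
      new HashMap<Class<?>, Class<?>>();
  static {
    SUBSTITUTES.put(IgnoreKeyTextOutputFormat.class, HiveIgnoreKeyTextOutputFormat.class);
    SUBSTITUTES.put(SequenceFileOutputFormat.class, HiveSequenceFileOutputFormat.class);
  }

  public static Class<?> getOutputFormatSubstitute(Class<?> origin) {
    if (HiveOutputFormat.class.isAssignableFrom(origin)) {
      return origin;                 // already a HiveOutputFormat, no substitution needed
    }
    return SUBSTITUTES.get(origin);  // null for anything else, e.g. Avro's OutputFormat
  }
}
{code}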

If your patch instead fetched the underlying OutputFormat and then, when it is 
a HiveOutputFormat, used getHiveRecordWriter, and when it is not, used 
getRecordWriter, that solution would not break runtime backward compatibility 
and would be acceptable. I tried something along those lines over at 
https://issues.apache.org/jira/browse/HIVE-4524 (which ultimately wasn't 
committed, because that problem was solved from the HBaseStorageHandler end 
rather than from HCat's end), if you'd like to look at it. I think that might 
be a better way of solving the base issue of HiveOutputFormats not working 
from within HCatalog. The dispatch I have in mind is sketched below.
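
A minimal sketch of that dispatch (my own illustration, not the actual 
HIVE-4524 patch; the parameters mirror what a writer container would already 
have in scope):

{code}
// Sketch of the suggested dispatch -- not the actual HIVE-4524 patch. Branch
// on the runtime type instead of relying on the substitute-map lookup.
import java.io.IOException;
import java.util.Properties;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.util.Progressable;

public class WriterDispatchSketch {
  // Returns a Hive writer when the format supports it, otherwise null, in
  // which case the caller falls back to the plain mapred API:
  //   of.getRecordWriter(fs, jc, name, progress)
  static FileSinkOperator.RecordWriter getHiveWriterIfPossible(
      OutputFormat<?, ?> of, JobConf jc, Path outPath,
      Class<? extends Writable> valueClass, boolean isCompressed,
      Properties tableProps, Progressable progress) throws IOException {
    if (of instanceof HiveOutputFormat) {
      // Hive-aware format: use the Hive-specific writer API (no key involved)
      return ((HiveOutputFormat<?, ?>) of).getHiveRecordWriter(
          jc, outPath, valueClass, isCompressed, tableProps, progress);
    }
    return null;
  }
}
{code}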


> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>
>                 Key: HIVE-4329
>                 URL: https://issues.apache.org/jira/browse/HIVE-4329
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.14.0
>         Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>            Reporter: Sean Busbey
>            Assignee: David Chen
>         Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to an HCatalog-defined table backed by the AvroSerde 
> fails with the following stack trace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
>       at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>       at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>       at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>       at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>       at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>       at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>       at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that AvroContainerOutputFormat's 
> signature mandates a LongWritable key while HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like the other Hive OutputFormats accept a WritableComparable key, 
> and there's no reason AvroContainerOutputFormat couldn't be changed to do 
> the same, since it ignores the key anyway. With that done, making 
> FileRecordWriterContainer always use NullWritable could be spun off into a 
> separate issue. (A toy reduction of the cast failure appears after this 
> description.)
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.
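
The cast failure quoted above comes down to Java type erasure: the key type is 
checked only inside write(), exactly where the stack trace points. A 
self-contained toy reduction (my own illustration, not the Hive/HCatalog 
sources; it assumes only hadoop-common on the classpath):

{code}
// Toy reduction of the ClassCastException above -- illustrative only, not the
// Hive/HCatalog sources. Generics are erased at runtime, so handing a
// NullWritable to a writer compiled against LongWritable keys fails only at
// the cast inside write().
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;

public class KeyMismatchDemo {
  interface Writer<K extends Writable> { void write(K key); }

  public static void main(String[] args) {
    // stands in for AvroContainerOutputFormat$1, which assumes LongWritable keys
    Writer<LongWritable> avroLike = new Writer<LongWritable>() {
      public void write(LongWritable key) { System.out.println(key.get()); }
    };
    // stands in for FileRecordWriterContainer, which erases the key type
    @SuppressWarnings("unchecked")
    Writer<Writable> container = (Writer<Writable>) (Writer<?>) avroLike;
    // throws ClassCastException: NullWritable cannot be cast to LongWritable
    container.write(NullWritable.get());
  }
}
{code}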



--
This message was sent by Atlassian JIRA
(v6.2#6252)
