[ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639272#comment-14639272
 ] 

Mithun Radhakrishnan commented on HIVE-11344:
---------------------------------------------

Ah, that's a good point. I didn't realize that {{HCatSplit}} or {{PartInfo}} 
might be serialized in situations other than M/R / Tez serialization of splits.

At the time I wrote this, I did intend to null-check {{partitionSchema}}, 
{{inputFormatClassName}}, etc. in their respective getters, and to fall back 
to the values from {{this.tableInfo}} when they were null. One "optimization" 
too far.
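
For illustration, a minimal sketch of that getter fallback. The class and 
method names here ({{TableInfoSketch}}, {{PartInfoSketch}}, 
{{dropFieldsMatchingTable}}) are hypothetical stand-ins, not the real HCatalog 
classes, and only one field is shown. It is not necessarily what the attached 
patch does.

```java
import java.util.Objects;

// Hypothetical stand-in for the table-level metadata; not the real TableInfo.
final class TableInfoSketch {
    private final String inputFormatClassName;

    TableInfoSketch(String inputFormatClassName) {
        this.inputFormatClassName = Objects.requireNonNull(inputFormatClassName);
    }

    String getInputFormatClassName() {
        return inputFormatClassName;
    }
}

// Hypothetical stand-in for the partition-level metadata; not the real PartInfo.
final class PartInfoSketch {
    private String inputFormatClassName; // may be nulled out before serialization
    private TableInfoSketch tableInfo;

    PartInfoSketch(String inputFormatClassName) {
        this.inputFormatClassName = inputFormatClassName;
    }

    void setTableInfo(TableInfoSketch tableInfo) {
        this.tableInfo = tableInfo;
    }

    // Mimics the behaviour described in the issue: a field that duplicates
    // the table-level value is nulled out so it is not serialized twice.
    void dropFieldsMatchingTable() {
        if (tableInfo != null
                && tableInfo.getInputFormatClassName().equals(inputFormatClassName)) {
            inputFormatClassName = null;
        }
    }

    // The null check in the getter: if the partition-level value was dropped,
    // fall back to the table-level value so the object stays usable.
    String getInputFormatClassName() {
        if (inputFormatClassName == null && tableInfo != null) {
            return tableInfo.getInputFormatClassName();
        }
        return inputFormatClassName;
    }
}
```

With a fallback like this, a caller that holds on to the object after the 
duplicate field has been dropped would still read the table-level value 
instead of null.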

+1 to Solution (a).

> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are 
> unusable after it
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11344
>                 URL: https://issues.apache.org/jira/browse/HIVE-11344
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a notion of compression for HCatSplits: when 
> serializing, it compares each field of PartInfo with its counterpart in 
> TableInfo and, if the two are identical, nulls out that field in PartInfo, 
> so that the same info is not serialized twice.
> This, however, has the side effect of making the PartInfo object unusable 
> once HCatSplit.write has been called.
> This does not affect M/R directly: M/R does not know about the PartInfo 
> objects, and once the HCatSplit is serialized, it is recreated on the 
> backend by deserialization, which restores the split and its PartInfo 
> objects. It does, however, affect framework users of HCat that try to mimic 
> M/R and then use the PartInfo objects to instantiate distinct readers.
> Thus, we need to make it so that PartInfo is still usable after 
> HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
