[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639272#comment-14639272 ]
Mithun Radhakrishnan commented on HIVE-11344:
---------------------------------------------
Ah, that's a good point. I didn't realize that {{HCatSplit}} or {{PartInfo}}
might be serialized in situations other than M/R / Tez serialization of splits.
At the time I wrote this, I did intend to check {{partitionSchema}},
{{inputFormatClassName}}, etc. for null, in their respective getters, and
return the values from {{this.tableInfo}}. One "optimization" too far.
+1 to Solution (a).
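For illustration, here is a minimal Java sketch of the getter fallback described above. Each getter checks its own field for null and, if it was deduplicated away, returns the table-level value. `SimpleTableInfo`, `SimplePartInfo`, `dedupAgainstTable()`, the `List<String>` stand-in for a schema, and the format class name are all hypothetical simplifications. They are not the real HCatalog classes or API.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for HCatalog's TableInfo (not the real class).
class SimpleTableInfo {
    private final List<String> dataColumns;
    private final String inputFormatClassName;

    SimpleTableInfo(List<String> dataColumns, String inputFormatClassName) {
        this.dataColumns = dataColumns;
        this.inputFormatClassName = inputFormatClassName;
    }

    List<String> getDataColumns() { return dataColumns; }
    String getInputFormatClassName() { return inputFormatClassName; }
}

// Hypothetical stand-in for PartInfo: a null field means "same as the table".
class SimplePartInfo {
    private List<String> partitionSchema;
    private String inputFormatClassName;
    private final SimpleTableInfo tableInfo;

    SimplePartInfo(List<String> partitionSchema, String inputFormatClassName,
                   SimpleTableInfo tableInfo) {
        this.partitionSchema = partitionSchema;
        this.inputFormatClassName = inputFormatClassName;
        this.tableInfo = tableInfo;
    }

    // Fall back to the table-level value when this field was nulled out.
    List<String> getPartitionSchema() {
        return partitionSchema != null ? partitionSchema : tableInfo.getDataColumns();
    }

    String getInputFormatClassName() {
        return inputFormatClassName != null
                ? inputFormatClassName
                : tableInfo.getInputFormatClassName();
    }

    // Simulates the in-place "compression": null out fields equal to the table's.
    void dedupAgainstTable() {
        if (Objects.equals(partitionSchema, tableInfo.getDataColumns())) {
            partitionSchema = null;
        }
        if (Objects.equals(inputFormatClassName, tableInfo.getInputFormatClassName())) {
            inputFormatClassName = null;
        }
    }
}

public class GetterFallbackDemo {
    public static void main(String[] args) {
        String fmt = "org.example.TextInputFormat"; // hypothetical class name
        SimpleTableInfo table = new SimpleTableInfo(List.of("id", "name"), fmt);
        SimplePartInfo part = new SimplePartInfo(List.of("id", "name"), fmt, table);

        part.dedupAgainstTable();

        // Callers still see the effective values after the in-place dedup.
        System.out.println(part.getPartitionSchema());      // prints [id, name]
        System.out.println(part.getInputFormatClassName()); // prints org.example.TextInputFormat
    }
}
```

With this design the nulling stays an internal serialization detail. Any code that goes through the getters keeps working whether or not the split has been written.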
> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are
> unusable after it
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-11344
> URL: https://issues.apache.org/jira/browse/HIVE-11344
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a notion of compression for HCatSplits: when
> serializing, it compares PartInfo fields with the corresponding TableInfo
> fields and, if the two are identical, nulls out that field in PartInfo, so
> that the information is not repeated when PartInfo is serialized.
> This, however, has the side effect of making the PartInfo object unusable if
> HCatSplit.write has been called.
> This does not affect M/R directly: M/R does not know about the PartInfo
> objects, and once serialized, the HCatSplit object is recreated on the backend
> by deserialization, which restores the split and its PartInfo objects. It
> does, however, affect framework users of HCat that try to mimic M/R and then
> use the PartInfo objects to instantiate distinct readers.
> Thus, we need to make it so that PartInfo is still usable after
> HCatSplit.write is called.
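One way to meet the requirement above is to deduplicate only on the wire, so that `write` never assigns to the split's own fields. The following Java sketch contrasts that with in-place nulling. `SketchSplit`, `writeMutating`, `read`, the single-field model, and the class name string are hypothetical. The `write(DataOutput)` shape loosely mirrors Hadoop's `Writable`. This is an illustration under those assumptions and is not necessarily what HIVE-11344.patch does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified split: one table-level value and one partition-level
// value. Not the real HCatSplit/PartInfo classes; values are assumed non-null.
class SketchSplit {
    final String tableInputFormat; // stands in for TableInfo state
    String partInputFormat;        // stands in for a PartInfo field

    SketchSplit(String tableInputFormat, String partInputFormat) {
        this.tableInputFormat = tableInputFormat;
        this.partInputFormat = partInputFormat;
    }

    // Mimics the reported side effect: dedup by nulling the field in place.
    void writeMutating(DataOutput out) throws IOException {
        if (partInputFormat != null && partInputFormat.equals(tableInputFormat)) {
            partInputFormat = null; // this object is now unusable for readers
        }
        writeFields(out, partInputFormat == null);
    }

    // Dedups on the wire only: a flag replaces the repeated value,
    // but no field of this object is modified.
    void write(DataOutput out) throws IOException {
        writeFields(out, partInputFormat.equals(tableInputFormat));
    }

    private void writeFields(DataOutput out, boolean sameAsTable) throws IOException {
        out.writeUTF(tableInputFormat);
        out.writeBoolean(sameAsTable);
        if (!sameAsTable) {
            out.writeUTF(partInputFormat);
        }
    }

    // Rebuilds a split; a deduplicated field is restored from the table value.
    static SketchSplit read(DataInput in) throws IOException {
        String table = in.readUTF();
        boolean sameAsTable = in.readBoolean();
        String part = sameAsTable ? table : in.readUTF();
        return new SketchSplit(table, part);
    }
}

public class NonMutatingWriteDemo {
    public static void main(String[] args) throws IOException {
        String fmt = "org.example.OrcInputFormat"; // hypothetical class name

        SketchSplit bad = new SketchSplit(fmt, fmt);
        bad.writeMutating(new DataOutputStream(new ByteArrayOutputStream()));
        System.out.println("after writeMutating: " + bad.partInputFormat); // prints "after writeMutating: null"

        SketchSplit good = new SketchSplit(fmt, fmt);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        good.write(out);
        out.flush();
        System.out.println("after write: " + good.partInputFormat);

        SketchSplit copy = SketchSplit.read(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println("deserialized: " + copy.partInputFormat);
    }
}
```

Both variants produce the same serialized form, and a deserialized copy is complete in either case. This is why M/R, which only ever sees the rebuilt split, is not affected. Only the in-memory object that `write` was called on differs.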
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)