[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637699#comment-14637699 ]
Sushanth Sowmyan commented on HIVE-11344:
-----------------------------------------
[~mithun], could you please review?
> HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo
> objects are unusable after it
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-11344
> URL: https://issues.apache.org/jira/browse/HIVE-11344
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a form of compression for HCatSplits: when
> serializing, it compares each PartInfo object against its TableInfo object
> and, wherever a field is identical in both, nulls out that field in PartInfo
> so that the information is not repeated when PartInfo is serialized.
> This, however, has the side effect of leaving the PartInfo object unusable
> once HCatSplit.write has been called.
> This does not affect M/R directly: M/R does not know about the PartInfo
> objects, and once the split is serialized, the HCatSplit object is recreated
> on the backend by deserialization, which restores the split and its PartInfo
> objects. It does, however, affect framework users of HCat that try to mimic
> M/R and then use the PartInfo objects to instantiate distinct readers.
> Thus, we need to make sure that PartInfo is still usable after
> HCatSplit.write is called.
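
To make the side effect concrete, here is a minimal sketch of the pattern described above. The classes, the single inputFormat field, and the method names are all hypothetical stand-ins, not the real HCatalog classes, and the "preserving" variant is only one possible remedy, not necessarily what the attached patch does.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration; these are NOT the real HCatalog classes.
class SplitWriteSketch {

    static final class TableInfo {
        final String inputFormat;
        TableInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    static final class PartInfo {
        String inputFormat; // null in serialized form means "same as the table"
        PartInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    // Stand-in for real serialization: just renders the field as text.
    static String serialize(PartInfo part) {
        return "inputFormat=" + part.inputFormat;
    }

    // Problematic pattern: the duplicate field is nulled out and stays null,
    // so the caller's PartInfo is no longer usable after the write.
    static String writeMutating(PartInfo part, TableInfo table) {
        if (Objects.equals(part.inputFormat, table.inputFormat)) {
            part.inputFormat = null;
        }
        return serialize(part);
    }

    // One possible remedy (an assumption): null the field only while
    // serializing, then restore it before returning to the caller.
    static String writePreserving(PartInfo part, TableInfo table) {
        String saved = part.inputFormat;
        try {
            if (Objects.equals(saved, table.inputFormat)) {
                part.inputFormat = null;
            }
            return serialize(part);
        } finally {
            part.inputFormat = saved;
        }
    }

    public static void main(String[] args) {
        TableInfo table = new TableInfo("orc");

        PartInfo broken = new PartInfo("orc");
        String wire1 = writeMutating(broken, table);
        System.out.println(wire1 + " / after write: " + broken.inputFormat);

        PartInfo intact = new PartInfo("orc");
        String wire2 = writePreserving(intact, table);
        System.out.println(wire2 + " / after write: " + intact.inputFormat);
    }
}
```

Both variants produce the same compressed serialized form, but only the second leaves the in-memory object intact for a caller that wants to instantiate a reader from it after writing the split.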