[
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sushanth Sowmyan updated HIVE-11344:
------------------------------------
Summary: HIVE-9845 makes HCatSplit.write modify the split so that PartInfo
objects are unusable after it (was: HIVE-9845 makes HCatSplit.write modify the
split so that PartitionInfo objects are unusable after it)
> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are
> unusable after it
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-11344
> URL: https://issues.apache.org/jira/browse/HIVE-11344
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a form of compression for HCatSplits: when serializing,
> HCatSplit looks for fields that are identical in the PartInfo and TableInfo
> objects and nulls out each such field in PartInfo, so that the information
> is not repeated when PartInfo is then serialized.
> This, however, has the side effect of making the PartInfo object unusable
> once HCatSplit.write has been called.
> M/R is not directly affected: it does not know about the PartInfo objects,
> and once the split is serialized, the HCatSplit object is recreated by
> deserializing on the backend, which restores the split and its PartInfo
> objects. Framework users of HCat that try to mimic M/R and then use the
> PartInfo objects to instantiate distinct readers are affected, however.
> Thus, we need to make it so that PartInfo is still usable after
> HCatSplit.write is called.
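
To illustrate the failure mode described above, here is a minimal sketch in Java. It does not use the real HCatalog classes: SplitSketch, PartInfoSketch, writeMutating and writePreserving are hypothetical stand-ins, and the single "schema" field is an assumption standing in for whatever fields PartInfo shares with TableInfo. The non-mutating variant is one possible approach and is not necessarily what the attached patch does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; NOT the real HCatalog classes.
public class SplitWriteSketch {

    static class PartInfoSketch {
        String schema; // assumed field that may duplicate the table-level value
        PartInfoSketch(String schema) { this.schema = schema; }
    }

    static class SplitSketch {
        final String tableSchema;
        final PartInfoSketch partInfo;

        SplitSketch(String tableSchema, PartInfoSketch partInfo) {
            this.tableSchema = tableSchema;
            this.partInfo = partInfo;
        }

        // Problematic pattern: deduplicate by nulling the field on the live object.
        void writeMutating(DataOutputStream out) throws IOException {
            if (Objects.equals(partInfo.schema, tableSchema)) {
                partInfo.schema = null; // side effect survives the call
            }
            writeField(out, partInfo.schema);
        }

        // Alternative: decide what to emit without touching the live object.
        void writePreserving(DataOutputStream out) throws IOException {
            String toWrite = Objects.equals(partInfo.schema, tableSchema)
                    ? null
                    : partInfo.schema;
            writeField(out, toWrite);
        }

        // A presence flag followed by the value, so that null can be encoded.
        private static void writeField(DataOutputStream out, String value)
                throws IOException {
            out.writeBoolean(value != null);
            if (value != null) {
                out.writeUTF(value);
            }
        }
    }

    // Serializes the split to bytes using one of the two strategies.
    static byte[] serialize(SplitSketch split, boolean mutating) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (mutating) {
                split.writeMutating(out);
            } else {
                split.writePreserving(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitSketch a = new SplitSketch("id:int", new PartInfoSketch("id:int"));
        serialize(a, true);
        System.out.println("after mutating write:   " + a.partInfo.schema);

        SplitSketch b = new SplitSketch("id:int", new PartInfoSketch("id:int"));
        serialize(b, false);
        System.out.println("after preserving write: " + b.partInfo.schema);
    }
}
```

Both variants emit the same bytes, so the space saving is kept; they differ only in whether the in-memory object is still usable by a caller that goes on to build readers from it after writing.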