Sushanth Sowmyan created HIVE-11344:
---------------------------------------

             Summary: HIVE-9845 makes HCatSplit.write modify the split so that 
PartitionInfo objects are unusable after it
                 Key: HIVE-11344
                 URL: https://issues.apache.org/jira/browse/HIVE-11344
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.0
            Reporter: Sushanth Sowmyan
            Assignee: Sushanth Sowmyan


HIVE-9845 introduced a form of compression for HCatSplits: when serializing a 
split, it looks for fields that the PartInfo and TableInfo objects have in 
common and, where the two values are identical, nulls out that field in 
PartInfo, so that the information is not repeated when PartInfo is serialized.

This, however, has the side effect of leaving the PartInfo object unusable 
once HCatSplit.write has been called, since the nulled-out fields stay null.
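
To illustrate the side effect, here is a minimal sketch. The classes below are 
simplified, hypothetical stand-ins (a single inputFormat field, made-up class 
and method names), not the real HCatalog HCatSplit, PartInfo or TableInfo. It 
only shows the general pattern of a write method that mutates the object it 
serializes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins; these are NOT the real HCatalog classes.
class MutatingWriteSketch {
    static class TableInfo {
        final String inputFormat;
        TableInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    static class PartInfo {
        String inputFormat; // mutable, so write() is able to null it out
        PartInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    static class Split {
        final TableInfo table;
        final PartInfo part;
        Split(TableInfo table, PartInfo part) { this.table = table; this.part = part; }

        // "Compresses" by nulling the duplicate field on the live object
        // before serializing it.
        void write(DataOutputStream out) throws IOException {
            if (part.inputFormat != null && part.inputFormat.equals(table.inputFormat)) {
                part.inputFormat = null; // side effect on the caller's object
            }
            out.writeBoolean(part.inputFormat != null);
            if (part.inputFormat != null) {
                out.writeUTF(part.inputFormat);
            }
        }
    }

    // Returns the partition's input format as the caller sees it after write().
    static String inputFormatAfterWrite() {
        TableInfo table = new TableInfo("org.example.SomeInputFormat");
        PartInfo part = new PartInfo("org.example.SomeInputFormat");
        try {
            new Split(table, part).write(new DataOutputStream(new ByteArrayOutputStream()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return part.inputFormat;
    }
}
```

In this sketch, a caller that keeps its reference to the PartInfo stand-in 
after write() finds the field set to null, which mirrors the situation 
described above.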

This does not affect M/R directly: M/R does not know about the PartInfo 
objects, and once the HCatSplit is serialized, it is recreated on the backend 
by deserialization, which restores the split and its PartInfo objects. It 
does, however, affect framework users of HCat that try to mimic M/R and then 
use the PartInfo objects to instantiate distinct readers.

Thus, we need to make it so that PartInfo is still usable after HCatSplit.write 
is called.
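
One possible direction, sketched under assumptions and not necessarily the 
approach the eventual patch takes, is to decide what to omit while writing 
instead of changing the object itself. The classes, the single inputFormat 
field and the wire layout below are hypothetical and do not reflect the real 
HCatalog serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins; these are NOT the real HCatalog classes.
class NonMutatingWriteSketch {
    static class TableInfo {
        final String inputFormat;
        TableInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    static class PartInfo {
        final String inputFormat; // immutable: write() cannot alter it
        PartInfo(String inputFormat) { this.inputFormat = inputFormat; }
    }

    // Writes a flag saying "same as table" and omits the value in that case.
    // Assumes inputFormat is non-null to keep the sketch short.
    static void write(TableInfo table, PartInfo part, DataOutputStream out) throws IOException {
        boolean sameAsTable = part.inputFormat.equals(table.inputFormat);
        out.writeBoolean(sameAsTable);
        if (!sameAsTable) {
            out.writeUTF(part.inputFormat);
        }
    }

    // Restores the omitted value from the table info on the reading side.
    static PartInfo read(TableInfo table, DataInputStream in) throws IOException {
        boolean sameAsTable = in.readBoolean();
        return new PartInfo(sameAsTable ? table.inputFormat : in.readUTF());
    }

    // Returns { value on the original object after write, value on the restored object }.
    static String[] roundTrip() {
        TableInfo table = new TableInfo("org.example.SomeInputFormat");
        PartInfo original = new PartInfo("org.example.SomeInputFormat");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(table, original, new DataOutputStream(bytes));
            PartInfo restored = read(table,
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return new String[] { original.inputFormat, restored.inputFormat };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The serialized form still avoids repeating the shared value, while the object 
held by the caller stays intact after the write.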



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
