[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639272#comment-14639272 ]

Mithun Radhakrishnan commented on HIVE-11344:
---------------------------------------------

Ah, that's a good point. I didn't realize that {{HCatSplit}} or {{PartInfo}} might be serialized in situations other than M/R / Tez serialization of splits.

At the time I wrote this, I did intend to check {{partitionSchema}}, {{inputFormatClassName}}, etc. for null, in their respective getters, and return the values from {{this.tableInfo}}. One "optimization" too far.

+1 to Solution (a).

> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are
> unusable after it
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11344
>                 URL: https://issues.apache.org/jira/browse/HIVE-11344
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a notion of compression for HCatSplits so that when
> serializing, it finds commonalities between PartInfo and TableInfo objects,
> and if the two are identical, it nulls out that field in PartInfo, thus
> making sure that when PartInfo is then serialized, info is not repeated.
> This, however, has the side effect of making the PartInfo object unusable if
> HCatSplit.write has been called.
> While this does not affect M/R directly, since they do not know about the
> PartInfo objects and once serialized, the HCatSplit object is recreated by
> deserializing on the backend, which does restore the split and its PartInfo
> objects, this does, however, affect framework users of HCat that try to mimic
> M/R and then use the PartInfo objects to instantiate distinct readers.
> Thus, we need to make it so that PartInfo is still usable after
> HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
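
For readers following the thread, the null-checking getters mentioned in the comment could look roughly like the sketch below. This is only an illustration under stated assumptions: {{TableInfoSketch}}, {{PartInfoSketch}}, {{dedupeAgainstTable()}} and {{rawPartitionSchema()}} are hypothetical, simplified stand-ins and not the real HCatalog classes, schemas are plain strings here, and nothing below is taken from HIVE-11344.patch or claims to be Solution (a). It only shows how a getter that falls back to the table-level value keeps the object usable after its duplicate fields have been nulled out.

```java
import java.util.Objects;

public class PartInfoFallbackSketch {

    // Hypothetical stand-in for table-level metadata (not the real TableInfo).
    static final class TableInfoSketch {
        final String schema;
        final String inputFormatClassName;

        TableInfoSketch(String schema, String inputFormatClassName) {
            this.schema = schema;
            this.inputFormatClassName = inputFormatClassName;
        }
    }

    // Hypothetical stand-in for partition-level metadata (not the real PartInfo).
    static final class PartInfoSketch {
        final TableInfoSketch tableInfo;
        String partitionSchema;
        String inputFormatClassName;

        PartInfoSketch(TableInfoSketch tableInfo, String partitionSchema,
                       String inputFormatClassName) {
            this.tableInfo = tableInfo;
            this.partitionSchema = partitionSchema;
            this.inputFormatClassName = inputFormatClassName;
        }

        // Mimics the "compression" described in the issue: any field that is
        // identical to the table-level value is nulled out before serialization.
        void dedupeAgainstTable() {
            if (Objects.equals(partitionSchema, tableInfo.schema)) {
                partitionSchema = null;
            }
            if (Objects.equals(inputFormatClassName, tableInfo.inputFormatClassName)) {
                inputFormatClassName = null;
            }
        }

        // Getter with null check: a nulled field means "same as the table".
        String getPartitionSchema() {
            return partitionSchema != null ? partitionSchema : tableInfo.schema;
        }

        String getInputFormatClassName() {
            return inputFormatClassName != null
                    ? inputFormatClassName
                    : tableInfo.inputFormatClassName;
        }

        // Exposes the raw field so the effect of the dedupe step is visible.
        String rawPartitionSchema() {
            return partitionSchema;
        }
    }

    public static void main(String[] args) {
        TableInfoSketch table = new TableInfoSketch("a:int,b:string", "TableInputFormat");
        PartInfoSketch part = new PartInfoSketch(table, "a:int,b:string", "PartInputFormat");

        part.dedupeAgainstTable();

        // The raw schema field is now null, but the getter still returns a value.
        System.out.println("raw schema field: " + part.rawPartitionSchema());
        System.out.println("schema via getter: " + part.getPartitionSchema());
        System.out.println("input format via getter: " + part.getInputFormatClassName());
    }
}
```

With getters of this shape, a caller that keeps using the object after the dedupe step still sees the table-level values for the fields that were nulled, while fields that genuinely differ from the table are returned unchanged.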