[ https://issues.apache.org/jira/browse/HIVE-21328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-21328:
----------------------------------
Attachment: HIVE-21328.1.patch
> Call To Hadoop Text getBytes() Without Call to getLength()
> ----------------------------------------------------------
>
> Key: HIVE-21328
> URL: https://issues.apache.org/jira/browse/HIVE-21328
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 4.0.0, 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HIVE-21328.1.patch
>
>
> I'm not sure if there is actually a bug, but this looks highly suspect:
> {code:java}
> public Object set(final Object o, final Text text) {
>   return new BytesWritable(text == null ? null : text.getBytes());
> }
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106
> A Text object has two components: the internal byte array and the length of
> the valid data in it. The two are independent, so the array can be longer
> than the length. E.g., a quick "reset" on the Text object simply sets the
> internal length counter to zero and leaves the old bytes in place. Because
> getBytes() returns the whole backing array, and only the first getLength()
> bytes are valid, this code may be looking at obsolete data that it shouldn't
> see, since it never considers the length of the Text.
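A minimal, self-contained sketch of the concern, runnable without Hadoop on the classpath. MiniText and TextToBytes are hypothetical names, not Hadoop or Hive classes. MiniText only models the bytes-plus-length behavior described above, assuming (as the description suggests) that the backing array is reused rather than shrunk when a shorter value is set. Against the real Text class, the length-aware version would be along the lines of Arrays.copyOf(text.getBytes(), text.getLength()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical stand-in for org.apache.hadoop.io.Text. It models only the
 * "backing array + valid length" behavior; it is NOT Hadoop's implementation.
 */
final class MiniText {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Stores the UTF-8 bytes of s, reusing the buffer when it is big enough. */
    void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            bytes = new byte[utf8.length]; // assumption: grow only, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length; // bytes past this index are left untouched
    }

    /** A "quick reset": only the length counter changes; old bytes remain. */
    void clear() {
        length = 0;
    }

    /** Raw backing array; only the first getLength() bytes are valid. */
    byte[] getBytes() {
        return bytes;
    }

    int getLength() {
        return length;
    }
}

/** Hypothetical helpers contrasting the suspect pattern with a length-aware copy. */
final class TextToBytes {
    /** Mirrors the suspect code: hands out the whole backing array, ignoring the length. */
    static byte[] suspect(MiniText text) {
        return text == null ? null : text.getBytes();
    }

    /** Copies only the valid prefix, i.e. the first getLength() bytes. */
    static byte[] lengthAware(MiniText text) {
        return text == null ? null : Arrays.copyOf(text.getBytes(), text.getLength());
    }
}
```

Setting "hello" and then "hi" on the same MiniText leaves the backing array holding "hillo": suspect() returns all five bytes, while lengthAware() returns only "hi". After clear(), lengthAware() returns an empty array, but suspect() still returns the stale "hillo".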
--
This message was sent by Atlassian Jira
(v8.3.4#803005)