[ https://issues.apache.org/jira/browse/HIVE-21328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HIVE-21328:
----------------------------------
    Attachment: HIVE-21328.1.patch

> Call To Hadoop Text getBytes() Without Call to getLength()
> ----------------------------------------------------------
>
>                 Key: HIVE-21328
>                 URL: https://issues.apache.org/jira/browse/HIVE-21328
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-21328.1.patch
>
>
> I'm not sure if there is actually a bug, but this looks highly suspect:
> {code:java}
>   public Object set(final Object o, final Text text) {
>     return new BytesWritable(text == null ? null : text.getBytes());
>   }
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106
> A Text object has two independent components: an internal byte array and a
> length counter. The array may be longer than the current value; for example,
> a quick "reset" of a Text object simply sets the internal length counter to
> zero and leaves the old bytes in place. Because this code passes the whole
> array from getBytes() without considering getLength(), it may expose
> obsolete data beyond the end of the current value.
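
The failure mode described above can be reproduced without Hadoop on the classpath. The sketch below uses a hypothetical `MiniText` class (and a hypothetical `TextLengthDemo` driver) that only mimics the relevant part of `org.apache.hadoop.io.Text`: a reusable backing array plus a separate length counter. It is not the real Hadoop class. Setting a shorter value into an existing buffer leaves trailing bytes from the previous value, so reading the whole array returns stale data, while reading only `getLength()` bytes returns the logical value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical stand-in for org.apache.hadoop.io.Text, reduced to the two
 * fields that matter here: a reusable backing array and a separate length.
 * It is NOT the real Hadoop class.
 */
class MiniText {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Copies the UTF-8 encoding of s into the buffer, growing it only when needed. */
    void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length];
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    /** A cheap "reset": only the length counter changes; the old bytes remain. */
    void clear() {
        length = 0;
    }

    /** Returns the raw backing array; only the first getLength() bytes are valid. */
    byte[] getBytes() {
        return bytes;
    }

    int getLength() {
        return length;
    }

    /** Returns a copy trimmed to exactly getLength() bytes. */
    byte[] copyBytes() {
        return Arrays.copyOf(bytes, length);
    }
}

public class TextLengthDemo {
    public static void main(String[] args) {
        MiniText text = new MiniText();
        text.set("hello world");
        text.set("hi"); // reuses the existing 11-byte buffer

        // Suspect pattern: reads the whole backing array, including stale bytes.
        String stale = new String(text.getBytes(), StandardCharsets.UTF_8);
        // Length-aware pattern: reads only the valid prefix.
        String valid = new String(text.getBytes(), 0, text.getLength(), StandardCharsets.UTF_8);

        System.out.println(stale); // hillo world
        System.out.println(valid); // hi

        text.clear();
        // The array keeps its 11 bytes, but the trimmed copy is empty.
        System.out.println(text.getBytes().length + " " + text.copyBytes().length); // 11 0
    }
}
```

With the real Hadoop API, `Text.copyBytes()` returns a copy of exactly `getLength()` bytes, so one possible length-aware variant of the quoted `set` would pass `text.copyBytes()` instead of `text.getBytes()`. This is only a suggestion for illustration; it is not necessarily what HIVE-21328.1.patch does.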



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
