[
https://issues.apache.org/jira/browse/HIVE-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Janaki Lahorani reassigned HIVE-16889:
--------------------------------------
Assignee: Janaki Lahorani
> Improve Performance Of VARCHAR
> ------------------------------
>
> Key: HIVE-16889
> URL: https://issues.apache.org/jira/browse/HIVE-16889
> Project: Hive
> Issue Type: Improvement
> Components: Types
> Affects Versions: 2.1.1, 3.0.0
> Reporter: BELUGA BEHR
> Assignee: Janaki Lahorani
> Priority: Major
>
> Oftentimes, organizations use tools that create table schemas on the fly
> and specify VARCHAR columns with precision. In these scenarios, performance
> suffers even though one could assume it should be better: there is
> pre-existing knowledge about the size of the data, so buffers could be set
> up more efficiently than in the case where no such knowledge exists.
> Most of the performance cost seems to come from reading a STRING from a
> file into a byte buffer, checking the length of the STRING, truncating it
> if needed, and then serializing it back into bytes again.
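> As an illustration, here is a minimal standalone sketch of that round trip
> (simplified, not Hive's actual implementation; the method name
> enforceMaxLength is borrowed from HiveVarcharWritable for clarity):
> {code:java}
> import java.nio.charset.StandardCharsets;
>
> public class VarcharRoundTripSketch {
>   // Illustrative cost model: the raw bytes are fully decoded into a
>   // String, the length is checked in code points, the value is truncated
>   // if needed, and the result is encoded back into bytes, even when the
>   // value already fits within the declared precision.
>   public static byte[] enforceMaxLength(byte[] utf8, int maxLength) {
>     String decoded = new String(utf8, StandardCharsets.UTF_8); // full decode
>     if (decoded.codePointCount(0, decoded.length()) <= maxLength) {
>       return utf8; // fits, but the decode was still paid for
>     }
>     int end = decoded.offsetByCodePoints(0, maxLength);
>     return decoded.substring(0, end)                 // truncate
>                   .getBytes(StandardCharsets.UTF_8); // re-encode
>   }
> }
> {code}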
> From the code, I have identified several areas where developers left notes
> about future improvements:
> # org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
> # org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
> # org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
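> One direction these notes could point toward (a sketch under the assumption
> that values are well-formed UTF-8; this is not a committed design) is to
> count code points directly on the byte buffer and truncate without ever
> materializing a String:
> {code:java}
> public class Utf8TruncateSketch {
>   // Hypothetical byte-level truncation: walk the UTF-8 lead bytes,
>   // counting code points, and cut at the byte offset where the declared
>   // precision is reached. No String object is ever created.
>   public static byte[] enforceMaxLength(byte[] utf8, int maxLength) {
>     int chars = 0;
>     int i = 0;
>     while (i < utf8.length && chars < maxLength) {
>       int lead = utf8[i] & 0xFF;
>       if (lead < 0x80)      i += 1; // 1-byte sequence (ASCII)
>       else if (lead < 0xE0) i += 2; // 2-byte sequence
>       else if (lead < 0xF0) i += 3; // 3-byte sequence
>       else                  i += 4; // 4-byte sequence
>       chars++;
>     }
>     if (i >= utf8.length) {
>       return utf8; // already within the limit, no copy needed
>     }
>     byte[] truncated = new byte[i];
>     System.arraycopy(utf8, 0, truncated, 0, i);
>     return truncated;
>   }
> }
> {code}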
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)