Improve SerDe performance by using Text instead of String
---------------------------------------------------------
Key: HIVE-266
URL: https://issues.apache.org/jira/browse/HIVE-266
Project: Hadoop Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.2.0
Reporter: Zheng Shao
Priority: Critical
A recent performance study showed that two places in the Hive code account for
a large percentage of CPU usage:
1. String.getBytes() (UTF-8 encoding)
2. String.split()
We should replace String with Text objects in order to:
1. Avoid UTF-8 decoding and encoding
2. Reuse the Text object and avoid creating a new object for each column of each
row, as String.split() does
This is expected to improve Hive performance significantly (20% or more).
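To illustrate the idea behind both points without a Hadoop dependency, here is a minimal sketch of splitting a UTF-8 row on a single-byte delimiter by recording byte offsets into arrays that are reused across rows. No String is decoded and no per-column object is allocated. ReusableFieldSplitter and its methods are hypothetical, not part of Hive or Hadoop. In Hive itself, a field's byte range could be handed to a reused org.apache.hadoop.io.Text via Text.set(byte[], int, int); this sketch only records the offsets. Splitting raw UTF-8 bytes is safe here because the delimiter is assumed to be an ASCII byte (for example Hive's default field delimiter, \001), and multi-byte UTF-8 sequences never contain bytes below 0x80.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not a Hive or Hadoop class): records the byte range of
 * each field in a UTF-8 row instead of decoding it into Strings. The offset
 * arrays are reused across rows, so steady-state splitting allocates nothing.
 */
final class ReusableFieldSplitter {
    private int[] starts = new int[8];
    private int[] ends = new int[8];
    private int fieldCount = 0;
    private byte[] row = new byte[0];

    /** Splits row[0..length) on the single-byte delimiter; returns the field count. */
    int split(byte[] row, int length, byte delim) {
        this.row = row;
        fieldCount = 0;
        int start = 0;
        for (int i = 0; i <= length; i++) {
            // The end of the row also terminates the last field.
            if (i == length || row[i] == delim) {
                ensureCapacity(fieldCount + 1);
                starts[fieldCount] = start;
                ends[fieldCount] = i;
                fieldCount++;
                start = i + 1;
            }
        }
        return fieldCount;
    }

    /** Grows the offset arrays only when a row has more fields than seen before. */
    private void ensureCapacity(int n) {
        if (n > starts.length) {
            int newLen = Math.max(n, starts.length * 2);
            starts = Arrays.copyOf(starts, newLen);
            ends = Arrays.copyOf(ends, newLen);
        }
    }

    int fieldStart(int i) { return starts[i]; }

    int fieldLength(int i) { return ends[i] - starts[i]; }

    /** Compares field i with raw UTF-8 bytes without decoding either side (Java 9+). */
    boolean fieldEquals(int i, byte[] other) {
        return Arrays.equals(row, starts[i], ends[i], other, 0, other.length);
    }
}
```

Unlike String.split(), this sketch keeps trailing empty fields, so "a\001bc\001" yields three fields: "a", "bc" and an empty one.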