Improve SerDe performance by using Text instead of String
---------------------------------------------------------
Key: HIVE-266
URL: https://issues.apache.org/jira/browse/HIVE-266
Project: Hadoop Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.2.0
Reporter: Zheng Shao
Priority: Critical

A recent performance study showed that two places in the Hive code account for a large percentage of CPU usage:
1. String.getBytes() (UTF-8 encoding)
2. String.split()

We should replace String with the Text object to:
1. Avoid UTF-8 decoding and encoding
2. Reuse the Text object and avoid creating new objects for each column in each row, as String.split() does

This is expected to give a big (20%+) performance improvement to Hive.
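
The idea can be illustrated with a minimal sketch. It is not the actual Hive patch. To stay self-contained it uses a hypothetical ByteField class as a stand-in for a reusable byte container such as Hadoop's Text, and a hypothetical RowSplitter that scans the raw row bytes for a separator. Both names, the single-byte separator, and the fixed column count are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a reusable byte container such as Hadoop's Text. */
final class ByteField {
    private byte[] bytes = new byte[16];
    private int length = 0;

    /** Copies a slice of src into this field, growing the buffer only when needed. */
    void set(byte[] src, int start, int len) {
        if (len > bytes.length) {
            bytes = new byte[Math.max(len, bytes.length * 2)];
        }
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
    }

    int getLength() {
        return length;
    }

    /** Decodes to a String only on demand, e.g. for display or debugging. */
    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical splitter that reuses one ByteField per column across all rows. */
final class RowSplitter {
    private final byte separator;
    private final ByteField[] fields;

    RowSplitter(byte separator, int maxColumns) {
        this.separator = separator;
        this.fields = new ByteField[maxColumns];
        for (int i = 0; i < maxColumns; i++) {
            fields[i] = new ByteField();
        }
    }

    /**
     * Splits the first rowLength bytes of row on the separator byte.
     * No String is created and no UTF-8 decoding takes place.
     * Columns beyond maxColumns are ignored. Returns the number of columns filled.
     */
    int split(byte[] row, int rowLength) {
        int count = 0;
        int fieldStart = 0;
        for (int i = 0; i <= rowLength && count < fields.length; i++) {
            if (i == rowLength || row[i] == separator) {
                fields[count++].set(row, fieldStart, i - fieldStart);
                fieldStart = i + 1;
            }
        }
        return count;
    }

    ByteField get(int column) {
        return fields[column];
    }
}
```

A caller would create one RowSplitter, then call split() for every row and read the columns through get(). After the internal buffers have grown to fit the widest values, processing a row allocates nothing, whereas String.split() creates a new array and a new String per column for every row.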