[ https://issues.apache.org/jira/browse/HIVE-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-266:
----------------------------

    Hadoop Flags: [Incompatible change]

> Improve SerDe performance by using Text instead of String
> ---------------------------------------------------------
>
>                 Key: HIVE-266
>                 URL: https://issues.apache.org/jira/browse/HIVE-266
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Priority: Critical
>
> A recent performance study showed that two places in the Hive code account
> for a large percentage of CPU usage:
> 1. String.getBytes() (UTF-8 encoding)
> 2. String.split()
> We should replace String with the Text object to:
> 1. Avoid UTF-8 decoding and encoding
> 2. Reuse the Text object and avoid creating new objects for each column in
> each row, as String.split() does
> This is expected to give a big (20%+) performance improvement to Hive.
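
To illustrate the idea in the description, here is a minimal sketch of splitting a row directly on its UTF-8 bytes, so that no String is decoded, no String.split() array is allocated, and the offset arrays are reused from row to row. It uses only the JDK and a plain byte[] as a stand-in for Hadoop's Text; the class ByteRowSplitter, its method names, and the choice of delimiter byte are hypothetical and are not taken from the Hive patch. It assumes a single-byte ASCII delimiter, which cannot occur inside a multi-byte UTF-8 sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: locates fields in a UTF-8 row without creating Strings. */
final class ByteRowSplitter {
    // Offset and length of each field; reused across rows to avoid per-row allocation.
    private int[] starts = new int[16];
    private int[] lengths = new int[16];
    private int count;

    /** Records the position of every field in row[0..len), split on an ASCII delimiter. */
    int split(byte[] row, int len, byte delimiter) {
        count = 0;
        int fieldStart = 0;
        for (int i = 0; i <= len; i++) {
            // The end of the row closes the last field, just as a delimiter does.
            if (i == len || row[i] == delimiter) {
                if (count == starts.length) {
                    starts = Arrays.copyOf(starts, count * 2);
                    lengths = Arrays.copyOf(lengths, count * 2);
                }
                starts[count] = fieldStart;
                lengths[count] = i - fieldStart;
                count++;
                fieldStart = i + 1;
            }
        }
        return count;
    }

    int fieldStart(int i) { return starts[i]; }

    int fieldLength(int i) { return lengths[i]; }

    /** Decodes one field on demand; only needed when a String is really required. */
    String fieldToString(byte[] row, int i) {
        return new String(row, starts[i], lengths[i], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Delimiter byte 0x01 is chosen for illustration only.
        byte[] row = "id42\u0001caf\u00e9\u0001".getBytes(StandardCharsets.UTF_8);
        ByteRowSplitter splitter = new ByteRowSplitter();
        int n = splitter.split(row, row.length, (byte) 1);
        for (int i = 0; i < n; i++) {
            System.out.println(i + ": offset=" + splitter.fieldStart(i)
                + " length=" + splitter.fieldLength(i));
        }
    }
}
```

In this sketch a column that is only copied or compared never has to be decoded: the caller works with the (offset, length) pair and the original bytes, and calls fieldToString() only for columns that genuinely need a String.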