[ https://issues.apache.org/jira/browse/HIVE-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-266:
----------------------------

    Attachment: HIVE-266.5.patch

Incorporated Namit's feedback.
Also changed GroupByOperator.getSize() to work with Writable.

> Improve SerDe performance by using Text instead of String
> ---------------------------------------------------------
>
>                 Key: HIVE-266
>                 URL: https://issues.apache.org/jira/browse/HIVE-266
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Critical
>             Fix For: 0.4.0
>
>         Attachments: HIVE-266.1.patch, HIVE-266.2.patch, HIVE-266.3.patch, HIVE-266.4.patch, HIVE-266.5.patch
>
>
> A recent performance study showed that two places in the Hive code account for a large percentage of CPU usage:
> 1. String.getBytes() (UTF-8 encoding)
> 2. String.split()
> We should replace String with Text objects to:
> 1. Avoid UTF-8 decoding and encoding
> 2. Reuse the Text object and avoid creating new objects for each column in each row, as String.split() does
> This is expected to give a big (20%+) performance improvement to Hive.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
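For illustration only, and not taken from any of the attached patches: the two points in the issue description (skip the UTF-8 decode/encode round trip, and reuse one holder object per column instead of allocating per row) can be sketched with the JDK alone. The names ByteFieldSplitter, ByteSlice, splitInPlace and splitWithString are hypothetical; ByteSlice is only a stand-in for a reusable byte holder such as Hadoop's Text, and the single-byte 0x01 delimiter is an assumption. Scanning raw bytes is safe here because a byte below 0x80 never occurs inside a multi-byte UTF-8 sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class ByteFieldSplitter {

    /** Hypothetical stand-in for a reusable byte holder such as Hadoop's Text. */
    public static final class ByteSlice {
        private byte[] data = new byte[0];
        private int start;
        private int length;

        /** Points this slice at a region of an existing buffer; nothing is copied. */
        public void set(byte[] data, int start, int length) {
            this.data = data;
            this.start = start;
            this.length = length;
        }

        public int getLength() {
            return length;
        }

        /** Decodes only on demand, e.g. for display or debugging. */
        @Override
        public String toString() {
            return new String(data, start, length, StandardCharsets.UTF_8);
        }
    }

    /** String-based path: decodes the whole row and allocates a String per column. */
    public static String[] splitWithString(byte[] row, int rowLength, byte delimiter) {
        String decoded = new String(row, 0, rowLength, StandardCharsets.UTF_8);
        return decoded.split(Pattern.quote(String.valueOf((char) delimiter)), -1);
    }

    /**
     * Byte-based path: scans the raw bytes once and repoints the caller's
     * preallocated slices. Returns the number of fields found; fields beyond
     * fields.length are ignored in this sketch.
     */
    public static int splitInPlace(byte[] row, int rowLength, byte delimiter, ByteSlice[] fields) {
        int count = 0;
        int fieldStart = 0;
        for (int i = 0; i <= rowLength && count < fields.length; i++) {
            if (i == rowLength || row[i] == delimiter) {
                fields[count++].set(row, fieldStart, i - fieldStart);
                fieldStart = i + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte delimiter = 1; // assumed single-byte column separator
        ByteSlice[] fields = new ByteSlice[8];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = new ByteSlice(); // allocated once, reused for every row
        }
        String[] rows = {"1\u0001h\u00e9llo\u0001\u00013.5", "2\u0001world\u0001x\u00014.0"};
        for (String r : rows) {
            byte[] row = r.getBytes(StandardCharsets.UTF_8); // stands in for bytes read from disk
            int n = splitInPlace(row, row.length, delimiter, fields);
            for (int i = 0; i < n; i++) {
                System.out.println(i + ": [" + fields[i] + "] (" + fields[i].getLength() + " bytes)");
            }
        }
    }
}
```

In this sketch the per-row work in splitInPlace is a single pass over the bytes with no allocation, whereas splitWithString creates a decoded String plus one String per column for every row. Whether that translates into the 20%+ gain mentioned in the description depends on the workload and is not something this example measures.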