[
https://issues.apache.org/jira/browse/HBASE-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571871#action_12571871
]
stack commented on HBASE-76:
----------------------------
Pardon me. I didn't look for the attachment.
The test contrasts String's native UTF-8 encoding with Text's, and then the
construction of either from bytes. It looks like Text's UTF-8 encoding isn't
that much faster than String's. The big difference when deserializing is kind
of odd. Is String doing extra work?
Text and String are different animals though, I suppose: Text is backed by
UTF-8 bytes while String is backed by UTF-16 chars.
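
To make the encode/decode step under discussion concrete, here is a minimal
sketch of the String side of the round trip. It is not the attached
TextVsString.java; the class and method names are made up for illustration,
and Text is left out so the snippet needs nothing beyond the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the attached TextVsString.java.
public class StringUtf8RoundTrip {
    // Encode the String's UTF-16 chars into UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String (UTF-16 chars again).
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four ASCII chars, one 2-byte char and one 3-byte char in UTF-8.
        String row = "row-\u00e9\u4e2d";
        byte[] utf8 = encode(row);
        System.out.println(row.length() + " chars, " + utf8.length + " UTF-8 bytes");
        System.out.println("round trip ok: " + decode(utf8).equals(row));
    }
}
```

Each direction converts between the two representations, which is the cost a
server avoids if it only ever handles the bytes.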
> [hbase] performance: Try to purge servers of Text
> -------------------------------------------------
>
> Key: HBASE-76
> URL: https://issues.apache.org/jira/browse/HBASE-76
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: stack
> Priority: Minor
> Attachments: TextVsString.java
>
>
> Chatting with Jim while looking at profiler outputs, we should make an effort
> at purging the servers of the Text type so HRegionServer never has to deal in
> Characters and the concomitant encode/decode to UTF-8. Toward this end, we'd
> make changes like moving HStoreKey from three data members (a Text column, a
> Text row and a long timestamp) to four: column family, column family
> qualifier and row, each done as a basic Writable (ImmutableBytesWritable?),
> plus a long timestamp. This would save our having to do the relatively
> expensive 'find' of the column family separator inside extractFamily (>10%
> of CPU scanning). Chatting about it, we could effect the change without
> changing the public client API: clients could continue to take Text type
> for row and column, and the conversion to HStoreKey could be done
> client-side before crossing the wire to the server.
>
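
A rough sketch of the client-side split proposed in the description, so the
server would receive family and qualifier as separate byte arrays and never
search for the separator itself. The class and method names are hypothetical,
and the sketch assumes the column name has the form family:qualifier with ':'
as the separator. It is not code from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; none of these names come from HBase.
public class ColumnSplitSketch {
    // Assumed family separator. As a single ASCII byte it can never appear
    // inside a multi-byte UTF-8 sequence, so a plain byte scan is safe.
    static final byte SEPARATOR = (byte) ':';

    // Split a UTF-8 encoded column name once, returning {family, qualifier}.
    static byte[][] split(byte[] column) {
        for (int i = 0; i < column.length; i++) {
            if (column[i] == SEPARATOR) {
                return new byte[][] {
                    Arrays.copyOfRange(column, 0, i),
                    Arrays.copyOfRange(column, i + 1, column.length)
                };
            }
        }
        throw new IllegalArgumentException("no family separator in column name");
    }

    public static void main(String[] args) {
        byte[] column = "info:name".getBytes(StandardCharsets.UTF_8);
        byte[][] parts = split(column);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8));
        System.out.println(new String(parts[1], StandardCharsets.UTF_8));
    }
}
```

Doing the split once on the client means the key that crosses the wire
already carries the family on its own, which is the saving the description is
after.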