[jira] Commented: (HBASE-82) row keys should be array of bytes with a specified comparator

stack (JIRA) Thu, 15 May 2008 14:35:22 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597277#action_12597277
 ]


stack commented on HBASE-82:
----------------------------

Thanks for the review.

On DELIMITER, regards HRI, we only parse the name in one place figuring the 
table name from region name.  Looking at how we compose region names, the parse 
is safe since it ranges over the table name portion which has been checked for 
legal characters.

For HSK, parses start at the buffers's zeroth byte.  Passed buffers will be 
full column name or family name.  Family names are also checked to make sure 
they have legal characters -- i.e. non-DELIMITER characters.

As best as I can tell, things are no worse than they were previously.

On legal table name, I fixed the javadoc (and made it more clear that this is 
for user-space tables, not catalog tables).

> HConnectionManager, RegExpRowFilter, HbaseMapWritable: why change from 
> HashMap to SortedMap? Is compareTo cheaper than computing hash? Not if map 
> key is the computed hash of the byte array.

On why SortedMaps rather than HashMaps, its because you can't use byte arrays 
as keys in HashMaps; or, rather, the hashcode for a byte array is that of the 
Object so two different byte arrays will not equate, even though they might 
have same content (but I think you know this, or at least, if you didn't, you 
do after reviewing this patch).  I thought that I'd gotten all places where I 
could get away with keeping HashMaps and using hash of byte array key as 
HashMap key but I missed the one in HConnectionManager.  I'll changed it back 
to HashMap with hash for of byte arrays for key (profiling yesterday, the 
lookup against this map of regions looking for cached regions on clientside 
takes 15% of all CPU doing writes -- most of time is in the SoftSortedMap -- 
this change should help a little).  RegExpRowFilter needs keys to be column 
names as does HbaseMapWritable (users of the latter want to get Map.Entry, etc. 
They'd be baffled if instead of the key they entered, instead they got an 
integer.

On Bytes.mapKey instead of Arrays.hashCode, the former does the boxing to 
Integer and uses the hashing mechanism that was in place before this patch -- 
the one Text used.  Any reason for using Arrays.hashcode instead?

On Maps converted to Sets instead, i got a few of these already.  If you 
noticed any I missed, I'd be interested.  I was certainly looking for 
opportunity (though, IIRC, Sets are often implemented atop Maps anyways so 
savings, if any, are minimal).

On changing regionServerStartup, I needed to pass Configuration; keys of 
WritableComparable.  Rather than do convertions on both sides, it was simplier 
to just use MapWritable for this one time operation.

There is another issue for managing any migrations associated with this issue.  
This patch is big enough as it is.  Migration work will be easier with it 
committed.

Thanks for the pointer to Byte.SIZE.  I changed Bytes.SIZEOF_LONG to do as you 
suggest.

May I commit?

> row keys should be array of bytes with a specified comparator
> -------------------------------------------------------------
>
>                 Key: HBASE-82
>                 URL: https://issues.apache.org/jira/browse/HBASE-82
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: Jim Kellerman
>            Assignee: stack
>             Fix For: 0.2.0
>
>         Attachments: 82-v12-ignore-ws.patch, 82-v13-ignore-ws.patch, 
> 82-v2.patch, 82-v3.patch, 82-v4.patch, 82-v5.patch, 82-v7.patch, 82-v8.patch, 
> 82-v9-ignore-ws.patch, 82.patch, Perf.java
>
>
> I have heard from several people that row keys in HBase should be less 
> restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead 
> to the most general case being either hadoop.io.BytesWritable or 
> hbase.io.ImmutableBytesWritable. The primary difference between these two 
> classes is that hadoop.io.BytesWritable by default allocates 100 bytes and if 
> you do not pay attention to the length, (BytesWritable.getSize()), converting 
> a String to a BytesWritable and vice versa can become problematic. 
> hbase.io.ImmutableBytesWritable, in contrast only allocates as many bytes as 
> you pass in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for 
> ImmutableBytesWritable, because it has a fixed size once set, and operations 
> like get, etc do not have to something like System.arrayCopy where you 
> specify the number of bytes to copy.
> Your comments, questions are welcome on this issue. If we receive enough 
> feedback that Text is too restrictive, we are willing to change it, but we 
> need to hear what would be the most useful thing to change it to as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-82) row keys should be array of bytes with a specified comparator

Reply via email to