[ https://issues.apache.org/jira/browse/HBASE-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677255#comment-16677255 ]
Ben Lau commented on HBASE-21439:
---------------------------------
Hi [~stack], which mistake do you mean: using different String conversions to
get/put a region in a map, or using Bytes.toString() on a byte array that may
not be a valid UTF-8 encoding of any string?
For mistake #1, I’m not aware of any other similar bugs in the codebase, though
there could be some.
I think we make mistake #2 in other parts of the codebase, particularly when
printing the start/end keys of regions in debug messages.
Depending on how exotic your rowkey space is (i.e. how many of your keys are
not valid UTF-8), you could run into an issue.
By 'issue,' I mean that invalid byte sequences in the start/end keys will be
silently replaced during decoding with the Unicode replacement character
(U+FFFD), so the printed keys no longer match the real ones.
The output would be misleading, but nothing would crash.
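The following is a minimal JDK-only sketch of that decoding behavior. It assumes that Bytes.toString() decodes the same way as new String(bytes, StandardCharsets.UTF_8). The helper name decodeUtf8 and the sample keys are hypothetical. A byte such as 0xFF can never appear in valid UTF-8, so it is replaced with U+FFFD. As a result, the original bytes can't be recovered, and two different keys can print identically.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {
  // Hypothetical helper; assumed to behave like Bytes.toString(byte[]).
  static String decodeUtf8(byte[] b) {
    return new String(b, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Two row keys that differ only in a byte that is never valid UTF-8.
    byte[] keyA = {'r', 'o', 'w', (byte) 0xFF, '1'};
    byte[] keyB = {'r', 'o', 'w', (byte) 0xFE, '1'};

    String a = decodeUtf8(keyA);
    // The bad byte is replaced with U+FFFD; no exception is thrown.
    System.out.println(a.equals("row\uFFFD1"));     // true
    // Distinct keys collapse to the same String...
    System.out.println(a.equals(decodeUtf8(keyB))); // true
    // ...and re-encoding does not recover the original bytes.
    System.out.println(Arrays.equals(keyA, a.getBytes(StandardCharsets.UTF_8))); // false
  }
}
```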
I can create a Jira ticket to audit the Bytes.toString() calls (there are many),
but unfortunately I don’t have the bandwidth to look at them myself.
> StochasticLoadBalancer RegionLoads aren’t being used in RegionLoad cost functions
> ---------------------------------------------------------------------------------
>
> Key: HBASE-21439
> URL: https://issues.apache.org/jira/browse/HBASE-21439
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Affects Versions: 1.3.2.1, 2.0.2
> Reporter: Ben Lau
> Assignee: Ben Lau
> Priority: Major
>
> In StochasticLoadBalancer.updateRegionLoad(), the region loads are put into
> the map with Bytes.toString(regionName) as the key.
> First, this is a problem because Bytes.toString() assumes that the byte array
> is a UTF-8-encoded String, but there is no guarantee that the regionName bytes
> are legal UTF-8.
> Second, in BaseLoadBalancer.registerRegion, we read the region loads out of
> the load map using region.getRegionNameAsString() and region.getEncodedName()
> as keys, not Bytes.toString(). Because the writer and the reader use different
> String conversions, the load balancer will not see or use any of the cluster's
> RegionLoad history.
> There are 2 primary ways to solve this issue, assuming we want to keep String
> keys for the load map (which seems reasonable, since they aid debugging). We
> can either fix updateRegionLoad to store the regionName using the same String
> conversion the reader uses, or update both the reader and the writer to use a
> new, common String representation that is valid for any byte array.
> I will post a patch assuming we want to pursue the original intention, i.e.
> storing regionNameAsString as the load map key, but I'm open to fixing this a
> different way.
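The next sketch makes the key mismatch concrete using plain JDK code rather than HBase classes. utf8Key stands in for the writer's Bytes.toString(regionName), assuming that method decodes as UTF-8. escapedKey is a hypothetical binary-safe conversion standing in for a different reader-side conversion; it is not claimed to match getRegionNameAsString() byte for byte. The region name is also made up. The sketch also illustrates the option of having the reader and writer share one common representation: when put and get use the same conversion, the lookup succeeds.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LoadMapKeyDemo {
  // Stand-in for the writer's Bytes.toString(regionName): decode as UTF-8.
  static String utf8Key(byte[] regionName) {
    return new String(regionName, StandardCharsets.UTF_8);
  }

  // Hypothetical binary-safe conversion (not an HBase API): printable ASCII
  // other than '\' is kept as-is; every other byte, and '\' itself, becomes
  // \xNN, so distinct byte arrays always produce distinct keys.
  static String escapedKey(byte[] regionName) {
    StringBuilder sb = new StringBuilder(regionName.length);
    for (byte b : regionName) {
      int v = b & 0xFF;
      if (v >= 0x20 && v < 0x7F && v != '\\') {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Hypothetical region name containing a byte that is not valid UTF-8.
    byte[] regionName = {'t', ',', (byte) 0xFF, ',', '1'};

    // The bug: writer and reader build keys with different conversions.
    Map<String, Integer> loads = new HashMap<>();
    loads.put(utf8Key(regionName), 42);
    System.out.println(loads.get(escapedKey(regionName))); // null: lookup misses

    // A shared conversion for both put and get finds the entry.
    Map<String, Integer> shared = new HashMap<>();
    shared.put(escapedKey(regionName), 42);
    System.out.println(shared.get(escapedKey(regionName))); // 42
  }
}
```

A real patch might reuse an existing HBase utility such as Bytes.toStringBinary() instead of a hand-rolled escape. We have not checked whether it escapes '\' the way this sketch does, and that escaping is what keeps the mapping one-to-one here.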
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)