Note that Avinash's ophf is 1-1 up to the max key length. But this has its own problems. (Mainly, that keys cannot be arbitrarily long, and hash computation gets more expensive proportional to max key length.)

-Jonathan

On Apr 4, 2009, at 10:32 PM, Zhu Han <[email protected]> wrote:

Avinash,

(For cassandra, the "key" mentioned in following words is the name of the row. The "token" mentioned in following words, is the output of the hash
function. )

For example, if you use 160bit md5 hash value as the space, it's finite theoretically, because you know the number of tokens which can be contained ahead of time, although the number is pretty large. For finite hash space,
the mapping from key to token is many to one instead of one to one.

For OHPF, a lot of keys could be mapped to the same token range because of the non-uniform distribution just as you pointed out. The load balancing can
solve the problem. However, if too may keys mapped to *the same token*
which is determined by the hash function, how can the load balance be
carried out? When the hash space is finite, it's possible that a large
number of keys are mapped to the same token. However, if the hash function is uniformly distributed, just as MD5 or SHA1, the probability of the worse
case is pretty rare, at least this case  cannot be produced by human
purposely, so that's not a problem. But for OPHF, if the clients pick up the name of key purposely, e.g. for DOS attack, the worst case can occur practically. When the attacker knows the algorithm of the OPHF, he can
produce arbitrary number of keys mapped to the same token.

If the hash space is infinite, or it's as large as the possible space of keys, this would not be a problem, because the key to token mapping is one
to one instead of many to one .

best regards,
hanzhu


On Sat, Apr 4, 2009 at 11:59 PM, Avinash Lakshman <
[email protected]> wrote:

Not sure what you mean by infinite. OPHF or not any hash function's range
is
practically infinite for MD5 is infinite hash range. With OPHF you will
have
a lot of skew. There are only two ways to deal with it:
(1) You know the space of of all your keys ahead of time and have your
system exploit that.
(2) Dynamically load balance until your system reaches steady state.

We are always sorting lexicographically and trying to do both of the above.

Avinash

On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <[email protected]> wrote:

Avinash,

I support the proposal of Jonathan.

Theoretically,  if the system wants to support range query over
consistent
hash(DHT), the hash space should be infinite. Otherwise, a large number
of
keys could be mapped to the same token and it's impossible to do load
balance by moving these keys around to other physical servers.
Especially,
the churn of OPHF could be very high because the user would like to
create
a
lot of keys with the same long prefix for some applications. If the
system
takes the string as the token and sorts them in *lexicographic order, the hash space would be infinite, so that the churn can be solved by load
balacing under any circumstances.

*best regards,
hanzhu


On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
[email protected]> wrote:

So how are you coming up with the tokens here? What do you mean by
string[0]
and string[last]? Are they the keys that belong to the system? If so
then
that is not a mechanism to reason about the range of your hash
function.
In
this scheme does it mean you will need to sort the tokens too using
collation scheme used by the external partitioner while identifying
which
key goes to which node. Also could you provide me with the patch
number.
I
need to go over that this weekend and make sure if it does not affect
the
(bootstrap/load balance) logic. My fear is that these changes have long
reaching effects and I need to make sure that all these pieces will
continue
to reside in harmony. Also your range query test did it run in a
distributed
environment or in a single box environment?
Avinash

On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <[email protected]>
wrote:

So with a String ring you just sort the strings (conceptually) and
you
wrap around from strings[last] to strings[0]. The range is entirely
defined by what tokens you have.

It's nice to be able to test against a real application. Mine won't
run without range queries...  (So it is still running against my
github code where I first wrote range queries, but that uses the OPHF
approach you have and so that is where I found the problems.)

-Jonathan

On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
<[email protected]> wrote:
I will look into this. Another point about P2P rings is something
about
the
ability to reason about the system. In any hash fn we hve a notion
of
range
and how the ring wraps around. For eg. in random hash such as MD5
we
say
0-2^128 -1 and then the wrap around. Similarly for the hash fn used
here
one
could define the same. Today we perform usual sort of strings and
the
hash
should respect that sort order. I guess you are saying the usual
sort
of
string does not respect collation. I will look into this and I
insist
this
should be very doable. Changes to the implementation of the core
classes
albeit how simple they are very very scary unless they are
completely
tested
too in a distributed setting. Over here we do not have detailed
test
code,
but we test by directing a % of the site traffic to a test cluster
before
we
sign off on anything.

Avinash

On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <[email protected]>
wrote:

Say for instance you have inserted keys ['a', 'b', '--a', '--b']
and
you are going to do range queries on them.  The correct
collation-aware sort is ['a', '--a', 'b', '--b']. But ordering by
char value gives ['--a', '--b', 'a', 'b'].

Attached is a test program illustrating this.  The assert will
fail
because the hash ordering is not the same as the collator's.

-Jonathan

On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
<[email protected]> wrote:
In fact what would you want is hash to respect oorder based on
how
the
strings are sorted. For the example you have it does. So am I
missing
something?

Avinash

On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <
[email protected]

wrote:

For that aspect no difference between a String ring based on
compareTo
and a BigInteger one.  The only difference (and it is an
important
one
for the reasons I gave!) is how the compare works. But for the
p2p
aspect it does not matter.

On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
<[email protected]> wrote:
Doing what you are suggesting scares the hell out of me for a
couple
of
reasons -  All work in P2P be it random/OPHF does the token
handling
the
way
it is setup. I cannot try something that has not been well
explored
in
academia. I insist this must be doable. I am going to think
about
this
more.

Avinash

On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
[email protected]>
wrote:

Avinash already commited his new order-preserving hash
function
and I
missed it.  It's in OrderPreservingHashPartitioner.  It
takes
the
approach that Todd and I discussed back in January: turn the
string
into a base-Char.MAX_VALUE number.
(






http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
).
I chatted with Avinash a little on IM but we didn't finish,
so
I'm
picking it up here.

There are two problems with this approach.  One is that the
hashes
will only be order-preserving for a subset of unicode (all
of
UCS-2,
but not all of UTF-16; see
http://en.wikipedia.org/wiki/UTF-16/UCS-2
).
The other is that this only gives you a naive ordering by
code
point
value, which for unicode is not what you want and even for
ascii
sometimes you want another collation.  (see
http://www.unicode.org/reports/tr10/ and

http://java.sun.com/javase/6/docs/api/java/text/Collator.html
).

Say for instance you have inserted keys ['a', 'b', '--a',
'--b']
and
you are going to do range queries on them.  The correct
collation-aware sort is ['a', '--a', 'b', '--b'].  But
ordering
by
char value gives ['--a', '--b', 'a', 'b'].

Switching to a more flexibile system like the one I wrote
for
CASSANDRA-3 will let use use Token<BigInteger> for random
distribution
or Token<String> for order-preserving, with user-defined
collation.
I
don't see a way to get this kind of flexibility from an
approach
that
insists on turning everything into BigInteger.

-Jonathan

On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
[email protected]>
wrote:
Avinash,

You mentioned that you have a new order-preserving hash
function
that
you think will be more generally useful.  Can you post it?

thanks,

-Jonathan











Reply via email to