In fact what would you want is hash to respect oorder based on how the strings are sorted. For the example you have it does. So am I missing something?
Avinash On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <[email protected]> wrote: > For that aspect no difference between a String ring based on compareTo > and a BigInteger one. The only difference (and it is an important one > for the reasons I gave!) is how the compare works. But for the p2p > aspect it does not matter. > > On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman > <[email protected]> wrote: > > Doing what you are suggesting scares the hell out of me for a couple of > > reasons - All work in P2P be it random/OPHF does the token handling the > way > > it is setup. I cannot try something that has not been well explored in > > academia. I insist this must be doable. I am going to think about this > more. > > > > Avinash > > > > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <[email protected]> > wrote: > > > >> Avinash already commited his new order-preserving hash function and I > >> missed it. It's in OrderPreservingHashPartitioner. It takes the > >> approach that Todd and I discussed back in January: turn the string > >> into a base-Char.MAX_VALUE number. > >> ( > >> > http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed > >> ). > >> I chatted with Avinash a little on IM but we didn't finish, so I'm > >> picking it up here. > >> > >> There are two problems with this approach. One is that the hashes > >> will only be order-preserving for a subset of unicode (all of UCS-2, > >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2). > >> The other is that this only gives you a naive ordering by code point > >> value, which for unicode is not what you want and even for ascii > >> sometimes you want another collation. (see > >> http://www.unicode.org/reports/tr10/ and > >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html). > >> > >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and > >> you are going to do range queries on them. The correct > >> collation-aware sort is ['a', '--a', 'b', '--b']. But ordering by > >> char value gives ['--a', '--b', 'a', 'b']. > >> > >> Switching to a more flexibile system like the one I wrote for > >> CASSANDRA-3 will let use use Token<BigInteger> for random distribution > >> or Token<String> for order-preserving, with user-defined collation. I > >> don't see a way to get this kind of flexibility from an approach that > >> insists on turning everything into BigInteger. > >> > >> -Jonathan > >> > >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <[email protected]> > wrote: > >> > Avinash, > >> > > >> > You mentioned that you have a new order-preserving hash function that > >> > you think will be more generally useful. Can you post it? > >> > > >> > thanks, > >> > > >> > -Jonathan > >> > > >> > > >
