> -----Original Message-----
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 22, 2007 11:45 AM
> To: [email protected]
> Subject: HBase Bloom filters hash
>
> Hi,
>
> I'm curious why the hashing function that these filters use
> is based on SHA-1 (which is relatively slow to compute)
> instead of a bunch of fast and simple non-cryptographic
> functions such as Jenkins' hash (see
> http://bretm.home.comcast.net/hash/7.html
> for the evaluation of Jenkins hash).
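
For readers unfamiliar with the suggestion in the quoted message, here is a minimal sketch of a Bloom filter built on Jenkins' one-at-a-time hash instead of SHA-1. It is not the HBase code. The class name BloomFilter, the seed parameter, the second seed constant, and the use of double hashing (deriving all k bit positions from two base hashes) are assumptions made for illustration only.

```python
def jenkins_one_at_a_time(data: bytes, seed: int = 0) -> int:
    """Jenkins' one-at-a-time hash, kept to 32 bits.

    The seed parameter is an illustrative addition; seed=0 gives the
    usual unseeded function.
    """
    h = seed & 0xFFFFFFFF
    for b in data:
        h = (h + b) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


class BloomFilter:
    """Hypothetical Bloom filter using double hashing over Jenkins' hash."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Two base hashes; the second seed is an arbitrary constant.
        h1 = jenkins_one_at_a_time(key)
        h2 = jenkins_one_at_a_time(key, seed=0x9E3779B9) | 1  # force odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=4)
bf.add(b"row-1")
print(bf.might_contain(b"row-1"))
```

Each lookup here costs two cheap passes over the key rather than one SHA-1 digest. Whether that matters for HBase in practice would have to be measured.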
The reason for SHA-1 is that it came with the open source Bloom filter implementation we used. So far we've been focused on getting things to work, not on performance.

If you'd like to open a Jira, it will go on the list of things to do, sometime. If this is really important to you, how about submitting a patch?

There's mostly just the two of us, plus contributors, so we have to put our priorities on bug fixing and making HBase robust before we get around to adding more features or doing performance analysis. We'd really like to get more contributions; the project would mature much more rapidly.

Hope this helps.

---
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
