Re: Questions on FuzzyRowFilter

2014-05-18 Thread Michael Segel
@James, I know and that’s the biggest problem. Salts by definition are random seeds. Now I have two new phrases. 1) We want to remain on a sodium free diet. 2) Learn to kick the bucket. When you have data that is coming in on a time series, is the data mutable or not? A better

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
@Mike, The biggest problem is you're not listening. Please actually read my response (and you'll understand that what we're calling salting is not a random seed). Phoenix already has secondary indexes in two flavors: one optimized for write-once data and one more general for fully mutable data.

Re: Questions on FuzzyRowFilter

2014-05-18 Thread Michael Segel
@James… You’re not listening. There is a special meaning when you say salt.
On May 18, 2014, at 7:16 PM, James Taylor jtay...@salesforce.com wrote:
> @Mike, The biggest problem is you're not listening. Please actually read my response (and you'll understand that what we're calling salting is

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
The top two hits when you Google for HBase salt are:
- Sematext blog describing salting as I described it in my email
- Phoenix blog again describing salting in this same way
I really don't understand what you're arguing about - the mechanism that you're advocating for is exactly the way both

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
@Software Dev - if you use Phoenix, queries would leverage our Skip Scan (which supports a superset of the FuzzyRowFilter perf improvements). Take a look here: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html Assuming a row key made up of a low cardinality first

Re: Questions on FuzzyRowFilter

2014-05-16 Thread James Taylor
Hi Mike, I agree with you - the way you've outlined is exactly the way Phoenix has implemented it. It's a bit of a problem with terminology, though. We call it salting: http://phoenix.incubator.apache.org/salted.html. We hash the key, mod the hash with the SALT_BUCKET value you provide, and
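A minimal sketch of the hash-and-mod salting described above, written in Python rather than taken from Phoenix. The CRC32 hash, the one-byte prefix and the bucket count of 8 are assumptions chosen for illustration; the thread does not say which hash Phoenix uses, and `salt_byte` and `salted_key` are hypothetical names.

```python
import zlib

SALT_BUCKETS = 8  # hypothetical value; stands in for the SALT_BUCKET setting


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Derive the bucket from the key itself, so it can be recomputed on read."""
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend the one-byte bucket number to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


if __name__ == "__main__":
    for day in (b"20140501", b"20140502", b"20140503"):
        print(day, "->", salted_key(day))
```

Because the prefix is a function of the key, a point lookup can recompute it and go straight to the right row, which is not possible when the prefix is random.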

Re: Questions on FuzzyRowFilter

2014-05-11 Thread Michael Segel
3+ years on and a bad idea is being propagated again. Now repeat after me… DO NOT USE A SALT. Having a low sodium diet, especially for HBase, is really good for your health and sanity. The salt is going to be orthogonal to the row key (Key). There is no relationship to the specific Key.

Re: Questions on FuzzyRowFilter

2014-05-03 Thread Adrien Mogenet
Using 4 random bytes you'll get 2^32 possibilities; thus your data can be split enough among all the possible regions, but you won't be able to easily benefit from distributed scans to gather what you want. Let's say you want to split (time+login) with a salted key and you expect to be able to
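The arithmetic behind the 2^32 figure can be checked with a few lines of Python. This is only an illustration; `prefix_count` is a hypothetical helper, and the one-byte comparison is an assumption added for contrast.

```python
def prefix_count(prefix_bytes: int) -> int:
    """Number of distinct values a prefix of this many bytes can take."""
    return 256 ** prefix_bytes


if __name__ == "__main__":
    # 4 random bytes: every key range would have to be looked for
    # under each of these prefixes.
    print(prefix_count(4))
    # A single prefix byte (assumed here for comparison) caps the fan-out at 256.
    print(prefix_count(1))
```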

Re: Questions on FuzzyRowFilter

2014-05-03 Thread Software Dev
Ok, so there is no way around the FuzzyRowFilter checking every single row in the table, correct? If so, what is a valid use case for that filter? Ok, so salt to a low enough prefix that makes scanning reasonable. Our client for accessing these tables is a Rails (not JRuby) application so we are

Re: Questions on FuzzyRowFilter

2014-05-03 Thread Software Dev
Edit: I should have mentioned that my access pattern is a bit different. I'll need to scan between dates... 20140101 - 20140501, not an individual date. My table is actually a bunch of increments so as of right now, there is only 1 row key per timeframe.
On Sat, May 3, 2014 at 8:39 AM, Software
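A rough sketch of how a date-range scan could work once keys carry a small bucket prefix: run one range scan per bucket and merge the results. It simulates the table with a sorted Python list and does not use any HBase client; the one-byte prefix, the bucket count of 4 and all function names are assumptions for illustration.

```python
from bisect import bisect_left


def bucket_ranges(start: bytes, stop: bytes, buckets: int):
    """One [start, stop) key range per bucket prefix."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(buckets)]


def range_scan(sorted_keys, start: bytes, stop: bytes):
    """Stand-in for a table scan: keys k with start <= k < stop."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def scan_dates(sorted_keys, start: bytes, stop: bytes, buckets: int):
    """Scan every bucket for the date range, strip the prefix, merge in date order."""
    hits = []
    for lo, hi in bucket_ranges(start, stop, buckets):
        hits.extend(key[1:] for key in range_scan(sorted_keys, lo, hi))
    return sorted(hits)


if __name__ == "__main__":
    days = [b"20131231", b"20140101", b"20140215", b"20140430", b"20140501", b"20140502"]
    # Round-robin bucket assignment, purely to populate the demo table.
    table = sorted(bytes([i % 4]) + d for i, d in enumerate(days))
    # The stop key is exclusive, so 20140502 covers dates through 20140501.
    print(scan_dates(table, b"20140101", b"20140502", 4))
```

The cost is one scan per bucket, which is why a low bucket count keeps range queries reasonable.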

Questions on FuzzyRowFilter

2014-05-02 Thread Software Dev
I'm planning to work with FuzzyRowFilter to avoid hotspotting of our time series data (20140501, 20140502...). We can prefix all of the keys with 4 random bytes and then just skip these during scanning. Is that correct? This *seems* like it will work but I'm questioning the performance of this
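To make the "skip the first 4 bytes" idea concrete, here is a small Python simulation of fuzzy matching on a row key. It is not the HBase API; it only mirrors the mask convention of FuzzyRowFilter, where a mask byte of 0 means the position must match and 1 means any byte is accepted. Comparing only the first `len(pattern)` bytes of the row is an assumption of this sketch, and `fuzzy_match` is a hypothetical name.

```python
def fuzzy_match(row: bytes, pattern: bytes, mask: bytes) -> bool:
    """True if row agrees with pattern at every position where mask is 0."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(row) < len(pattern):
        return False
    return all(m == 1 or r == p for r, p, m in zip(row, pattern, mask))


# 4 wildcard bytes for the random prefix, then the fixed 8-character date.
PATTERN = b"\x00\x00\x00\x00" + b"20140501"
MASK = bytes([1] * 4 + [0] * 8)

if __name__ == "__main__":
    rows = [
        b"\x9a\x01\xff\x10" + b"20140501",
        b"\x00\x42\x07\x33" + b"20140502",
    ]
    for row in rows:
        print(row, fuzzy_match(row, PATTERN, MASK))
```

A pattern like this pins a single date, so a range such as 20140101 - 20140501 would need one pattern per date or a different key design.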