Re: How can we process complex logic on hbase

2014-05-18 Thread Adrien Mogenet
Could you give us an example of what you call complex logic? Perhaps putting this logic on the client side could make sense? (probably not what you want, just asking...) On Sat, May 17, 2014 at 7:26 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Moving the discussion to the user list. Hi

Re: Prefix salting pattern

2014-05-18 Thread Michael Segel
I think I should dust off my schema design talk… clearly the talks given by some of the vendors don’t really explain things … (Hmmm. Strata London?) See my reply below…. Note I used SHA-1. MD5 should also give you roughly the same results. On May 18, 2014, at 4:28 AM, Software Dev

Re: Prefix salting pattern

2014-05-18 Thread Software Dev
You may be missing the point. The primary reason for the salt prefix pattern is to avoid hotspotting when inserting time series data AND at the same time provide a way to perform range scans.
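
A minimal sketch of the pattern as described above, assuming the prefix is a bucket number derived from a hash of the row key (SHA-1 here, as mentioned elsewhere in the thread). The names `NUM_BUCKETS`, `salted_key`, `scan_ranges`, and `merge_scans` are hypothetical and are not part of HBase, Thrift, or Phoenix. It only illustrates how writes spread across buckets while a range scan is still possible as one scan per bucket, merged on the client:

```python
import hashlib
import heapq

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration only


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a one-byte bucket computed from its SHA-1 hash.

    The prefix is deterministic, so a single-row get can recompute it.
    """
    bucket = hashlib.sha1(row_key).digest()[0] % num_buckets
    return bytes([bucket]) + row_key


def scan_ranges(start: bytes, stop: bytes, num_buckets: int = NUM_BUCKETS):
    """Return one (start, stop) pair per bucket for a logical range scan."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(num_buckets)]


def merge_scans(per_bucket_rows):
    """Merge per-bucket results (each sorted by salted key) into key order.

    Each element of per_bucket_rows is a list of (salted_key, value) pairs.
    The one-byte prefix is stripped before merging.
    """
    stripped = ([(k[1:], v) for k, v in rows] for rows in per_bucket_rows)
    return list(heapq.merge(*stripped, key=lambda kv: kv[0]))
```

Sequential time-series keys land in different buckets, which is what spreads the write load. The cost is that a range scan becomes `NUM_BUCKETS` scans plus a client-side merge, which is the trade-off debated in the replies.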

Re: Prefix salting pattern

2014-05-18 Thread Software Dev
James, thanks for the input. Not too familiar with Phoenix although it looks like a great contrib. Unfortunately our main client is Ruby using the Thrift API. Using the Thrift API also makes parallel scans tough, if not impossible. On Sat, May 17, 2014 at 9:31 PM, James Taylor

Re: Prefix salting pattern

2014-05-18 Thread Michael Segel
No, you’re missing the point. It’s not a good idea or design. Is your data mutable or static? To your point. Every time you want to do a simple get() you have to open up n get() statements. On your range scans you will have to do n range scans, then join and sort the result sets. The fact that

Re: Questions on FuzzyRowFilter

2014-05-18 Thread Michael Segel
@James, I know and that’s the biggest problem. Salts by definition are random seeds. Now I have two new phrases. 1) We want to remain on a sodium free diet. 2) Learn to kick the bucket. When you have data that is coming in on a time series, is the data mutable or not? A better

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
@Mike, The biggest problem is you're not listening. Please actually read my response (and you'll understand that what we're calling salting is not a random seed). Phoenix already has secondary indexes in two flavors: one optimized for write-once data and one more general for fully mutable data.

Re: Questions on FuzzyRowFilter

2014-05-18 Thread Michael Segel
@James… You’re not listening. There is a special meaning when you say salt. On May 18, 2014, at 7:16 PM, James Taylor jtay...@salesforce.com wrote: @Mike, The biggest problem is you're not listening. Please actually read my response (and you'll understand that what we're calling salting is

Re: Prefix salting pattern

2014-05-18 Thread Mike Axiak
In our measurements, scanning is improved by performing n range scans rather than 1 (since you are effectively striping the reads). This is even better when you don't necessarily care about the order of every row, but want every row in a given range (then you can just get whatever row is

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
The top two hits when you Google for HBase salt are - Sematext blog describing salting as I described it in my email - Phoenix blog again describing salting in this same way I really don't understand what you're arguing about - the mechanism that you're advocating for is exactly the way both

Re: Questions on FuzzyRowFilter

2014-05-18 Thread James Taylor
@Software Dev - if you use Phoenix, queries would leverage our Skip Scan (which supports a superset of the FuzzyRowFilter perf improvements). Take a look here: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html Assuming a row key made up of a low cardinality first

Re: Prefix salting pattern

2014-05-18 Thread James Taylor
@Software Dev - it might be feasible to implement a Thrift client that speaks Phoenix JDBC. I believe this is similar to what Hive has done. Thanks, James On Sun, May 18, 2014 at 1:19 PM, Mike Axiak m...@axiak.net wrote: In our measurements, scanning is improved by performing n range