Re: Prefix salting pattern

2014-05-19 Thread Michael Segel
This is even better when you don't necessarily care about the order of every row, but want every row in a given range (then you can just get whatever row is available from a buffer in the client). You do realize that in the general case you want to return the result set in sort order. So you

Re: Prefix salting pattern

2014-05-19 Thread Mike Axiak
1) You can still query in sorted order, in which case N scans is beneficial. (In our tests: ~25% faster for N=2, going up to about ~50% faster for N=16.) 2) Many times you would issue a scan without necessarily caring about individual record order. (e.g.: let me perform some operation on all

Re: Prefix salting pattern

2014-05-19 Thread Michael Segel
You have n different scans and you then have to put the rows in sort order from each scan into a single result set. While in each scan, the RS is in sort order, the overall set of RS needs to be merged into one RS and that’s where you start to have issues. Again YMMV… And again… depending

Re: Prefix salting pattern

2014-05-19 Thread Mike Axiak
On Mon, May 19, 2014 at 8:53 AM, Michael Segel michael_se...@hotmail.com wrote: While in each scan, the RS is in sort order, the overall set of RS needs to be merged into one RS and that’s where you start to have issues. What issues? As I said, in multiple tests we saw performance

Re: Prefix salting pattern

2014-05-19 Thread Michael Segel
You run n scans in parallel. You want a single result set in sort order. How do you do that? (Rhetorical) That’s the extra work that you don’t have when you have a single result set. This goes into why the work done for secondary indexing to be associated with the base table won’t scale
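For readers following the merge question above, one common way to combine n individually sorted scans into a single sorted result set is a k-way merge. The sketch below is only an illustration, not code from the thread: it assumes each scan yields (salted_key, value) pairs already sorted within its bucket, that the salt is a fixed-length prefix, and the name `merge_salted_scans` is hypothetical.

```python
import heapq

def merge_salted_scans(scans, prefix_len=1):
    """Lazily merge per-bucket scans into one stream ordered by unsalted key.

    Assumption: every scan is already sorted by its original (unsalted) key,
    and each row is a (salted_key, value) tuple with a fixed-length salt.
    """
    # heapq.merge keeps only one pending row per scan in memory at a time.
    return heapq.merge(*scans, key=lambda row: row[0][prefix_len:])

# Two hypothetical buckets, each sorted within itself.
bucket0 = [(b"\x00a", 1), (b"\x00c", 3)]
bucket1 = [(b"\x01b", 2), (b"\x01d", 4)]
merged = list(merge_salted_scans([bucket0, bucket1]))
```

The merge cost is the extra client-side work being debated here; whether it outweighs the gain from striping the reads depends on the workload.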

Re: Prefix salting pattern

2014-05-18 Thread Michael Segel
I think I should dust off my schema design talk… clearly the talks given by some of the vendors don’t really explain things … (Hmmm. Strata London?) See my reply below…. Note I used SHA-1. MD-5 should also give you roughly the same results. On May 18, 2014, at 4:28 AM, Software Dev

Re: Prefix salting pattern

2014-05-18 Thread Software Dev
You may be missing the point. The primary reason for the salt prefix pattern is to avoid hotspotting when inserting time series data AND at the same time provide a way to perform range scans.
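As a rough illustration of the range-scan side of this pattern: with a salted key, one logical range has to be issued once per bucket. The sketch below only builds the per-bucket key ranges. The one-byte prefix, the function name `bucket_scan_ranges`, and the bucket count of 4 are assumptions for illustration, not details from the thread or from any HBase API.

```python
from typing import List, Tuple

def bucket_scan_ranges(start: bytes, stop: bytes, num_buckets: int) -> List[Tuple[bytes, bytes]]:
    """Return one (start_row, stop_row) pair per salt bucket.

    Assumption: the salt is a single leading byte, so num_buckets <= 256.
    """
    ranges = []
    for bucket in range(num_buckets):
        prefix = bytes([bucket])
        ranges.append((prefix + start, prefix + stop))
    return ranges

# Hypothetical time-series range split across 4 buckets.
ranges = bucket_scan_ranges(b"20140517", b"20140519", 4)
```

Each pair would then be handed to a separate scan, which is where the "n scans instead of 1" trade-off discussed in this thread comes from.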

Re: Prefix salting pattern

2014-05-18 Thread Software Dev
on this subject and realized my second question may not be appropriate since this prefix salting pattern assumes that the prefix is random. I thought it was actually based off a hash that could be predetermined so you could always, if needed, get to the exact row key with one get. Would

Re: Prefix salting pattern

2014-05-18 Thread Michael Segel
No, you’re missing the point. It’s not a good idea or design. Is your data mutable or static? To your point. Every time you want to do a simple get() you have to open up n get() statements. On your range scans you will have to do n range scans, then join and sort the result sets. The fact that

Re: Prefix salting pattern

2014-05-18 Thread Mike Axiak
In our measurements, scanning is improved by performing against n range scans rather than 1 (since you are effectively striping the reads). This is even better when you don't necessarily care about the order of every row, but want every row in a given range (then you can just get whatever row is

Re: Prefix salting pattern

2014-05-18 Thread James Taylor
@Software Dev - might be feasible to implement a Thrift client that speaks Phoenix JDBC. I believe this is similar to what Hive has done. Thanks, James On Sun, May 18, 2014 at 1:19 PM, Mike Axiak m...@axiak.net wrote: In our measurements, scanning is improved by performing against n range

Prefix salting pattern

2014-05-17 Thread Software Dev
I recently came across the pattern of adding a salting prefix to the row keys to prevent hotspotting. Still trying to wrap my head around it and I have a few questions. - Is there ever a reason to salt to more buckets than there are region servers? The only reason why I think that may be

Re: Prefix salting pattern

2014-05-17 Thread Software Dev
Well, kept reading on this subject and realized my second question may not be appropriate since this prefix salting pattern assumes that the prefix is random. I thought it was actually based off a hash that could be predetermined so you could always, if needed, get to the exact row key with one get
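To make the "hash that could be predetermined" idea concrete, here is a minimal sketch assuming the bucket is derived from a hash of the original row key, so the same key always maps to the same prefix and a single get suffices. SHA-1 is used only because it comes up elsewhere in the thread; the one-byte prefix, the bucket count of 16, and the name `salted_key` are illustrative assumptions.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count

def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix row_key with a bucket byte derived from its own hash.

    Because the bucket depends only on row_key, a reader can recompute
    the full salted key and fetch the row with one get.
    """
    digest = hashlib.sha1(row_key).digest()
    bucket = digest[0] % num_buckets  # assumption: num_buckets <= 256
    return bytes([bucket]) + row_key

key = salted_key(b"sensor42|20140517T1200")
```

With a truly random prefix, by contrast, the reader cannot recompute the bucket and would have to try every bucket.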

Re: Prefix salting pattern

2014-05-17 Thread James Taylor
and realized my second question may not be appropriate since this prefix salting pattern assumes that the prefix is random. I thought it was actually based off a hash that could be predetermined so you could always, if needed, get to the exact row key with one get. Would there be something wrong