@James,
I know and that’s the biggest problem.
Salts by definition are random seeds.
Now I have two new phrases.
1) We want to remain on a sodium free diet.
2) Learn to kick the bucket.
When you have data coming in as a time series, is that data mutable or
not?
A better
@Mike,
The biggest problem is you're not listening. Please actually read my
response (and you'll understand that what we're calling salting is not a
random seed).
Phoenix already has secondary indexes in two flavors: one optimized for
write-once data and one more general for fully mutable data.
@James…
You’re not listening. There is a special meaning when you say salt.
On May 18, 2014, at 7:16 PM, James Taylor jtay...@salesforce.com wrote:
@Mike,
The biggest problem is you're not listening. Please actually read my
response (and you'll understand that what we're calling salting is
The top two hits when you Google for HBase salt are
- Sematext blog describing salting as I described it in my email
- Phoenix blog again describing salting in this same way
I really don't understand what you're arguing about - the mechanism that
you're advocating for is exactly the way both
@Software Dev - if you use Phoenix, queries would leverage our Skip Scan
(which supports a superset of the FuzzyRowFilter perf improvements). Take a
look here:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html
Assuming a row key made up of a low cardinality first
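A rough sense of what the Skip Scan buys you can be sketched in a few lines of Python. This is a toy model, not Phoenix's implementation: `rows` is a sorted list of byte keys standing in for region data, and the "seek" is a binary search instead of an HBase scanner reseek.

```python
import bisect

# Toy skip-scan over a composite key (low-cardinality leading column + date):
# instead of one full-table range scan, seek directly to each leading-value
# slice and read only the date range inside it.
def skip_scan(rows, lead_values, date_lo, date_hi):
    out = []
    for v in lead_values:                    # enumerate the leading column
        lo, hi = bytes([v]) + date_lo, bytes([v]) + date_hi
        i = bisect.bisect_left(rows, lo)     # "seek" straight to the slice
        while i < len(rows) and rows[i] < hi:
            out.append(rows[i])
            i += 1
    return out
```

The point is that rows outside the requested date range are never touched, even though they sit between the slices being read.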
Hi Mike,
I agree with you - the way you've outlined is exactly the way Phoenix has
implemented it. It's a bit of a problem with terminology, though. We call
it salting: http://phoenix.incubator.apache.org/salted.html. We hash the
key, mod the hash with the SALT_BUCKET value you provide, and
3+ years on and a bad idea is being propagated again.
Now repeat after me… DO NOT USE A SALT.
Keeping a low-sodium diet, especially for HBase, is really good for your health
and sanity.
The salt is going to be orthogonal to the row key (Key).
There is no relationship to the specific Key.
Using 4 random bytes you get 2^32 possibilities; your data can be split
well enough among all the possible regions, but you won't be able to
easily benefit from distributed scans to gather what you want.
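The failure mode being described here can be shown in a couple of lines (a sketch, with an illustrative key format):

```python
import os

# A truly random 4-byte salt has no relationship to the key, so the same
# logical row lands under a different physical key on every write, and a
# point Get cannot reconstruct the prefix.
def random_salted_key(key: bytes) -> bytes:
    return os.urandom(4) + key

a = random_salted_key(b"20140501|alice")
b = random_salted_key(b"20140501|alice")
# a[4:] == b[4:], yet the full keys almost certainly differ -- the only
# way to find the row again is to check all 2**32 possible prefixes.
```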
Let's say you want to split (time+login) with a salted key and you expect to
be able to
OK, so there is no way around the FuzzyRowFilter checking every single
row in the table, correct? If so, what is a valid use case for that
filter?
OK, so salt to a low enough prefix that keeps scanning reasonable. Our
client for accessing these tables is a Rails (not JRuby) application
so we are
Edit: I should have mentioned that my access pattern is a bit
different. I'll need to scan between dates... 20140101 - 20140501, not
an individual date. My table is actually a bunch of increments, so as
of right now there is only 1 row key per timeframe.
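For the record, a date-range scan like 20140101 - 20140501 is still doable with deterministic salt buckets: it becomes N smaller range scans, one per bucket prefix, merged client-side. A minimal sketch (the `table` is a sorted key list standing in for an HBase table; names are illustrative):

```python
import heapq

# With N deterministic salt buckets, a date-range scan turns into N
# per-bucket range scans merged back together on the unsalted remainder.
def range_scan(table, start, stop, buckets):
    slices = []
    for b in range(buckets):
        lo, hi = bytes([b]) + start, bytes([b]) + stop
        slices.append([k for k in table if lo <= k < hi])
    # merge on the unsalted portion so rows come back in date order
    return list(heapq.merge(*slices, key=lambda k: k[1:]))
```

With a truly random 4-byte prefix there would be 2^32 "buckets", which is why that variant forces a full scan instead.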
On Sat, May 3, 2014 at 8:39 AM, Software
I'm planning to work with FuzzyRowFilter to avoid hot spotting of our
time series data (20140501, 20140502...). We can prefix all of the
keys with 4 random bytes and then just skip these during scanning. Is
that correct? This *seems* like it will work but I'm questioning the
performance of this
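The matching semantics being proposed can be modeled in a few lines of Python. This is a toy model, not the HBase API: the mask marks each byte position as fixed (0, must equal the pattern byte) or "don't care" (1, any byte), which mirrors how a FuzzyRowFilter would skip a 4-byte random prefix while pinning the date bytes.

```python
# Toy FuzzyRowFilter-style match: mask byte 0 = fixed position,
# mask byte 1 = "don't care" position.
def fuzzy_match(row: bytes, pattern: bytes, mask: bytes) -> bool:
    if len(row) < len(pattern):
        return False
    return all(m == 1 or row[i] == pattern[i] for i, m in enumerate(mask))

mask = bytes([1, 1, 1, 1] + [0] * 8)    # skip 4 salt bytes, fix the date
pattern = bytes(4) + b"20140501"
```

Note that nothing here narrows the scan range: every row still has to be read and tested against the mask, which is the performance concern raised above.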