Yes I saw it. I followed Ted's advice to use
scan.setTimeRange(sometimestamp, Long.MAX_VALUE)
On Wed, Jul 3, 2013 at 11:23 PM, Asaf Mesika asaf.mes...@gmail.com wrote:
Seems right. You can make it more efficient by creating your result array
in advance and then filling it.
Regarding time filtering: have you seen that in Scan you can set a start
time and an end time?
Thank you very much for the great support!
This is how I thought to design my key:
PATTERN: source|type|qualifier|hash(name)|timestamp
EXAMPLE:
google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753
Do you think this key would work well for my use case (my search will be
essentially by
I'm not sure if you're eliding this fact or not, but you'd be much
better off if you used a fixed-width format for your keys. So in your
example, you'd have:
PATTERN: source(4-byte-int).type(4-byte-int or smaller).fixed 128-bit
hash.8-byte timestamp
Example: \x00\x00\x00\x01\x00\x00\x02\x03
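A minimal plain-Java sketch of building such a fixed-width key, using java.nio.ByteBuffer in place of HBase's Bytes helpers (the 4+4+16+8 layout follows the pattern above; the class and method names here are illustrative, not from the thread):

```java
import java.nio.ByteBuffer;

public class FixedWidthKey {
    // Hypothetical widths following the suggested pattern:
    // 4-byte source id + 4-byte type id + 16-byte (128-bit) hash + 8-byte timestamp
    static final int KEY_LENGTH = 4 + 4 + 16 + 8;

    static byte[] makeKey(int source, int type, byte[] hash128, long timestamp) {
        if (hash128.length != 16) {
            throw new IllegalArgumentException("hash must be 128 bits (16 bytes)");
        }
        // ByteBuffer writes big-endian by default, the same byte order
        // HBase's Bytes.toBytes(int)/toBytes(long) produce
        return ByteBuffer.allocate(KEY_LENGTH)
                .putInt(source)
                .putInt(type)
                .put(hash128)
                .putLong(timestamp)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = makeKey(1, 2, new byte[16], 1372837702753L);
        System.out.println(key.length);  // always 32, regardless of the values
    }
}
```

Because every key is exactly 32 bytes, no separator characters are needed and each field sits at a known offset.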
Yeah, I was thinking of using a normalization step in order to allow the use
of FuzzyRowFilter, but what is not clear to me is whether integers must also
be normalized or not.
I will explain myself better. Suppose that I follow your advice and produce
keys like:
- 1|1|somehash|sometimestamp
-
When you make the row key and convert the int parts into byte[] (use
org.apache.hadoop.hbase.util.Bytes#toBytes(int)), it will give 4 bytes
for every int. Be careful about the ordering: when you convert a positive
and a negative integer into byte[] and do a lexicographical compare (as done
in HBase), you will find that negative values sort after positive ones,
because the sign bit makes the first byte compare larger.
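The ordering caveat can be demonstrated without HBase: the sketch below re-implements the big-endian encoding of Bytes.toBytes(int) and the unsigned lexicographic compare HBase applies to row keys (toBytes and compare here are local stand-ins, not the HBase methods):

```java
import java.nio.ByteBuffer;

public class SignedOrdering {
    // Same big-endian two's-complement encoding as Bytes.toBytes(int)
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Unsigned lexicographic compare, as HBase does on row key bytes
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -1 encodes as 0xFFFFFFFF, so byte-wise it sorts AFTER 1 (0x00000001)
        System.out.println(compare(toBytes(-1), toBytes(1)) > 0);  // true
    }
}
```

If all your values are non-negative (as with enum ordinals), the byte-wise order matches the numeric order and no extra normalization is needed.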
Hi Flavio,
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
It will allow you to model your multi-part row key like this:
CREATE TABLE flavio.analytics (
source INTEGER,
type INTEGER,
qual VARCHAR,
hash VARCHAR,
ts DATE
CONSTRAINT pk PRIMARY KEY (source, type, qual, hash, ts))
All my enums produce positive integers, so I don't have the positive/negative
integer problem.
Obviously, if I use fixed-length row keys I could drop the separator.
Sorry, but I'm very much a newbie in this field. I'm trying to understand how
to compose my key from the bytes.
Is the following correct?
final
No, I've never seen Phoenix, but it looks like a very useful project!
However, I don't have such strict performance requirements in my use case; I
just want regions that are as balanced as possible.
So I think that in this case I will stick with Bytes concatenation, if
someone confirms I'm doing it correctly.
The two-argument Bytes.add() calls:
return add(a, b, HConstants.EMPTY_BYTE_ARRAY);
where a new byte array is allocated:
byte[] result = new byte[a.length + b.length + c.length];
Meaning your code below would allocate two byte arrays.
Consider writing a method that accepts 4 byte[] arguments.
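A sketch of such a method in plain Java (the varargs name add is illustrative, not an HBase API): it sizes the result once and copies each part in, avoiding the intermediate array that each nested two-argument Bytes.add() call would allocate.

```java
public class KeyConcat {
    // One allocation for any number of parts, instead of nested
    // two-argument Bytes.add() calls that allocate an intermediate
    // array per call.
    static byte[] add(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        byte[] result = new byte[total];
        int offset = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, result, offset, p.length);
            offset += p.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] key = add(new byte[]{1}, new byte[]{2, 3}, new byte[]{4}, new byte[]{5});
        System.out.println(key.length);  // 5
    }
}
```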
Sure, but FYI Phoenix is not just faster, but much easier as well (as
this email chain shows).
On 07/03/2013 04:25 AM, Flavio Pompermaier wrote:
No, I've never seen Phoenix, but it looks like a very useful project!
However I don't have such strict performance issues in my use case, I just
want
Seems right. You can make it more efficient by creating your result array
in advance and then filling it.
Regarding time filtering: have you seen that in Scan you can set a start
time and an end time?
On Wednesday, July 3, 2013, Flavio Pompermaier wrote:
All my enums produce positive integers so I don't
Hi to everybody,
in my use case I have to perform batch analysis, skipping old data.
For example, I want to process all rows created after a certain timestamp,
passed as a parameter.
What is the most effective way to do this?
Should I design my row key to embed the timestamp?
Or just filter by
bq. Using timestamp in row-keys is discouraged
The above is true.
Prefixing the row key with a timestamp would create a hot region.
bq. should I filter by a simpler row-key plus a filter on timestamp?
You can do the above.
On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier pomperma...@okkam.it wrote:
For #1, yes - the client receives less data after filtering.
For #2, please take a look at TestMultiVersions
(./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
for time range:
scan = new Scan();
scan.setTimeRange(1000L, Long.MAX_VALUE);
For row key selection, you
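As a plain-Java illustration of the time-range semantics (in HBase's TimeRange the minimum timestamp is inclusive and the maximum is exclusive; the filter method below is a local stand-in, not the Scan API):

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeFilter {
    // Mimics Scan.setTimeRange(min, max): min is inclusive, max is exclusive.
    static List<Long> filter(List<Long> timestamps, long min, long max) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts >= min && ts < max) kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> stamps = List.of(500L, 1000L, 1500L);
        // Keep everything written at or after t=1000, as in the Scan example
        System.out.println(filter(stamps, 1000L, Long.MAX_VALUE));  // [1000, 1500]
    }
}
```

Passing Long.MAX_VALUE as the upper bound, as in the test above, keeps everything from the given timestamp onward.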