Re: Help in designing row key

2013-07-04 Thread Flavio Pompermaier
Yes, I saw it. I followed Ted's advice to use scan.setTimeRange(sometimestamp, Long.MAX_VALUE). On Wed, Jul 3, 2013 at 11:23 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Seems right. You can make it more efficient by creating your result array in advance and then filling it. Regarding time

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
Thank you very much for the great support! This is how I was thinking of designing my key: PATTERN: source|type|qualifier|hash(name)|timestamp EXAMPLE: google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753 Do you think my key could be good for my purpose (my search will be essentially by
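A minimal sketch of the delimited key in this message, built as a UTF-8 string. It assumes MD5 for hash(name), since the example hash is 32 hex characters (128 bits); the class and method names are hypothetical. Because source, type and qualifier are variable-length strings, the key length and field offsets vary from row to row, which is what the next reply addresses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for the "source|type|qualifier|hash(name)|timestamp" key.
// MD5 is an assumption based on the 32-hex-character hash in the example.
final class DelimitedRowKey {
    private DelimitedRowKey() {}

    // Lower-case hex MD5 of the UTF-8 bytes of name.
    static String md5Hex(String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Joins the five fields with '|'; the timestamp is written in decimal.
    static String build(String source, String type, String qualifier,
                        String name, long timestamp) {
        return String.join("|", source, type, qualifier, md5Hex(name),
                Long.toString(timestamp));
    }

    public static void main(String[] args) {
        System.out.println(build("google", "appliance", "oven",
                "some product name", 1372837702753L));
    }
}
```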

Re: Help in designing row key

2013-07-03 Thread Mike Axiak
I'm not sure if you're eliding this fact or not, but you'd be much better off if you used a fixed-width format for your keys. So in your example, you'd have: PATTERN: source(4-byte int).type(4-byte int or smaller).fixed 128-bit hash.8-byte timestamp. Example: \x00\x00\x00\x01\x00\x00\x02\x03
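The fixed-width layout above can be sketched with the JDK's ByteBuffer, which writes big-endian by default (the same byte order as HBase's Bytes.toBytes). This is an illustration, not code from the thread: the class name is hypothetical and MD5 is assumed as the 128-bit hash. With source = 1 and type = 0x0203 (515), the first eight bytes come out as \x00\x00\x00\x01\x00\x00\x02\x03, matching the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: 4-byte source, 4-byte type, 16-byte hash, 8-byte timestamp.
final class FixedWidthRowKey {
    static final int LENGTH = 4 + 4 + 16 + 8; // 32 bytes for every row

    private FixedWidthRowKey() {}

    // 128-bit MD5 digest (assumed hash function) of the UTF-8 bytes of name.
    static byte[] md5(String name) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Every field sits at a fixed offset, so no separators are needed.
    static byte[] build(int source, int type, byte[] hash16, long timestamp) {
        if (hash16.length != 16) {
            throw new IllegalArgumentException("hash must be 16 bytes");
        }
        return ByteBuffer.allocate(LENGTH)
                .putInt(source)
                .putInt(type)
                .put(hash16)
                .putLong(timestamp)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = build(1, 0x0203, md5("some product name"), 1372837702753L);
        System.out.println(key.length); // 32
    }
}
```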

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
Yeah, I was thinking of using a normalization step to allow the use of FuzzyRowFilter, but what is not clear to me is whether integers must also be normalized. Let me explain better. Suppose that I follow your advice and produce keys like: - 1|1|somehash|sometimestamp -
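FuzzyRowFilter compares keys byte position by byte position, so each field has to start at the same offset in every row. The following is a plain-Java simulation of that position-wise matching, not HBase's implementation; it assumes the 0.94-era mask convention (0 = byte must match, 1 = any byte). With 4-byte ints, as in the fixed-width layout, the "source" field always occupies bytes 0 to 3, whereas in "1|1|..." versus "12|1|..." the hash would start at different offsets.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Simulation only: mimics FuzzyRowFilter-style matching for illustration.
final class FuzzyMatchSketch {
    private FuzzyMatchSketch() {}

    // True if every position whose mask byte is 0 holds the same byte in
    // rowKey and pattern; positions with mask byte 1 may hold anything.
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask differ in length");
        }
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pattern: source == 1 in bytes 0..3, anything in the remaining 28 bytes.
        byte[] pattern = ByteBuffer.allocate(32).putInt(1).array();
        byte[] mask = new byte[32];
        Arrays.fill(mask, 4, 32, (byte) 1);

        byte[] row = ByteBuffer.allocate(32).putInt(1).putInt(42).array();
        System.out.println(matches(row, pattern, mask)); // true
    }
}
```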

Re: Help in designing row key

2013-07-03 Thread Anoop John
When you build the row key (RK) and convert the int parts into byte[] (use org.apache.hadoop.hbase.util.Bytes#toBytes(int)), it will give 4 bytes for every int. Be careful about the ordering: when you convert a positive and a negative integer into byte[] and do a lexicographical compare (as done in HBase), you will
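To make the ordering caveat concrete, here is a small plain-Java sketch. It encodes ints big-endian in two's complement (the byte layout Bytes.toBytes(int) produces) and compares them as unsigned bytes, the way row keys are sorted. A negative int starts with 0xFF, so it sorts after every positive int. Flipping the sign bit before encoding is one common workaround; it is named here as an assumption, not something the thread prescribes, and it is unnecessary when all values are non-negative.

```java
import java.nio.ByteBuffer;

final class IntOrdering {
    private IntOrdering() {}

    // Big-endian two's-complement encoding, 4 bytes per int.
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // XOR with the sign bit so byte order matches numeric order for all ints.
    static byte[] toSortableBytes(int v) {
        return toBytes(v ^ Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(compareUnsigned(toBytes(-1), toBytes(1)) > 0); // true: -1 sorts after 1
        System.out.println(compareUnsigned(toSortableBytes(-1), toSortableBytes(1)) < 0); // true
    }
}
```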

Re: Help in designing row key

2013-07-03 Thread James Taylor
Hi Flavio, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It will allow you to model your multi-part row key like this: CREATE TABLE flavio.analytics ( source INTEGER, type INTEGER, qual VARCHAR, hash VARCHAR, ts DATE CONSTRAINT pk PRIMARY KEY

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
All my enums produce positive integers, so I don't have the positive/negative integer problem. Obviously, if I use fixed-length row keys I could drop the separators. Sorry, I'm very much a newbie in this field. I'm trying to understand how to compose my key with bytes. Is the following correct? final

Re: Help in designing row key

2013-07-03 Thread Flavio Pompermaier
No, I've never seen Phoenix, but it looks like a very useful project! However, I don't have such strict performance requirements in my use case; I just want my regions to be as balanced as possible. So I think that in this case I will still use Bytes concatenation, if someone confirms I'm doing it in

Re: Help in designing row key

2013-07-03 Thread Ted Yu
The two-argument Bytes.add() calls return add(a, b, HConstants.EMPTY_BYTE_ARRAY); and the three-argument version allocates a new byte array: byte [] result = new byte[a.length + b.length + c.length]; This means your code below would allocate two byte arrays. Consider writing a method that accepts 4 byte []
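A sketch of the single-allocation approach suggested here (and by Asaf: size the result array in advance, then fill it). It generalizes Ted's four-argument idea to varargs; the method name is hypothetical and it uses only System.arraycopy. The varargs call itself creates a small array of references, but only one byte array is allocated for the key.

```java
import java.util.Arrays;

final class KeyConcat {
    private KeyConcat() {}

    // Concatenates the parts into one array, sized up front and filled in order.
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        byte[] result = new byte[total];
        int offset = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, result, offset, p.length);
            offset += p.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] key = concat(new byte[] {0, 0, 0, 1}, new byte[] {0, 0, 2, 3});
        System.out.println(Arrays.toString(key)); // [0, 0, 0, 1, 0, 0, 2, 3]
    }
}
```

Used with four fields, this is one allocation instead of the intermediate arrays created by chained two-argument Bytes.add() calls.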

Re: Help in designing row key

2013-07-03 Thread James Taylor
Sure, but FYI Phoenix is not just faster, but much easier as well (as this email chain shows). On 07/03/2013 04:25 AM, Flavio Pompermaier wrote: No, I've never seen Phoenix, but it looks like a very useful project! However I don't have such strict performance issues in my use case, I just want

Re: Help in designing row key

2013-07-03 Thread Asaf Mesika
Seems right. You can make it more efficient by creating your result array in advance and then filling it. Regarding time filtering: have you seen that in Scan you can set a start time and an end time? On Wednesday, July 3, 2013, Flavio Pompermaier wrote: All my enums produce positive integers so I don't

Help in designing row key

2013-07-02 Thread Flavio Pompermaier
Hi everybody, in my use case I have to perform batch analysis that skips old data. For example, I want to process all rows created after a certain timestamp, passed as a parameter. What is the most effective way to do this? Should I design my row key to embed the timestamp? Or just filter by

Re: Help in designing row key

2013-07-02 Thread Ted Yu
bq. Using timestamp in row-keys is discouraged. The above is true. Prefixing the row key with a timestamp would create a hot region. bq. should I filter by a simpler row-key plus a filter on timestamp? You can do the above. On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier pomperma...@okkam.it wrote:

Re: Help in designing row key

2013-07-02 Thread Ted Yu
For #1, yes: the client receives less data after filtering. For #2, please take a look at TestMultiVersions (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94) for a time-range example: scan = new Scan(); scan.setTimeRange(1000L, Long.MAX_VALUE); For row key selection, you
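Scan.setTimeRange(minStamp, maxStamp) selects cells by their cell timestamp over the half-open interval [minStamp, maxStamp), so the row key does not need to embed the timestamp. The plain-Java simulation below only illustrates which timestamps that interval keeps; it does not call HBase, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation of the [minStamp, maxStamp) rule used by Scan.setTimeRange.
final class TimeRangeSketch {
    private TimeRangeSketch() {}

    // Lower bound inclusive, upper bound exclusive.
    static boolean inRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    // Keeps the cell timestamps that a scan with this time range would return.
    static List<Long> select(List<Long> cellTimestamps, long minStamp, long maxStamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (inRange(ts, minStamp, maxStamp)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same bounds as the TestMultiVersions snippet: setTimeRange(1000L, Long.MAX_VALUE).
        System.out.println(select(Arrays.asList(999L, 1000L, 5000L), 1000L, Long.MAX_VALUE)); // [1000, 5000]
    }
}
```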