Programming practices for implementing composite row keys

2013-09-05 Thread praveenesh kumar
Hello people,

I have a scenario that requires creating composite row keys for my HBase
table.

Basically it would be entity1,entity2,entity3.

Searches would be by entity1 first, and then by entity2 and entity3. I know I
can do a start/stop row scan on entity1 first and then put row filters on
entity2 and entity3.

My question is: what are the best programming practices for implementing these
keys?

1. Just use simple delimiters: entity1:entity2:entity3.

2. Create complex datatypes like Java structures. I don't know if anyone
uses structures as keys; if they do, can someone please point out which
scenarios they would be a good fit for? Are they a good fit for this
scenario?

3. What are the pros and cons of both 1 and 2 when it comes to data
retrieval?

4. My entity1 can also be negative. Does that make any special difference
where HBase ordering is concerned? How can I tackle this scenario?

Any help on how to implement composite row keys would be much appreciated. I
want to understand how the community deals with implementing composite row
keys.

Regards
Praveenesh


Re: Programming practices for implementing composite row keys

2013-09-05 Thread Ted Yu
For #2 and #4, see HBASE-8693 'DataType: provide extensible type API', which
has been integrated into 0.96.

Cheers


On Thu, Sep 5, 2013 at 7:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 My 2 cents:

 1- Yes, that is one way to do it. You can also use a fixed length for every
 attribute participating in the composite key. I believe HBase scans fit
 this pattern better as well (?). It's basically a trade-off between space
 (all that padding increases the key size) and the complexity of choosing
 and handling a delimiter, and of parsing the keys afterwards.

 2- I personally have not heard of this. As far as I understand, it goes
 against the whole idea of HBase scanning, and prefix and fuzzy filters
 would not be possible this way. I would not recommend it.

 3- See replies to 1 & 2

 4- By default, keys are sorted with a binary (byte-by-byte) comparator.
 Negative numbers are a bit tricky, as far as I know and the last I checked.
 Some tips here:

 http://stackoverflow.com/questions/17248510/hbase-filters-not-working-for-negative-integers

 Can you normalize them (or take the absolute value) before reading and
 writing (at the cost of some performance, of course)? That is only possible
 if keys with the same magnitude but a different sign cannot exist as
 different entities. This depends on your business logic and the
 type/nature of your data.
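
Points 1 and 4 can be illustrated together with a small sketch in plain Java (no HBase dependency). It assumes, purely for illustration, that all three entities are 32-bit ints; the thread does not say what types they are. The class and method names are hypothetical. The sign-bit flip shown here is a general order-preserving trick, named as such, and is not claimed to be the exact byte layout used by HBASE-8693 or by Phoenix.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: fixed-width composite key whose byte order
// matches the numeric order of its (signed) components.
final class CompositeKeySketch {

    // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
    // 0x00000000..0xFFFFFFFF, so an unsigned byte-wise comparison
    // agrees with signed numeric order. The operation is its own inverse.
    static int flipSign(int v) {
        return v ^ Integer.MIN_VALUE;
    }

    // entity1 | entity2 | entity3, each as 4 big-endian bytes.
    // Fixed width means 12 bytes per key and no delimiter to escape or parse.
    static byte[] encode(int entity1, int entity2, int entity3) {
        return ByteBuffer.allocate(12)
                .putInt(flipSign(entity1))
                .putInt(flipSign(entity2))
                .putInt(flipSign(entity3))
                .array();
    }

    // Inverse of encode(): returns {entity1, entity2, entity3}.
    static int[] decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        return new int[] {
            flipSign(buf.getInt()), flipSign(buf.getInt()), flipSign(buf.getInt())
        };
    }

    // Unsigned lexicographic comparison, to check the sort order locally.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] neg = encode(-5, 1, 1);
        byte[] pos = encode(3, 1, 1);
        System.out.println(compareBytes(neg, pos) < 0);    // true
        System.out.println(Arrays.toString(decode(neg)));  // [-5, 1, 1]
    }
}
```

Without the sign flip, the two's-complement bytes of -5 start with 0xFF and would sort after every non-negative value, which is the problem described in the Stack Overflow link above. With string keys and a ':' delimiter the same issue shows up differently ("-5" vs "3" compare character by character), so a delimiter-based design needs its own padding or normalization rule.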

 Regards,
 Shahab




Re: Programming practices for implementing composite row keys

2013-09-05 Thread Shahab Yunus
Ah! I didn't know about HBASE-8693. Good information. Thanks Ted.

Regards,
Shahab





Re: Programming practices for implementing composite row keys

2013-09-05 Thread Doug Meil

Greetings, 

For more food for thought, there are some case studies on composite rowkey
design in the refguide:

http://hbase.apache.org/book.html#schema.casestudies






On 9/5/13 12:15 PM, Anoop John anoop.hb...@gmail.com wrote:

Hi
  Have a look at Phoenix[1]. There you can define a composite row key
model, and it handles the ordering of negative numbers. The scan model
you mentioned is also well supported, with start/stop row keys on entity1
and a SkipScanFilter for the others.

-Anoop-

[1] https://github.com/forcedotcom/phoenix





Re: Programming practices for implementing composite row keys

2013-09-05 Thread Anoop John
Hi
  Have a look at Phoenix[1]. There you can define a composite row key
model, and it handles the ordering of negative numbers. The scan model
you mentioned is also well supported, with start/stop row keys on entity1
and a SkipScanFilter for the others.

-Anoop-

[1] https://github.com/forcedotcom/phoenix
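
For readers who want to do the start/stop row key part by hand, here is a minimal sketch in plain Java of how a stop row can be derived from an encoded entity1 prefix. It relies on two facts about HBase scans: the start row is inclusive and the stop row is exclusive, and an empty stop row means "scan to the end of the table". The class and method names are hypothetical, and this is not how Phoenix or its SkipScanFilter is implemented.

```java
import java.util.Arrays;

// Hypothetical helper: compute the exclusive stop row for a prefix scan.
final class PrefixRangeSketch {

    // Returns the smallest key that is strictly greater than every key
    // starting with `prefix`: drop trailing 0xFF bytes, then add one to
    // the last remaining byte. An empty result means "no upper bound"
    // (the prefix was empty or consisted only of 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] startRow = { 1, 2 };                  // stand-in for an encoded entity1
        byte[] stopRow = stopRowForPrefix(startRow);
        System.out.println(Arrays.toString(stopRow)); // [1, 3]
    }
}
```

The start row is simply the encoded entity1 bytes; the pair (startRow, stopRow) then bounds the scan to rows for that entity1, and filters on entity2 and entity3 only have to examine rows inside that range.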

