Programming practices for implementing composite row keys
Hello people,

I have a scenario that requires composite row keys for my HBase table. The key would basically be entity1,entity2,entity3. Searches would be by entity1 first, and then by entity2 and entity3. I know I can do a start/stop row scan on entity1 and then apply row filters on entity2 and entity3.

My question is: what are the best programming practices for implementing these keys?

1. Just use simple delimiters: entity1:entity2:entity3.
2. Create complex data types, like Java structures. I don't know whether anyone uses structures as keys; if they do, can someone please point out the scenarios where they are a good fit? Are they a good fit for this scenario?
3. What are the pros and cons of 1 and 2 when it comes to data retrieval?
4. My entity1 can also be negative. Does that make any difference as far as HBase ordering is concerned? How can I tackle this?

Any help on how to implement composite row keys would be highly appreciated. I want to understand how the community deals with implementing composite row keys.

Regards,
Praveenesh
Re: Programming practices for implementing composite row keys
For #2 and #4, see HBASE-8693 'DataType: provide extensible type API', which has been integrated into 0.96.

Cheers

On Thu, Sep 5, 2013 at 7:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

My 2 cents:

1- Yes, that is one way to do it. You can also use a fixed length for every attribute participating in the composite key. HBase scans would fit this pattern better as well, I believe (?). It's basically a trade-off between space (all that padding increases the key size) and the complexity of choosing and handling a delimiter and then parsing the keys.

2- I personally have not heard of this. As far as I understand, it goes against the whole idea of HBase scanning, and prefix and fuzzy filters would not be possible this way. I would not recommend it.

3- See replies to 1 and 2.

4- Keys are sorted, by default, with a binary comparator. It is a bit tricky, as far as I know and as of the last time I checked. Some tips here: http://stackoverflow.com/questions/17248510/hbase-filters-not-working-for-negative-integers

Can you normalize them (or take the absolute value) before reading and writing (of course at the cost of performance)? That only works if two keys that differ only in sign can never exist as different entities. This depends on your business logic and the type/nature of your data.

Regards,
Shahab
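To make the trade-off in point 1 concrete, here is a minimal sketch in plain Java (no HBase dependency) of the two layouts. The class and method names (`CompositeKeySketch`, `delimitedKey`, `fixedWidthKey`, `field`) are hypothetical, the fields are assumed to be short UTF-8 strings, and zero bytes are assumed never to occur inside a field. It is only an illustration of the idea, not a recommended implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CompositeKeySketch {

    // Delimited layout: compact and readable, but ':' must never appear inside a field.
    static byte[] delimitedKey(String e1, String e2, String e3) {
        return String.join(":", e1, e2, e3).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width layout: every field is right-padded with zero bytes to `width` bytes.
    static byte[] fixedWidthKey(int width, String... fields) {
        ByteBuffer buf = ByteBuffer.allocate(width * fields.length);
        for (String f : fields) {
            byte[] raw = f.getBytes(StandardCharsets.UTF_8);
            if (raw.length > width) {
                throw new IllegalArgumentException("field longer than " + width + " bytes: " + f);
            }
            buf.put(Arrays.copyOf(raw, width)); // copyOf pads the tail with zeros
        }
        return buf.array();
    }

    // Reads field number `index` back out of a fixed-width key, dropping the padding.
    static String field(byte[] key, int width, int index) {
        int from = index * width;
        int end = from + width;
        while (end > from && key[end - 1] == 0) {
            end--;
        }
        return new String(key, from, end - from, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = fixedWidthKey(4, "ab", "c", "defg");
        System.out.println(Arrays.toString(key));
        System.out.println(field(key, 4, 1));
    }
}
```

With the fixed-width layout each field sits at a known byte offset, so no parsing is needed and per-position filters are easier to reason about; the price is the padding on every key.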
Re: Programming practices for implementing composite row keys
Ah! I didn't know about HBASE-8693. Good information. Thanks Ted.

Regards,
Shahab
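For question #4, the underlying problem is that a plain big-endian two's-complement int sorts negatives after positives under a byte-wise (unsigned) comparison. A common trick is to flip the sign bit before writing the bytes. The sketch below is plain Java and is not the type API from HBASE-8693; the names `SignedKeySketch`, `encode`, `decode` and `compareBytes` are made up for illustration, and `compareBytes` only stands in for the unsigned lexicographic comparison that a binary comparator performs.

```java
import java.nio.ByteBuffer;

final class SignedKeySketch {

    // Big-endian bytes of v with the sign bit flipped, so byte order matches numeric order.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of encode: flip the sign bit back.
    static int decode(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] rawMinusFive = ByteBuffer.allocate(4).putInt(-5).array();
        byte[] rawThree = ByteBuffer.allocate(4).putInt(3).array();
        // Raw encoding: -5 compares greater than 3 byte-wise.
        System.out.println(compareBytes(rawMinusFive, rawThree) > 0);
        // Sign-flipped encoding: -5 compares less than 3 byte-wise.
        System.out.println(compareBytes(encode(-5), encode(3)) < 0);
    }
}
```

Because the encoding is fixed-width (4 bytes), it can be dropped straight into a fixed-length composite key as the entity1 component.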
Re: Programming practices for implementing composite row keys
Greetings,

For more food for thought, there are some case studies on composite rowkey design in the refguide: http://hbase.apache.org/book.html#schema.casestudies
Re: Programming practices for implementing composite row keys
Hi,

Have a look at Phoenix [1]. There you can define a composite row key (RK) model, and it handles the ordering of negative numbers. The scan model you mentioned will also be well supported, with start/stop row keys on entity1 and SkipScanFilter for the others.

-Anoop-

[1] https://github.com/forcedotcom/phoenix
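As a rough illustration of the start/stop row part, here is a plain-Java sketch of deriving the scan bounds for "all rows whose key starts with the encoded entity1". It assumes the usual convention that the start row is inclusive, the stop row is exclusive, and an empty stop row means "scan to the end of the table". The names `PrefixRangeSketch` and `stopRowForPrefix` are hypothetical, and this is independent of how Phoenix does it internally.

```java
import java.util.Arrays;

final class PrefixRangeSketch {

    // Smallest key that sorts after every key beginning with `prefix`:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0]; // prefix is empty or all 0xFF: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] startRow = {0x01, 0x02};              // encoded entity1, used as-is
        byte[] stopRow = stopRowForPrefix(startRow); // exclusive upper bound
        System.out.println(Arrays.toString(startRow) + " .. " + Arrays.toString(stopRow));
    }
}
```

The two arrays would then be used as the start and stop rows of the scan, with any filtering on entity2 and entity3 applied inside that range.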