Generally speaking, the only way to determine uniqueness of data in HBase is via the row key. Just like a primary key in a database, you can use it to verify uniqueness and to do index scans and gets.
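For concreteness, here is a minimal sketch of that pattern, assuming the 0.20-style Java client API (the table, family, and qualifier names here are made up): the value you need to keep unique becomes the row key, and a get tells you whether it is already taken.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UniqueInsert {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    // The value that must stay unique IS the row key.
    byte[] row = Bytes.toBytes("some-unique-value");

    // Round trip 1: does the row already exist?
    Result existing = table.get(new Get(row));
    if (existing.isEmpty()) {
      // Round trip 2: claim it. Note this check-then-put is not atomic;
      // two clients can race between the two calls.
      Put put = new Put(row);
      put.add(Bytes.toBytes("data"), Bytes.toBytes("col"),
              Bytes.toBytes("payload"));
      table.put(put);
    } else {
      System.out.println("duplicate: value already taken");
    }
    table.close();
  }
}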
So generally speaking, yes, you will have to make multiple trips to the server to use a secondary index. The situation might not be as dire as it seems, since in 0.20 the latency target for small gets/puts is really low (maybe 1 ms?). The usual HBase answer to "I need to do more" is "use MapReduce", which is the solution I will offer you as well (rough sketch after the quoted message below).

Hopefully this answers some of your questions. Good luck!

-ryan

On Thu, Mar 5, 2009 at 1:00 AM, Eran Bergman <[email protected]> wrote:

> Hello,
>
> Lately I have been experimenting with HBase and I came across a problem I
> don't know how to solve yet. My problem is data uniqueness, meaning I
> would like to have unique data in a specified column (taking into account
> all or some subset of my rows). I would like to have that for any number
> of columns which I will specify (various types of data).
>
> Usually the way to do this is to use some sort of indexing method, but
> this will amount to round trips to the server for uniqueness checks
> before I commit, which are very costly.
>
> Does anyone have any thoughts on how to do this?
>
> Thanks,
> Eran
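A rough sketch of that MapReduce route, assuming the org.apache.hadoop.hbase.mapreduce API that shipped with 0.20 (the table, family, and qualifier names are hypothetical): a job that scans the table, emits each column value with its row key, and reports any value that appears in more than one row. You could run it periodically to audit uniqueness, or adapt the same shape to build a secondary index table keyed by value.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindDuplicates {
  static final byte[] FAMILY = Bytes.toBytes("data");
  static final byte[] QUALIFIER = Bytes.toBytes("col");

  // Map: emit (column value, row key) for every row scanned.
  static class ValueMapper extends TableMapper<Text, Text> {
    public void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      byte[] v = result.getValue(FAMILY, QUALIFIER);
      if (v != null) {
        ctx.write(new Text(v), new Text(row.get()));
      }
    }
  }

  // Reduce: any value seen under more than one row key violates uniqueness.
  static class DupReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text value, Iterable<Text> rows, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder sb = new StringBuilder();
      int count = 0;
      for (Text r : rows) {
        if (count++ > 0) sb.append(',');
        sb.append(r.toString());
      }
      if (count > 1) {
        ctx.write(value, new Text(sb.toString()));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "find-duplicates");
    job.setJarByClass(FindDuplicates.class);
    Scan scan = new Scan();
    scan.addColumn(FAMILY, QUALIFIER);
    TableMapReduceUtil.initTableMapperJob("mytable", scan,
        ValueMapper.class, Text.class, Text.class, job);
    job.setReducerClass(DupReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}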
