I'm not sure whether this will work or is a good idea, but is it possible to use the tableindexed feature in 0.19 and create an IndexKeyGenerator that does an auto-increment?
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/client/tableindexed/package.html?view=markup
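Something along these lines is what I had in mind. It's only a rough sketch written from memory of the tableindexed interfaces, so treat the interface name and the createIndexKey signature as assumptions rather than the actual 0.19 API, and as the comments note, a per-JVM counter by itself wouldn't give one table-wide sequence when the load runs as parallel tasks:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator;
import org.apache.hadoop.hbase.util.Bytes;

public class AutoIncrementKeyGenerator implements IndexKeyGenerator {

  // Counter local to this JVM. With parallel map tasks each task would get
  // its own counter, so this alone does not produce a table-wide sequence.
  private final AtomicLong counter = new AtomicLong(0);

  public byte[] createIndexKey(byte[] rowKey, Map<byte[], byte[]> columns) {
    // Zero-pad so the generated index rows sort numerically.
    return Bytes.toBytes(String.format("%020d", counter.incrementAndGet()));
  }

  // The generator has to be serializable (Writable) so the index
  // specification can be stored with the table descriptor.
  public void write(DataOutput out) throws IOException {
  }

  public void readFields(DataInput in) throws IOException {
  }
}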
On Jan 10, 2009, at 10:32 AM, Jim Twensky wrote:
Unfortunately, yes, the sentences need to be sorted. I take advantage of the lexicographical ordering of the sentences for some other purpose. Even if I didn't, how could I generate the prefixes? Do you mean the number prefixes should be in the range [1, n], where n is the number of rows in the table? Since I use Hadoop to pull the data in, I can't see a trivial way to generate number prefixes, but I may be missing something obvious.
Jim
On Sat, Jan 10, 2009 at 11:55 AM, Tim Sell <[email protected]> wrote:
Do the sentences need to be sorted? If not, you could use a number prefix on the row key. Keep track of the highest prefix and use that range to select a prefix randomly, then start a scanner at that prefix.
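Roughly like this. It's a sketch written from memory against the 0.19 client API, so double-check the exact getScanner signature; it assumes row keys of the form "<prefix>|<sentence>", a table named "sentences", and that you recorded the highest prefix yourself at load time:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomPrefixScan {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "sentences");
    byte[][] columns = { Bytes.toBytes("probability:") };

    int maxPrefix = 1000;  // whatever you recorded while loading the table
    int prefix = new Random().nextInt(maxPrefix + 1);

    // Start the scanner at the randomly chosen prefix; rows under that
    // prefix come back in lexicographic order.
    byte[] startRow = Bytes.toBytes(String.format("%06d|", prefix));
    Scanner scanner = table.getScanner(columns, startRow);
    try {
      RowResult row;
      int count = 0;
      while ((row = scanner.next()) != null && count++ < 5) {
        System.out.println(Bytes.toString(row.getRow()));
      }
    } finally {
      scanner.close();
    }
  }
}

The zero-padding is just so the numeric prefixes sort correctly as strings.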
~Tim.
2009/1/10 Jim Twensky <[email protected]>:
Hello,
I have an HBase table that contains sentences as row keys and a few numeric values as columns. A simple abstract model of the table looks like the following:
-----------------------------------------------------------------------------
Sentence      | frequency:value | probability:value-0 | probability:value-2
-----------------------------------------------------------------------------
Hello World   | 5               | 0.000545321         | 0.002368204
...           | ...             | ...                 | ...
-----------------------------------------------------------------------------
I create the table and load it using Hadoop, and there are hundreds of billions of entries in it. I use this table to solve an optimization problem with a hill climbing/simulated annealing method. Basically, I need to change the likelihood values randomly. For example, I need to change, say, the first 5 rows starting at the 112th row, do some calculations, and so on...
Now the problem is, I can't see an easy way to access the n'th row directly. If I were using a traditional RDBMS, I'd add another column and auto-increment it each time I added a new row, but this is not possible since I load the table using Hadoop and there are parallel insertions taking place simultaneously. A quick and dirty way to do this might be adding a new index column after I load and initialize the table, but the table is huge and it doesn't seem right to me. Another bad approach would be to use a scanner starting from the first row and calling Scanner.next() n times inside a for loop to access the n'th row, which also seems very slow. Any ideas on how I could do it more efficiently?
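For what it's worth, the slow approach I mean is roughly the following (a sketch from memory of the client API; the "sentences" table name and the column are just placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class NthRowByScanning {
  public static RowResult nthRow(int n) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "sentences");
    Scanner scanner =
        table.getScanner(new byte[][] { Bytes.toBytes("probability:") });
    try {
      RowResult row = null;
      // n calls to next() just to reach the n'th row -- linear in n,
      // which is hopeless over hundreds of billions of rows.
      for (int i = 0; i < n; i++) {
        row = scanner.next();
      }
      return row;
    } finally {
      scanner.close();
    }
  }
}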
Thanks in advance,
Jim