Hi,

I would like some clarification on how HBase works:

I am using a BulkImport program to store millions of text documents in an
HTable. Each tasktracker reads a portion of a big file (assigned to it by the
input split calculation), extracts the individual documents from its split,
and stores them in the HTable with the document id as the row key.
HBase claims that it stores rows in sorted order. My question is: how does it
sort the row keys when the tasktrackers emit them (integers) in random order?
That is, when a new row id arrives, how does the HBase client know which
region to store the row in? Suppose a row id to be stored lies between two
rows that are already in the HTable. Where will this row be stored? Does
HBase reshuffle the existing rows?
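
To make the question concrete, here is a toy model of what I imagine might be
happening. It is plain Java, not HBase code: the integer keys, the region
names, the boundary values and the regionFor helper are all made up for
illustration, and I do not know whether HBase actually works this way.

```java
import java.util.TreeMap;

// Toy model only: plain Java collections, NOT the HBase client API.
class SortedRowsSketch {
    // Hypothetical region table: region start key -> region name.
    static final TreeMap<Integer, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put(Integer.MIN_VALUE, "region-A");
        REGIONS.put(1000, "region-B");
        REGIONS.put(5000, "region-C");
    }

    // Pick the region whose start key is the greatest one <= rowKey.
    static String regionFor(int rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        // Rows arrive in random order; the sorted map keeps them ordered by key.
        TreeMap<Integer, String> rows = new TreeMap<>();
        for (int id : new int[] {4200, 17, 980, 1500}) {
            rows.put(id, "doc-" + id);
        }

        // A new row id that falls between two rows that are already stored.
        rows.put(1200, "doc-1200");

        System.out.println(rows.keySet());   // keys come back in ascending order
        System.out.println(regionFor(1200)); // region chosen by start-key lookup
    }
}
```

In this sketch nothing is reshuffled: the new key simply lands in its sorted
position, and the region is found by looking up the nearest start key at or
below it. Is that roughly what HBase does?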

Any explanation of how HBase works, or a pointer to a reference, would be
helpful.

Thanks,
Akhil


