Hi Ryan,
Thank you for your reply. My source data file is a sequence of triples, one
per line. Each triple has the form (k, p, v), meaning that key k has a
property p whose value is v. A key k can have multiple properties and
values, and the triples for the same key may not occur consecutively in the
data file. However, I want to store each key k together with all its
properties and values, p1, v1, p2, v2, ..., pn, vn, as a single row in the
HTable. My HTable structure is as follows:
-------------------------------------------------
                 MyColumnFamily
-------------------------------------------------
| numOfCol | p1 | v1 | p2 | v2 | ... | pn | vn |
-------------------------------------------------
where numOfCol records the number of column pairs p1, v1, p2, v2, ..., pn,
vn that follow it. I need numOfCol because when I later read a triple that
is the (n+1)th property and value for key k, I use (numOfCol + 1) to build
the column names for that (n+1)th property and value.
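To make the naming rule concrete, here is a minimal sketch in plain Java with
no HBase calls. The class name ColumnNamer and its methods are made up for
illustration, and the in-memory map only stands in for the numOfCol cell of
each row; it is not how my loader currently obtains the count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks, per key, how many (property, value) pairs
// have been assigned columns so far, mirroring the numOfCol cell of a row.
class ColumnNamer {
    private final Map<String, Integer> numOfCol = new HashMap<>();

    // Returns the qualifiers {"p<i>", "v<i>"} for the next pair of key k,
    // where i = numOfCol + 1, and advances that key's counter.
    String[] nextQualifiers(String k) {
        int next = numOfCol.merge(k, 1, Integer::sum);
        return new String[] { "p" + next, "v" + next };
    }

    // Current numOfCol for key k (0 if k has not been seen yet).
    int count(String k) {
        return numOfCol.getOrDefault(k, 0);
    }

    public static void main(String[] args) {
        ColumnNamer namer = new ColumnNamer();
        // Made-up sample triples; the two for k1 are not consecutive.
        String[][] triples = {
            { "k1", "color", "red" },
            { "k2", "size", "L" },
            { "k1", "shape", "round" },
        };
        for (String[] t : triples) {
            String[] q = namer.nextQualifiers(t[0]);
            System.out.println(t[0] + ": " + q[0] + "=" + t[1] + ", " + q[1] + "=" + t[2]);
        }
    }
}
```

In the real loading process the count would have to come from the numOfCol
cell of the row for k, which is why the value written by an earlier put()
needs to be readable.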
So keeping a reference to lastPut, as in your code snippet, is not enough
for my case. This is why I have to use HTable.get() to retrieve the row for
a key k that was previously written with HTable.put(). Of course I don't
want to use setAutoFlush(true); it's too slow. Do you have any suggestions?
Thank you so much!
Best wishes,
--
Xin
2010/11/23 Ryan Rawson <[email protected]>
> Hi,
>
> You could implement this in a code structure like so:
>
> HTable table = new HTable(conf, tableName);
> Put lastPut = null;
> while ( moreData ) {
>   Put put = makeNewPutBasedOnLastPutToo( lastPut, dataSource );
>   table.put(put);
>   lastPut = put;
>   dataSource.next();
> }
>
> If that is unsatisfactory, you may access the write buffer via
> HTable.getWriteBuffer().
>
> -ryan
>
>
> On Mon, Nov 22, 2010 at 5:41 PM, Xin Wang <[email protected]> wrote:
> > Hello everyone,
> >
> > I am a beginner to HBase. I want to load a data file of 2 million lines
> > into an HBase table.
> > I want to load data as fast as possible, so I called
> > HTable.setAutoFlush(false) at the beginning. However, when I HTable.put()
> > a row and then HTable.get() the same row, the result is empty. I know
> > this is because setAutoFlush(false) makes put() write into the buffer.
> > But the algorithm in my loading process needs to read the previous value
> > that was just put into the HTable cell. I have tried
> > setAutoFlush(true); the previous value can then be read, but the loading
> > process is slowed down by about an order of magnitude. Can I get() a
> > value directly from the write buffer? Are there any other solutions to
> > this problem that I do not know? Thank you in advance!
> >
> > Best regards,
> >
> > Xin Wang
> >
>