Answers in-line.
J-D
On Wed, Sep 17, 2008 at 2:49 PM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
Hi. A few small questions:
1) BatchUpdate.getTimestamp()
<http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/io/BatchUpdate.html#getTimestamp()>
- If I understand correctly, this method should return the timestamp that
the row will be committed with.
But how will the BatchUpdate know the timestamp? Shouldn't this timestamp
only be known after the row is written?
Anyway, the value returned is always the same and not correct.
If you do not specify a timestamp, the value returned will be
HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE. HBase interprets this
as "if BU.timestamp = LATEST_TIMESTAMP, replace it with the current
timestamp".
The timestamp returned will be different if you created the BatchUpdate with
a specified timestamp; see my answer to your second question.
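To make the sentinel rule concrete, here is a minimal sketch of the substitution described above. It is a toy model, not HBase source: the class TimestampSketch and the method resolve are hypothetical names, and only the fact that LATEST_TIMESTAMP equals Long.MAX_VALUE comes from this thread.

```java
// Toy model of the rule described above; not HBase code.
// TimestampSketch and resolve() are hypothetical names for illustration.
final class TimestampSketch {
    // Per the thread, HConstants.LATEST_TIMESTAMP is Long.MAX_VALUE.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Returns the timestamp a cell would end up stored with:
    // the sentinel is swapped for the current time, anything else is kept.
    static long resolve(long requested, long now) {
        return requested == LATEST_TIMESTAMP ? now : requested;
    }
}
```

Calling TimestampSketch.resolve(TimestampSketch.LATEST_TIMESTAMP, System.currentTimeMillis()) yields the current time, while an explicit timestamp passes through unchanged. This is why getTimestamp() on a BatchUpdate created without a timestamp always shows the same large constant.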
2) Delete cell - I saw in the FAQ that I need to add a delete record and
commit it with exactly the same timestamp as the original
row, but I didn't find any commit method that takes a timestamp.
See the BatchUpdate constructor
<http://hadoop.apache.org/hbase/docs/r0.2.1/api/org/apache/hadoop/hbase/io/BatchUpdate.html#BatchUpdate%28java.lang.String,%20long%29>
that uses a timestamp.
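As a rough illustration of why the timestamps have to match, here is a toy model of a single cell that follows the FAQ's description quoted above: a delete record hides only the value written at exactly the same timestamp. The class DeleteMarkerSketch and its methods are hypothetical and say nothing about how HBase stores data internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of one cell's versions; hypothetical, not HBase's implementation.
final class DeleteMarkerSketch {
    private final Map<Long, String> values = new HashMap<>();
    private final Set<Long> deleteMarkers = new HashSet<>();

    // Stores a value under an explicit timestamp.
    void put(long timestamp, String value) {
        values.put(timestamp, value);
    }

    // Records a delete marker at the given timestamp.
    void delete(long timestamp) {
        deleteMarkers.add(timestamp);
    }

    // Returns the value written at exactly this timestamp,
    // or null if it was never written or is masked by a delete marker.
    String get(long timestamp) {
        return deleteMarkers.contains(timestamp) ? null : values.get(timestamp);
    }
}
```

In this model a delete at timestamp 150 leaves a value written at 200 untouched, while a delete at 200 hides it. That is the reason for creating the BatchUpdate with the original timestamp before adding the delete.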
3) For my update operation I need to check whether the row that my
application holds still contains the most recent data, and only in that
case update some cells. To do this I need to lock the row -> check
the timestamp of the particular cell -> update it if the
timestamp is the same as the one the application holds. If all those
operations are performed on HTable, they will take a number of
RPCs. I think that if it were possible to do those operations directly on
the HRegionServer, it would help me get rid of all the extra RPCs. Is
there some way to work with the specific HRegionServer that the row
belongs to? If yes, how can I get the HRegionServer for this
specific row?
It is best to let the client abstract away how HBase works, or this could
become a mess. For example, you would have to reimplement finding the region
server for a region, with retries. Instead of updating by deleting/inserting,
you should just do a put so it will be inserted with the current timestamp;
by default, HBase retrieves the cell with the latest timestamp for a get or
a scan. How HBase works is very different from your typical RDBMS ;)
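The "just do a put" advice can be pictured with a small sketch. This is a toy model under the assumption that each put adds a new version keyed by its timestamp and that a default read returns the newest one; LatestVersionSketch and its methods are hypothetical names, not the HBase client API.

```java
import java.util.TreeMap;

// Toy model of a multi-version cell; hypothetical, not the HBase client API.
final class LatestVersionSketch {
    // Versions sorted by timestamp, oldest first.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // A put adds a new version stamped with the current time; nothing is deleted.
    void put(long now, String value) {
        versions.put(now, value);
    }

    // A default read returns the value with the latest timestamp, or null if empty.
    String get() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Number of versions currently held by this toy cell.
    int versionCount() {
        return versions.size();
    }
}
```

After put(1, "old") and put(2, "new"), get() returns "new" even though the older version is still present in the model, so no delete step is needed to update a cell.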
Thank You and Best Regards.
Slava.