
Re: Row level locking?

2010-07-17 Thread Michael Dalton

To avoid the row lock deadlock case Ryan mentioned I created a patch to add
a non-blocking tryLock method to HTable. There's a patch at
https://issues.apache.org/jira/browse/HBASE-2584 although I'm not sure if it
still applies to trunk. The basic idea is to immediately return null if the
lock is contended rather than sleep and wait for the lock to be acquired
(which would consume an RPC thread and could result in deadlock). The client
can then adopt a backoff/retry lock policy on the client side if tryLock
returns null. There has also been some discussion at
https://issues.apache.org/jira/browse/HBASE-2332 about replacing the
client-side row lock APIs which perform mandatory locking with advisory
locks, which should also avoid any server-side blocking/deadlock when
implemented.
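
A minimal sketch of the client-side backoff/retry policy described above. It does not use the HBase API or the HBASE-2584 patch itself: the `acquireWithBackoff` helper, its parameters, and the use of a local `ReentrantLock` as a stand-in for a row lock are all assumptions made for illustration. The only behaviour taken from the message is that tryLock returns null when the lock is contended.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical client-side helper; not part of HBase or the HBASE-2584 patch. */
class TryLockBackoff {
    /**
     * Calls tryLock up to maxAttempts times, sleeping with exponential backoff
     * plus jitter between attempts. Returns the lock handle, or null if the
     * lock stayed contended or the thread was interrupted.
     */
    static <L> L acquireWithBackoff(Supplier<L> tryLock, int maxAttempts, long baseDelayMs) {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            L handle = tryLock.get();            // non-blocking: null means contended
            if (handle != null) {
                return handle;
            }
            if (attempt < maxAttempts) {
                try {
                    // The wait happens on the client, so no server thread is tied up.
                    Thread.sleep(delay + ThreadLocalRandom.current().nextLong(delay + 1));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // keep the interrupt flag, stop retrying
                    return null;
                }
                delay *= 2;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ReentrantLock rowLock = new ReentrantLock();     // stand-in for a server-side row lock
        Supplier<ReentrantLock> tryLock = () -> rowLock.tryLock() ? rowLock : null;

        ReentrantLock handle = acquireWithBackoff(tryLock, 5, 10);
        if (handle == null) {
            System.out.println("gave up: row is still locked");
            return;
        }
        try {
            System.out.println("lock acquired; get/modify/put would go here");
        } finally {
            handle.unlock();
        }
    }
}
```

The caller decides what to do when the helper returns null (give up, queue the work, or retry later), which is the part a blocking lock call does not allow.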

Best regards,

Mike

On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohen justin.co...@teamaol.com wrote:

 What kind of trouble?  I do quite a bit of:

l = lock(row);
val = get(row);
/* modify val */
put(row, val);
unlock(l);

 Is there an alternative?

 -justin


 On 7/16/10 4:02 PM, Ryan Rawson wrote:

 Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.

RE: Row level locking?

2010-07-17 Thread Michael Segel

 Date: Fri, 16 Jul 2010 13:02:15 -0700
 Subject: Re: Row level locking?
 From: ryano...@gmail.com
 To: user@hbase.apache.org
 CC: hbase-u...@hadoop.apache.org

 HTable.close does very little:

   public void close() throws IOException {
     flushCommits();
   }

 None of which involves row locks.

 One thing to watch out for is to remember to close your scanners -
 they continue to use server-side resources until you close them or 60
 seconds passes and they get timed out.  Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.  There was a proposal to remove them, but I don't
 think that went through.

Thanks Ryan.

This may be more of the issue that they are seeing.
I have to do a code review on their code to see what they are doing. 
In batch (map/reduce) this isn't much of an issue.
In a more transactional use case, this can become an issue.

(In batch you're usually using a scanner at the start of the m/r job and then
processing the data...)


Row level locking?

2010-07-16 Thread Michael Segel


Ok,

First, I'm writing this before I've had my first cup of coffee, so I am
apologizing in advance if the question is a brain-dead question.

Coming from a relational background, some of these questions may not make sense
in the HBase world.


When does HBase acquire a lock on a row and how long does it persist? Does the 
lock only hit the current row, or does it also lock the adjacent rows too?
Does HBase support the concept of 'dirty reads'? 

The issue is what happens when you have two jobs trying to hit the same table 
at the same time and update/read the rows at the same time.

A developer came across a problem and the fix was to use the HTable.close() 
method to release any resources.

I am wondering if you explicitly have to clean up or can a lazy developer let 
the object just go out of scope and get GC'd.

Thx

-Mike

  

Re: Row level locking?

2010-07-16 Thread Cosmin Lehene

Currently a row is part of a region and there's a single region server serving
that region at a particular moment.
So when that row is updated a lock is acquired for that row until the actual
data is updated in memory (note that a put will be written to cache on the
region server and also persisted in the write-ahead log - WAL). Subsequent puts
to that row will have to wait for that lock.
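
A toy model of the write path just described, as a sketch only. `ToyRegion` and all of its members are hypothetical names, not HBase classes, and the real region server does considerably more. The sketch only shows the shape of the idea: one lock per row, held while the put is logged and applied in memory, then released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a single region; every name here is hypothetical, not an HBase class. */
class ToyRegion {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, String> memstore = new ConcurrentHashMap<>();
    private final List<String> wal = new ArrayList<>();

    void put(String row, String value) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();                          // a second put to the same row waits here
        try {
            synchronized (wal) {
                wal.add(row + "=" + value);   // stand-in for the write-ahead log append
            }
            memstore.put(row, value);         // in-memory update; readers see it from now on
        } finally {
            lock.unlock();                    // released once memory is updated
        }
    }

    String get(String row) {
        return memstore.get(row);
    }

    int walEntries() {
        synchronized (wal) {
            return wal.size();
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "a");
        region.put("row2", "b");   // different row, different lock: rows never block each other
        region.put("row1", "c");
        System.out.println(region.get("row1") + " " + region.walEntries());
    }
}
```

Because each row has its own lock, a scan over several rows is not protected as a unit, which matches the "no range locking" point below.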

HBase is fully consistent. This being said all the locking takes place at row
level only, so when you scan you have to take that into account as there's no
range locking.

I'm not sure I understand the resource releasing issue. HTable.close() flushes
the current write buffer (you have a write buffer when autoFlush is set to
false).

Cosmin


RE: Row level locking?

2010-07-16 Thread Michael Segel

Thanks for the response.
(You don't need to include the cc ...)

With respect to the row level locking ...
I was interested in when the lock is actually acquired, how long the lock
persists and when is the lock released.
From your response, the lock is only held on updating the row, and while the
data is being written to the memory cache which is then written to disk.
(Note: This row-level locking is different from transactional row-level locking.)

Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two
different processes trying to update the same table.
Not sure why they were having the problem, so I wanted to have a good fix. The
simple fix was to have them call close() on the HTable connection, which
forces any resources that they acquired to be released.

In looking at the problem... it's possible that they didn't have AutoFlush set
to true so the write was still in the buffer and hadn't gotten flushed.

If the lock only persists for the duration of the write to memory and is then
released, then the issue could have been that the record written was in the
buffer and not yet flushed to disk.

I'm also assuming that when you run a scan() against a region that any
information written to buffer but not yet written to disk will be missed.

So I guess the question isn't so much the issue of a lock, but that we need to
make sure that data written to the buffer is flushed ASAP unless we know
that we're going to be writing a lot of data in the m/r job.

Thx

-Mike


Re: Row level locking?

2010-07-16 Thread Ryan Rawson

HTable.close does very little:

  public void close() throws IOException {
    flushCommits();
  }


None of which involves row locks.

One thing to watch out for is to remember to close your scanners -
they continue to use server-side resources until you close them or 60
seconds passes and they get timed out.  Also be very wary of using any
of the explicit row locking calls, they are generally trouble for more
or less everyone.  There was a proposal to remove them, but I don't
think that went through.
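
A sketch of the "always close your scanners" advice, assuming a modern Java runtime with try-with-resources. `ToyScanner` is a hypothetical stand-in that only counts how many scanners are open; it is not the HBase scanner class, and the 60-second server-side timeout mentioned above is not modelled.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a scanner that holds a server-side resource while open. */
class ToyScanner implements AutoCloseable, Iterable<String> {
    static final AtomicInteger OPEN = new AtomicInteger();   // scanners currently open
    private final List<String> rows;
    private boolean closed;

    ToyScanner(List<String> rows) {
        this.rows = rows;
        OPEN.incrementAndGet();
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }

    @Override
    public void close() {
        if (!closed) {                // closing twice is harmless
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

class ScannerDemo {
    /** Counts rows; the scanner is closed even if the loop body throws. */
    static int countRows(List<String> rows) {
        try (ToyScanner scanner = new ToyScanner(rows)) {
            int n = 0;
            for (String row : scanner) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        System.out.println(countRows(Arrays.asList("a", "b", "c")));
        System.out.println("open scanners: " + ToyScanner.OPEN.get());
    }
}
```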


On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:

 On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:



 Thanks for the response.
 (You don't need to include the cc ...)

 With respect to the row level locking ...
 I was interested in when the lock is actually acquired, how long the lock 
 persists and when is the lock released.
 From your response, the lock is only held on updating the row, and while the 
 data is being written to the memory cache which is then written to disk. 
 (Note: This row-level locking is different from transactional row-level locking.)

 Now that I've had some caffeine I think I can clarify... :-)

 Some of my developers complained that they were having trouble with two 
 different processes trying to update the same table.
 Not sure why they were having the problem, so I wanted to have a good fix. 
 The simple fix was to have them call close() on the HTable connection, which 
 forces any resources that they acquired to be released.


 It would help to know what the exact problem was. Normally I wouldn't see any 
 problems.


 In looking at the problem... it's possible that they didn't have AutoFlush set 
 to true so the write was still in the buffer and hadn't gotten flushed.

 If the lock only persists for the duration of the write to memory and is then 
 released, then the issue could have been that the record written was in the 
 buffer and not yet flushed to disk.


 At the region server level HBase will use the cache for both reads and 
 writes. This happens transparently for the user. Once something is written in 
 the cache, all other clients will read from the same cache. No need to worry 
 if the cache has been flushed.
 Lars George has a good article about the hbase storage architecture 
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

 I'm also assuming that when you run a scan() against a region that any 
 information written to buffer but not yet written to disk will be missed.


 When you do puts into hbase you'll use HTable. The HTable instance is on the 
 client.  HTable keeps a buffer as well and if autoFlush is false it only 
 flushes when you do flushCommits() or when it reaches the buffer limit, or 
 when you close the table. With autoFlush set to true it will flush for every 
 put.
 This buffer is on the client. When data is actually flushed, it reaches the
 region server, where it goes into the region server cache and the WAL.
 Until a client flushes the put, no other client can see the data because it
 still resides on the client only. Depending on what you need to do, you can
 use autoFlush true if you are doing many small writes that need to be seen
 immediately by others. You can use autoFlush false and issue flushCommits()
 yourself, or you can rely on the buffer limit for that.
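
 A small model of the client-side buffer behaviour described above, as a sketch under stated assumptions. `ToyServer` and `BufferedTable` are hypothetical classes, not the HBase client, and the buffer limit here is a simple count of puts chosen for illustration. The point it shows is that a buffered put is invisible to other clients until the buffer is flushed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the region server: what other clients can read. */
class ToyServer {
    final Map<String, String> rows = new HashMap<>();
}

/** Hypothetical client-side table with a write buffer; not the HBase HTable class. */
class BufferedTable implements AutoCloseable {
    private final ToyServer server;
    private final boolean autoFlush;
    private final int bufferLimit;                     // assumption: limit counted in puts
    private final List<String[]> writeBuffer = new ArrayList<>();

    BufferedTable(ToyServer server, boolean autoFlush, int bufferLimit) {
        this.server = server;
        this.autoFlush = autoFlush;
        this.bufferLimit = bufferLimit;
    }

    void put(String row, String value) {
        writeBuffer.add(new String[] {row, value});    // stays on the client for now
        if (autoFlush || writeBuffer.size() >= bufferLimit) {
            flushCommits();
        }
    }

    void flushCommits() {
        for (String[] p : writeBuffer) {
            server.rows.put(p[0], p[1]);               // now visible to every client
        }
        writeBuffer.clear();
    }

    @Override
    public void close() {
        flushCommits();                                // close() only flushes the buffer
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        BufferedTable table = new BufferedTable(server, false, 100);
        table.put("row1", "v1");
        System.out.println("before flush: " + server.rows.containsKey("row1"));
        table.flushCommits();
        System.out.println("after flush: " + server.rows.containsKey("row1"));
    }
}
```

 With autoFlush false, a second process reading row1 between the put and the flush would not see the write, which would look like the symptom described earlier in the thread.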

 So I guess the question isn't so much the issue of a lock, but that we need 
 to make sure that data written to the buffer is flushed ASAP unless we 
 know that we're going to be writing a lot of data in the m/r job.


 Usually when you write heavily from the reducer it is better to use a buffer
 and not autoFlush, to get good performance.

 Cosmin


 Thx

 -Mike




Re: Row level locking?

2010-07-16 Thread Guilherme Germoglio

What about implementing explicit row locks using ZooKeeper? I'm planning
to do this sometime in the near future. Does anyone have any comments
against this approach?

(or maybe it was already implemented by someone :-)


Re: Row level locking?

2010-07-16 Thread Ryan Rawson

Explicit locks with zookeeper would be (a) slow and (b) completely out
of band and ultimately up to you.  I wouldn't exactly be eager to do
our row locking in zookeeper (since the minimum operation time is
between 2-10ms).

You could do application advisory locks, but that is true no matter
what datastore you use...

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
germog...@gmail.com wrote:
 What about implementing explicit row locks using the zookeeper? I'm planning
 to do this sometime in the near future. Does anyone have any comments
 against this approach?

 (or maybe it was already implemented by someone :-)


Re: Row level locking?

2010-07-16 Thread Guilherme Germoglio

thanks Ryan! (I was about to look for performance numbers)

Just another question -- slightly related to locks. Will HBase 0.90
include HTable.checkAndPut receiving more than one value to check? I'm
eager to help, if possible.


Re: Row level locking?

2010-07-16 Thread Stack

On Fri, Jul 16, 2010 at 2:01 PM, Guilherme Germoglio
germog...@gmail.com wrote:
 Just another question -- slightly related to locks. Will HBase 0.90
 include HTable.checkAndPut receiving more than one value to check? I'm
 eager to help, if possible.


I don't think there is even an issue to add that facility Guilherme.
Make one, stick up a patch and we'll add it.

Good on you,
St.Ack

Re: Row level locking?

2010-07-16 Thread Ryan Rawson

In the uncontended case this is fine, although you are doing 4 RPCs to
accomplish what could be done in 1 (with CAS).

But in the contended case, all the people waiting on that lock consume
RPC handler threads eventually causing a temporary deadlock since the
original lockholder will not be able to progress to release the lock.
The 60 second release will kick in and things might flow again for a
bit.



On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohen justin.co...@teamaol.com wrote:
 What kind of trouble?  I do quite a bit of:

    l = lock(row);
    val = get(row);
    /* modify val */
    put(row, val);
    unlock(l);

 Is there an alternative?

 -justin

 On 7/16/10 4:02 PM, Ryan Rawson wrote:

 Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.

Re: Row level locking?

2010-07-16 Thread Justin Cohen

In that case it would be 2 RPCs, right?  do { get, update, checkAndPut } 
while (ret == false)?  Plus 2 for each contention?
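
The loop above can be sketched roughly as follows. This is not the HBase client: `ToyCasTable` is a hypothetical in-memory table whose `checkAndPut` is built on `ConcurrentHashMap`, used here only to show the get / modify / checkAndPut retry pattern. In this toy, each attempt stands for the two calls counted above (one get, one checkAndPut).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory table with a compare-and-set style put; not the HBase API. */
class ToyCasTable {
    private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();

    String get(String row) {
        return rows.get(row);
    }

    /** Writes newValue only if the row still holds expected (null means "row absent"). */
    boolean checkAndPut(String row, String expected, String newValue) {
        if (expected == null) {
            return rows.putIfAbsent(row, newValue) == null;
        }
        return rows.replace(row, expected, newValue);
    }

    /** get, modify, checkAndPut; retried when another writer got in between. */
    int update(String row, UnaryOperator<String> modify) {
        int attempts = 0;
        while (true) {
            attempts++;
            String current = get(row);                    // call 1
            String updated = modify.apply(current);       // must not return null
            if (checkAndPut(row, current, updated)) {     // call 2
                return attempts;
            }
        }
    }
}

class CasDemo {
    /** Runs several writers against one counter row and returns the final value. */
    static int concurrentIncrements(int threads, int perThread) {
        ToyCasTable table = new ToyCasTable();
        table.checkAndPut("counter", null, "0");
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    table.update("counter", v -> Integer.toString(Integer.parseInt(v) + 1));
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return Integer.parseInt(table.get("counter"));
    }

    public static void main(String[] args) {
        System.out.println(concurrentIncrements(4, 100));   // prints 400
    }
}
```

No writer ever holds a lock here, so a slow or crashed client cannot block the others; the cost of contention is an extra round of get and checkAndPut for whoever lost the race.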


Thanks,
-justin


Re: Row level locking?

2010-07-16 Thread Patrick Hunt

Fine-grained locking is not a good use case for ZooKeeper given its
quorum-based architecture.


Patrick

On 07/16/2010 01:24 PM, Ryan Rawson wrote:

Explicit locks with ZooKeeper would be (a) slow and (b) completely out
of band and ultimately up to you.  I wouldn't exactly be eager to do
our row locking in ZooKeeper (since the minimum operation time is
between 2 and 10 ms).

You could do application advisory locks, but that is true no matter
what datastore you use...

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio germog...@gmail.com wrote:

What about implementing explicit row locks using ZooKeeper? I'm planning
to do this sometime in the near future. Does anyone have any comments
against this approach?

(or maybe it was already implemented by someone :-)

On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote:


HTable.close does very little:

  public void close() throws IOException {
    flushCommits();
  }


None of which involves row locks.

One thing to watch out for is to remember to close your scanners -
they continue to use server-side resources until you close them or 60
seconds pass and they get timed out.  Also be very wary of using any
of the explicit row locking calls; they are generally trouble for more
or less everyone.  There was a proposal to remove them, but I don't
think that went through.


On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:


On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:



Thanks for the response.
(You don't need to include the cc ...)

With respect to the row level locking ...
I was interested in when the lock is actually acquired, how long the lock
persists, and when the lock is released.

From your response, the lock is only held while updating the row, and
while the data is being written to the memory cache which is then written
to disk. (Note: This row level locking is different from transactional
row level locking.)


Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two
different processes trying to update the same table.

Not sure why they were having the problem, so I wanted to have a good
fix. The simple fix was to have them call close() on the HTable
connection, which forces any resources that they acquired to be released.



It would help to know what the exact problem was. Normally I wouldn't see
any problems.



In looking at the problem... it's possible that they didn't have AutoFlush
set to true, so the write was still in the buffer and hadn't gotten
flushed.

If the lock only persists for the duration of the write to memory and is
then released, then the issue could have been that the record written was
in the buffer and not yet flushed to disk.



At the region server level HBase will use the cache for both reads and
writes. This happens transparently for the user. Once something is
written in the cache, all other clients will read from the same cache.
No need to worry if the cache has been flushed.

Lars George has a good article about the HBase storage architecture:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


I'm also assuming that when you run a scan() against a region that any
information written to buffer but not yet written to disk will be missed.



When you do puts into HBase you'll use HTable. The HTable instance is on
the client. HTable keeps a buffer as well, and if autoFlush is false it
only flushes when you call flushCommits(), when it reaches the buffer
limit, or when you close the table. With autoFlush set to true it will
flush on every put.

This buffer is on the client. So when data is actually flushed it reaches
the region server, where it goes into the region server cache and the WAL.

Unless a client flushes the put, no other client can see the data because
it still resides on the client only. Depending on what you need to do,
you can use autoFlush true if you are doing many small writes that need
to be seen immediately by others. You can use autoFlush false and issue
flushCommits() yourself, or you can rely on the buffer limit for that.
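
A small Java model may make the visibility rule above concrete. It is only
a sketch: Server and ClientTable are invented stand-ins rather than HBase
classes, and the buffer limit is counted in puts here purely to keep the
example short. Only the names flushCommits, close and autoFlush are taken
from the discussion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a client-side write buffer; not the HBase client. */
final class WriteBufferSketch {

    /** Stand-in for the region server: the state every client reads. */
    static final class Server {
        private final Map<String, String> rows = new HashMap<>();

        synchronized void apply(List<String[]> puts) {
            for (String[] put : puts) {
                rows.put(put[0], put[1]);
            }
        }

        synchronized String get(String row) {
            return rows.get(row);
        }
    }

    /** Stand-in for one client's table handle and its private buffer. */
    static final class ClientTable {
        private final Server server;
        private final boolean autoFlush;
        private final int bufferLimit; // hypothetical limit, counted in puts
        private final List<String[]> buffer = new ArrayList<>();

        ClientTable(Server server, boolean autoFlush, int bufferLimit) {
            this.server = server;
            this.autoFlush = autoFlush;
            this.bufferLimit = bufferLimit;
        }

        /** Buffers the put; sends it at once only when autoFlush is on. */
        void put(String row, String value) {
            buffer.add(new String[] {row, value});
            if (autoFlush || buffer.size() >= bufferLimit) {
                flushCommits();
            }
        }

        /** Pushes every buffered put to the server and empties the buffer. */
        void flushCommits() {
            server.apply(buffer);
            buffer.clear();
        }

        /** Closing flushes whatever is still buffered. */
        void close() {
            flushCommits();
        }

        String get(String row) {
            return server.get(row);
        }
    }

    /** Returns what a second client reads before and after the flush. */
    static String[] demo() {
        Server server = new Server();
        ClientTable writer = new ClientTable(server, false, 100);
        ClientTable reader = new ClientTable(server, true, 100);

        writer.put("row1", "v1");
        String beforeFlush = reader.get("row1"); // still only in writer's buffer
        writer.flushCommits();
        String afterFlush = reader.get("row1");
        return new String[] {String.valueOf(beforeFlush), afterFlush};
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("before flush: " + seen[0] + ", after flush: " + seen[1]);
    }
}
```

In this model the second client sees nothing until the writer flushes, which
is the behaviour that can look like a locking problem from the outside.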


So I guess the question isn't so much the issue of a lock, but that we
need to make sure that data written to the buffer should be flushed ASAP
unless we know that we're going to be writing a lot of data in the m/r job.



Usually when you write from the reducer (heavy), it is better to use a
buffer and not autoFlush to get good performance.


Cosmin


Thx

-Mike



From: cleh...@adobe.com
To: user@hbase.apache.org
CC: hbase-u...@hadoop.apache.org
Date: Fri, 16 Jul 2010 12:34:36 +0100
Subject: Re: Row level locking?

Currently a row is part of a region and there's a single region server
serving that region at a particular moment.

So when that row is updated a lock is acquired for that row until the
actual data is updated in memory (note that a put
