Potential bugs in HTable In incrementColumnValue method

2015-06-09 Thread Jerry Lam
Hi HBase community,

Can anyone confirm that the method incrementColumnValue is implemented
correctly?

I'm talking about mainly the deprecated method:

  @Deprecated
  @Override
  public long incrementColumnValue(final byte [] row, final byte [] family,
      final byte [] qualifier, final long amount, final boolean writeToWAL)
      throws IOException {
    return incrementColumnValue(row, family, qualifier, amount,
        writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT);
  }


Note from the above, if writeToWAL is true, Durability is set to SKIP_WAL.

It does not make sense to me so I'm asking if this might be a potential bug.
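
For comparison, here is a minimal sketch of what I would expect the mapping to
be (this is only my assumption of the intended behaviour, not a confirmed fix;
whether the true case should be SYNC_WAL or USE_DEFAULT is debatable):

  // Sketch only: my assumption of the intended writeToWAL -> Durability mapping.
  // true  -> the WAL is written (SYNC_WAL, or USE_DEFAULT to respect the table setting)
  // false -> Durability.SKIP_WAL
  public long incrementColumnValue(final byte[] row, final byte[] family,
      final byte[] qualifier, final long amount, final boolean writeToWAL)
      throws IOException {
    return incrementColumnValue(row, family, qualifier, amount,
        writeToWAL ? Durability.SYNC_WAL : Durability.SKIP_WAL);
  }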


Best Regards,


Jerry


Re: Potential bugs in HTable In incrementColumnValue method

2015-06-09 Thread Jerry Lam
Hi Vlad,

I copied the code from HBase version 1.0.0.
I first noticed it in version 0.98.6.

We have code that has been using HBase since 0.92, and some of it has not been
ported to the latest version, so it is still using the deprecated methods.

The reason I'm asking is that I don't know whether I should use SKIP_WAL to
get the same semantics as writeToWAL(true). I doubt it, because the name
SKIP_WAL implies writeToWAL(false). :)
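
In the meantime, a sketch of the workaround I'm considering (assuming the
calling code can be touched, that table/row/family/qualifier already exist,
and a 0.98+ client API): call the non-deprecated overload and state the
durability explicitly instead of passing a boolean.

  // Sketch: avoid the deprecated boolean overload entirely.
  // Old writeToWAL(true)  -> a durability that writes the WAL (SYNC_WAL or USE_DEFAULT)
  // Old writeToWAL(false) -> Durability.SKIP_WAL
  long newValue = table.incrementColumnValue(row, family, qualifier, 1L,
      Durability.SYNC_WAL);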

Best Regards,

Jerry



On Tue, Jun 9, 2015 at 12:03 PM, Ted Yu yuzhih...@gmail.com wrote:

 I see code in this formation in 0.98 branch.

 Looking at the unit tests which exercise incrementColumnValue(), they all
 call:
   public long incrementColumnValue(final byte [] row, final byte [] family,
   final byte [] qualifier, final long amount)
 Possibly because the one mentioned by Jerry is deprecated.

 FYI

 On Tue, Jun 9, 2015 at 8:49 AM, Vladimir Rodionov vladrodio...@gmail.com
 wrote:

  Hi, Jerry
 
  Which version of HBase is it?
 
  -Vlad
 
  On Tue, Jun 9, 2015 at 8:05 AM, Jerry Lam chiling...@gmail.com wrote:
 
   Hi HBase community,
  
   Can anyone confirm that the method incrementColumnValue is implemented
   correctly?
  
   I'm talking about mainly the deprecated method:
  
    @Deprecated
    @Override
    public long incrementColumnValue(final byte [] row, final byte [] family,
        final byte [] qualifier, final long amount, final boolean writeToWAL)
        throws IOException {
      return incrementColumnValue(row, family, qualifier, amount,
          writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT);
    }
  
  
   Note from the above, if writeToWAL is true, Durability is set to
  SKIP_WAL.
  
   It does not make sense to me so I'm asking if this might be a potential
   bug.
  
  
   Best Regards,
  
  
   Jerry
  
 



Re: Potential bugs in HTable In incrementColumnValue method

2015-06-09 Thread Jerry Lam
Done. Thanks everyone for confirming this!

HBASE-13881
https://issues.apache.org/jira/browse/HBASE-13881

On Tue, Jun 9, 2015 at 7:09 PM, Ted Yu yuzhih...@gmail.com wrote:

 Did a quick search in HBase JIRA - no hit.

 Jerry:
 Mind logging one ?

 Thanks

 On Tue, Jun 9, 2015 at 3:30 PM, Andrew Purtell apurt...@apache.org
 wrote:

  Is there a JIRA for this?
 
  On Tue, Jun 9, 2015 at 11:15 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   Seems a bug to me w.r.t. interpretation of writeToWAL
  
   Cheers
  
   On Tue, Jun 9, 2015 at 10:50 AM, Jerry Lam chiling...@gmail.com
 wrote:
  
Hi Vlad,
   
I copied the code from HBase version 1.0.0.
I first noticed it in version 0.98.6.
   
We have codes that use HBase since 0.92. So some of the codes have
 not
   been
ported to the latest version therefore they are still using the
   deprecated
methods.
   
The reason I'm asking is because I don't know if I should use
 SKIP_WAL
  to
get the same semantic of writeToWAL (true). I'm doubting it because
 the
name SKIP_WAL implies writeToWAL false. :)
   
Best Regards,
   
Jerry
   
   
   
On Tue, Jun 9, 2015 at 12:03 PM, Ted Yu yuzhih...@gmail.com wrote:
   
 I see code in this formation in 0.98 branch.

 Looking at the unit tests which exercise incrementColumnValue(),
 they
   all
 call:
   public long incrementColumnValue(final byte [] row, final byte []
family,
   final byte [] qualifier, final long amount)
 Possibly because the one mentioned by Jerry is deprecated.

 FYI

 On Tue, Jun 9, 2015 at 8:49 AM, Vladimir Rodionov 
vladrodio...@gmail.com
 wrote:

  Hi, Jerry
 
  Which version of HBase is it?
 
  -Vlad
 
  On Tue, Jun 9, 2015 at 8:05 AM, Jerry Lam chiling...@gmail.com
wrote:
 
   Hi HBase community,
  
   Can anyone confirm that the method incrementColumnValue is
implemented
   correctly?
  
   I'm talking about mainly the deprecated method:
  
      @Deprecated
      @Override
      public long incrementColumnValue(final byte [] row, final byte [] family,
          final byte [] qualifier, final long amount, final boolean writeToWAL)
          throws IOException {
        return incrementColumnValue(row, family, qualifier, amount,
            writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT);
      }
  
  
   Note from the above, if writeToWAL is true, Durability is set
 to
  SKIP_WAL.
  
   It does not make sense to me so I'm asking if this might be a
potential
   bug.
  
  
   Best Regards,
  
  
   Jerry
  
 

   
  
 
 
 
  --
  Best regards,
 
 - Andy
 
  Problems worthy of attack prove their worth by hitting back. - Piet Hein
  (via Tom White)
 



How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?

2014-08-20 Thread Jerry Lam
Hi HBase users,

I wonder if anyone knows how to change MAX_FILES_PER_REGION_PER_FAMILY in
LoadIncrementalHFiles?

The default value is 32 which is quite small.

HBase Version 0.98

Thank you,

Jerry


Re: How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?

2014-08-20 Thread Jerry Lam
Hi Matteo,

Thank you for the info. I tried it, but it doesn't seem to have any effect.
Apparently the code in LoadIncrementalHFiles does not pick up anything other
than variables from hbase-site.xml, which is unfortunate. We have more than
32 HFiles to bulk load, so this is really not working...

Best Regards,

Jerry






On Wed, Aug 20, 2014 at 10:49 AM, Matteo Bertozzi theo.berto...@gmail.com
wrote:

 you should be able to use the -D option to set the new value

 LoadIncrementalHFiles
 -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=NEW_VALUE

 Matteo



 On Wed, Aug 20, 2014 at 3:46 PM, Jerry Lam chiling...@gmail.com wrote:

  Hi HBase users,
 
  I wonder if anyone knows how to make change to
  the MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?
 
  The default value is 32 which is quite small.
 
  HBase Version 0.98
 
  Thank you,
 
  Jerry
 



Re: How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?

2014-08-20 Thread Jerry Lam
Hi Matteo,

Thank you for addressing the issue. For now, I will just set the variable
in hbase-site.xml.
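
For reference, this is the property I will set there (the property name comes
from Matteo's command; the value 128 is just an example, the default is 32):

  <property>
    <name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
    <value>128</value>
  </property>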

Best Regards,

Jerry


On Wed, Aug 20, 2014 at 12:33 PM, Matteo Bertozzi theo.berto...@gmail.com
wrote:

 Yeah, sorry; I just looked at the code and it is not initializing the tool
 correctly to pick up the -D configuration. Let me fix that; I've opened
 HBASE-11789. As you said, with the current code only the hbase-site.xml
 conf is used, so you need to set the property there.

 Matteo



 On Wed, Aug 20, 2014 at 5:24 PM, Jerry Lam chiling...@gmail.com wrote:

  Hi Matteo,
 
  Thank you for the info. I tried it but it doesn't seem to take any
 effect.
  Apparently the code in the LoadIncremtnalHFiles does not take anything
  other than variables from hbase-site.xml which is unfortunate. We have
 more
  than 32 hfiles to bulkload. So this is really not working...
 
  Best Regards,
 
  Jerry
 
 
 
 
 
 
  On Wed, Aug 20, 2014 at 10:49 AM, Matteo Bertozzi 
 theo.berto...@gmail.com
  
  wrote:
 
   you should be able to use the -D option to set the new value
  
   LoadIncrementalHFiles
   -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=NEW_VALUE
  
   Matteo
  
  
  
   On Wed, Aug 20, 2014 at 3:46 PM, Jerry Lam chiling...@gmail.com
 wrote:
  
Hi HBase users,
   
I wonder if anyone knows how to make change to
the MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?
   
The default value is 32 which is quite small.
   
HBase Version 0.98
   
Thank you,
   
Jerry
   
  
 



Re: Performance between HBaseClient scan and HFileReaderV2

2014-01-02 Thread Jerry Lam
Hi Tom,

Good point. Note that I also ran the HBaseClient performance test several
times (as you can see from the chart). The caching should also have benefited
the second run of the HBaseClient test, not just the HFileReaderV2 test.

I still don't understand what makes HBaseClient perform so poorly compared to
accessing HDFS directly. I could understand maybe a factor of 2 (even that
seems too much), but a factor of 8 is quite unreasonable.

Any hint?

Jerry



On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote:

 I'm also new to HBase and am not familiar with HFileReaderV2.  However, in
 your description, you didn't mention anything about clearing the linux OS
 cache between tests.  That might be why you're seeing the big difference if
 you ran the HBaseClient test first, it may have warmed the OS cache and
 then HFileReaderV2 benefited from it.  Just a guess...

 -- Tom



 On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote:

  Hello HBase users,
 
  I just ran a very simple performance test and would like to see if what I
  experienced make sense.
 
  The experiment is as follows:
  - I filled a hbase region with 700MB data (each row has roughly 45
 columns
  and the size is 20KB for the entire row)
  - I configured the region to hold 4GB (therefore no split occurs)
  - I ran compactions after the data is loaded and make sure that there is
  only 1 region in the table under test.
  - No other table exists in the hbase cluster because this is a DEV
  environment
  - I'm using HBase 0.92.1
 
  The test is very basic. I use HBaseClient to scan the entire region to
  retrieve all rows and all columns in the table, just iterating all
 KeyValue
  pairs until it is done. It took about 1 minute 22 sec to complete. (Note
  that I disable block cache and uses caching size about 1).
 
  I ran another test using HFileReaderV2 and scan the entire region to
  retrieve all rows and all columns, just iterating all keyValue pairs
 until
  it is done. It took 11 sec.
 
  The performance difference is dramatic (almost 8 times faster using
  HFileReaderV2).
 
  I want to know why the difference is so big or I didn't configure HBase
  properly. From this experiment, HDFS can deliver the data efficiently so
 it
  is not the bottleneck.
 
  Any help is appreciated!
 
  Jerry
 
 



Re: Performance between HBaseClient scan and HFileReaderV2

2014-01-02 Thread Jerry Lam
Hello St.Ack,

I would like to switch to 0.94, but we are using 0.92.1 and will not change
until the end of 2014. I can change the HBase client (e.g. to AsyncHBase) if
that is the bottleneck. If the problem is on the server side (e.g. the region
server), is there anything I can do to improve the performance?

Best Regards,

Jerry


On Thu, Jan 2, 2014 at 11:23 AM, Stack st...@duboce.net wrote:

 On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote:

  Hello HBase users,
 
  I just ran a very simple performance test and would like to see if what I
  experienced make sense.
 
  The experiment is as follows:
  - I filled a hbase region with 700MB data (each row has roughly 45
 columns
  and the size is 20KB for the entire row)
  - I configured the region to hold 4GB (therefore no split occurs)
  - I ran compactions after the data is loaded and make sure that there is
  only 1 region in the table under test.
  - No other table exists in the hbase cluster because this is a DEV
  environment
  - I'm using HBase 0.92.1
 
 
 Can you use a 0.94?  It has had some scanner improvements.

 Thanks,
 St.Ack



  The test is very basic. I use HBaseClient to scan the entire region to
  retrieve all rows and all columns in the table, just iterating all
 KeyValue
  pairs until it is done. It took about 1 minute 22 sec to complete. (Note
  that I disable block cache and uses caching size about 1).
 
  I ran another test using HFileReaderV2 and scan the entire region to
  retrieve all rows and all columns, just iterating all keyValue pairs
 until
  it is done. It took 11 sec.
 
  The performance difference is dramatic (almost 8 times faster using
  HFileReaderV2).
 
  I want to know why the difference is so big or I didn't configure HBase
  properly. From this experiment, HDFS can deliver the data efficiently so
 it
  is not the bottleneck.
 
  Any help is appreciated!
 
  Jerry
 
 



Re: Performance between HBaseClient scan and HFileReaderV2

2014-01-02 Thread Jerry Lam
Hello Vladimir,

In my use case, I guarantee that a major compaction is executed before any
scan happens, because the system we are building is read-only. There will be
no deleted cells. Additionally, I only need to read from a single column
family, and therefore I don't need to access multiple HFiles.

Filter conditions are nice to have, because if I can read HFiles 8x faster
than via HBaseClient, I can do the filtering on the client side and still
perform faster than using HBaseClient.

Thank you for your input!

Jerry



On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 HBase scanner MUST guarantee correct order of KeyValues (coming from
 different HFile's),
 filter condition+ filter condition on included column families and
 qualifiers, time range, max versions and correctly process deleted cells.
 Direct HFileReader does nothing from the above list.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Jerry Lam [chiling...@gmail.com]
 Sent: Thursday, January 02, 2014 7:56 AM
 To: user
 Subject: Re: Performance between HBaseClient scan and HFileReaderV2

 Hi Tom,

 Good point. Note that I also ran the HBaseClient performance test several
 times (as you can see from the chart). The caching should also benefit the
 second time I ran the HBaseClient performance test not just benefitting the
 HFileReaderV2 test.

 I still don't understand what makes the HBaseClient performs so poorly in
 comparison to access directly HDFS. I can understand maybe a factor of 2
 (even that it is too much) but a factor of 8 is quite unreasonable.

 Any hint?

 Jerry



 On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote:

  I'm also new to HBase and am not familiar with HFileReaderV2.  However,
 in
  your description, you didn't mention anything about clearing the linux OS
  cache between tests.  That might be why you're seeing the big difference
 if
  you ran the HBaseClient test first, it may have warmed the OS cache and
  then HFileReaderV2 benefited from it.  Just a guess...
 
  -- Tom
 
 
 
  On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com
 wrote:
 
   Hello HBase users,
  
   I just ran a very simple performance test and would like to see if
 what I
   experienced make sense.
  
   The experiment is as follows:
   - I filled a hbase region with 700MB data (each row has roughly 45
  columns
   and the size is 20KB for the entire row)
   - I configured the region to hold 4GB (therefore no split occurs)
   - I ran compactions after the data is loaded and make sure that there
 is
   only 1 region in the table under test.
   - No other table exists in the hbase cluster because this is a DEV
   environment
   - I'm using HBase 0.92.1
  
   The test is very basic. I use HBaseClient to scan the entire region to
   retrieve all rows and all columns in the table, just iterating all
  KeyValue
   pairs until it is done. It took about 1 minute 22 sec to complete.
 (Note
   that I disable block cache and uses caching size about 1).
  
   I ran another test using HFileReaderV2 and scan the entire region to
   retrieve all rows and all columns, just iterating all keyValue pairs
  until
   it is done. It took 11 sec.
  
   The performance difference is dramatic (almost 8 times faster using
   HFileReaderV2).
  
   I want to know why the difference is so big or I didn't configure HBase
   properly. From this experiment, HDFS can deliver the data efficiently
 so
  it
   is not the bottleneck.
  
   Any help is appreciated!
  
   Jerry
  
  
 




Re: Performance between HBaseClient scan and HFileReaderV2

2014-01-02 Thread Jerry Lam
Hello Sergey and Enis,

Thank you for the pointer! HBASE-8691 will definitely help. HBASE-10076
(Very interesting/exciting feature by the way!) is what I need. How can I
port it to 0.92.x if it is at all possible?

I understand that my test is not realistic; however, since I have only 1
region with 1 HFile (this is by design), there should not be any merge-sorted
read going on.

One thing I'm not sure about: since I use Snappy compression, is the value of
each KeyValue decompressed at the region server? If so, that seems quite
inefficient, because the decompression could be done on the client side.
Saving bandwidth saves a lot of time for the type of workload I'm working on.

Best Regards,

Jerry



On Thu, Jan 2, 2014 at 5:02 PM, Enis Söztutar e...@apache.org wrote:

 Nice test!

 There is a couple of things here:

  (1) HFileReader reads only one file, versus, an HRegion reads multiple
 files (into the KeyValueHeap) to do a merge scan. So, although there is
 only one file, there is some overehead of doing a merge sort'ed read from
 multiple files in the region. For a more realistic test, you can try to do
 the reads using HRegion directly (instead of HFileReader). The overhead is
 not that much though in my tests.
  (2) For scanning with client API, the results have to be serialized and
 deserialized and send over the network (or loopback for local). This is
 another overhead that is not there in HfileReader.
  (3) HBase scanner RPC implementation is NOT streaming. The RPC works like
 fetching batch size (1) records, and cannot fully saturate the disk and
 network pipeline.

 In my tests for MapReduce over snapshot files (HBASE-8369), I have
 measured 5x difference, because of layers (2) and (3). Please see my slides
 at http://www.slideshare.net/enissoz/mapreduce-over-snapshots

 I think we can do a much better job at (3), see HBASE-8691. However, there
 will always be some overhead, although it should not be 5-8x.

 As suggested above, in the meantime, you can take a look at the patch for
 HBASE-8369, and https://issues.apache.org/jira/browse/HBASE-10076 to see
 whether it suits your use case.

 Enis


 On Thu, Jan 2, 2014 at 1:43 PM, Sergey Shelukhin ser...@hortonworks.com
 wrote:

  Er, using MR over snapshots, which reads files directly...
  https://issues.apache.org/jira/browse/HBASE-8369
  However, it was only committed to 98.
  There was interest in 94 port (HBASE-10076), but it never happened...
 
 
  On Thu, Jan 2, 2014 at 1:42 PM, Sergey Shelukhin ser...@hortonworks.com
  wrote:
 
   You might be interested in using
   https://issues.apache.org/jira/browse/HBASE-8369
   However, it was only committed to 98.
   There was interest in 94 port (HBASE-10076), but it never happened...
  
  
   On Thu, Jan 2, 2014 at 1:32 PM, Jerry Lam chiling...@gmail.com
 wrote:
  
   Hello Vladimir,
  
   In my use case, I guarantee that a major compaction is executed before
  any
   scan happens because the system we build is a read only system. There
  will
   have no deleted cells. Additionally, I only need to read from a single
   column family and therefore I don't need to access multiple HFiles.
  
   Filter conditions are nice to have because if I can read HFile 8x
 faster
   than using HBaseClient, I can do the filter on the client side and
 still
   perform faster than using HBaseClient.
  
   Thank you for your input!
  
   Jerry
  
  
  
   On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
   vrodio...@carrieriq.comwrote:
  
HBase scanner MUST guarantee correct order of KeyValues (coming from
different HFile's),
filter condition+ filter condition on included column families and
qualifiers, time range, max versions and correctly process deleted
   cells.
Direct HFileReader does nothing from the above list.
   
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
   

From: Jerry Lam [chiling...@gmail.com]
Sent: Thursday, January 02, 2014 7:56 AM
To: user
Subject: Re: Performance between HBaseClient scan and HFileReaderV2
   
Hi Tom,
   
Good point. Note that I also ran the HBaseClient performance test
   several
times (as you can see from the chart). The caching should also
 benefit
   the
second time I ran the HBaseClient performance test not just
  benefitting
   the
HFileReaderV2 test.
   
I still don't understand what makes the HBaseClient performs so
 poorly
   in
comparison to access directly HDFS. I can understand maybe a factor
  of 2
(even that it is too much) but a factor of 8 is quite unreasonable.
   
Any hint?
   
Jerry
   
   
   
On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com
  wrote:
   
 I'm also new to HBase and am not familiar with HFileReaderV2.
However,
in
 your description, you didn't mention anything about clearing the
   linux

Re: Performance between HBaseClient scan and HFileReaderV2

2014-01-02 Thread Jerry Lam
Hello Lars,

Yes, I used setCaching to get more KeyValues in each RPC call. Also yes, when
I use HFileReaderV2 I am still reading from HDFS. Short-circuit reads are
enabled, but I don't know how to verify that they are actually being used (is
there a log that can tell me?).

I did make sure the HBaseClient runs on the same region server that holds
the data.

I just tried asynchbase (as I'm running out of ideas, I started to try
everything); it takes 60 seconds to scan through the data (20 seconds less
than using HBaseClient).

Best Regards,

Jerry

On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl la...@apache.org wrote:

 From the below I gather you set scanner caching (Scan.setCaching(...))?
 When you use HFileReaderV2, you're still reading from HDFS, right? Are you
 using short circuit reading (avoiding network IO)?

 In the HBaseClient client you pipe all the data through the network again.
 Is the HBaseClient located on a different machine?

 I would use a profiler (just use jVisualVM, which ships with the JDK and
 use the sampling profiler) to see where the time is spent.

 Lastly, to echo what other folks have said, 0.92 is pretty old at this
 point and I personally added a lot of performance improvements to HBase
 during the 0.94 timeframe and other's have as well.
 If you could test the same with 0.94, I'd be very interested in the
 numbers.

 -- Lars



 
  From: Jerry Lam chiling...@gmail.com
 To: user user@hbase.apache.org
 Sent: Thursday, January 2, 2014 1:32 PM
 Subject: Re: Performance between HBaseClient scan and HFileReaderV2


 Hello Vladimir,

 In my use case, I guarantee that a major compaction is executed before any
 scan happens because the system we build is a read only system. There will
 have no deleted cells. Additionally, I only need to read from a single
 column family and therefore I don't need to access multiple HFiles.

 Filter conditions are nice to have because if I can read HFile 8x faster
 than using HBaseClient, I can do the filter on the client side and still
 perform faster than using HBaseClient.

 Thank you for your input!

 Jerry




 On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov
 vrodio...@carrieriq.comwrote:

  HBase scanner MUST guarantee correct order of KeyValues (coming from
  different HFile's),
  filter condition+ filter condition on included column families and
  qualifiers, time range, max versions and correctly process deleted cells.
  Direct HFileReader does nothing from the above list.
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Jerry Lam [chiling...@gmail.com]
  Sent: Thursday, January 02, 2014 7:56 AM
  To: user
  Subject: Re: Performance between HBaseClient scan and HFileReaderV2
 
  Hi Tom,
 
  Good point. Note that I also ran the HBaseClient performance test several
  times (as you can see from the chart). The caching should also benefit
 the
  second time I ran the HBaseClient performance test not just benefitting
 the
  HFileReaderV2 test.
 
  I still don't understand what makes the HBaseClient performs so poorly in
  comparison to access directly HDFS. I can understand maybe a factor of 2
  (even that it is too much) but a factor of 8 is quite unreasonable.
 
  Any hint?
 
  Jerry
 
 
 
  On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote:
 
   I'm also new to HBase and am not familiar with HFileReaderV2.  However,
  in
   your description, you didn't mention anything about clearing the linux
 OS
   cache between tests.  That might be why you're seeing the big
 difference
  if
   you ran the HBaseClient test first, it may have warmed the OS cache and
   then HFileReaderV2 benefited from it.  Just a guess...
  
   -- Tom
  
  
  
   On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com
  wrote:
  
Hello HBase users,
   
I just ran a very simple performance test and would like to see if
  what I
experienced make sense.
   
The experiment is as follows:
- I filled a hbase region with 700MB data (each row has roughly 45
   columns
and the size is 20KB for the entire row)
- I configured the region to hold 4GB (therefore no split occurs)
- I ran compactions after the data is loaded and make sure that there
  is
only 1 region in the table under test.
- No other table exists in the hbase cluster because this is a DEV
environment
- I'm using HBase 0.92.1
   
The test is very basic. I use HBaseClient to scan the entire region
 to
retrieve all rows and all columns in the table, just iterating all
   KeyValue
pairs until it is done. It took about 1 minute 22 sec to complete.
  (Note
that I disable block cache and uses caching size about 1).
   
I ran another test using HFileReaderV2 and scan the entire region to
retrieve all rows and all columns, just iterating all keyValue pairs

Performance between HBaseClient scan and HFileReaderV2

2013-12-23 Thread Jerry Lam
Hello HBase users,

I just ran a very simple performance test and would like to see if what I
experienced makes sense.

The experiment is as follows:
- I filled an HBase region with 700MB of data (each row has roughly 45 columns
and is about 20KB in total)
- I configured the region to hold 4GB (therefore no split occurs)
- I ran compactions after the data was loaded and made sure that there is
only 1 region in the table under test.
- No other table exists in the HBase cluster because this is a DEV
environment.
- I'm using HBase 0.92.1.

The test is very basic. I use HBaseClient to scan the entire region to
retrieve all rows and all columns in the table, just iterating over all
KeyValue pairs until it is done. It took about 1 minute 22 sec to complete.
(Note that I disabled the block cache and used a scanner caching size of
about 1.)
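
Roughly, the client-side test looks like the following sketch (my
reconstruction from the description above; the table name is a placeholder
and the 0.92-era client API is assumed):

  // Sketch of the HBaseClient scan test described above.
  HTable table = new HTable(conf, "testTable");   // placeholder table name
  Scan scan = new Scan();
  scan.setCacheBlocks(false);   // block cache disabled, as in the test
  scan.setCaching(1);           // scanner caching, as described above
  ResultScanner scanner = table.getScanner(scan);
  long cells = 0;
  for (Result result : scanner) {
    for (KeyValue kv : result.raw()) {   // iterate over all KeyValue pairs
      cells++;
    }
  }
  scanner.close();
  table.close();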

I ran another test using HFileReaderV2, scanning the entire region to
retrieve all rows and all columns, just iterating over all KeyValue pairs
until it is done. It took 11 sec.
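
The direct-read test is presumably something along these lines (again a
sketch; the store file path is a placeholder, and the exact HFile.Reader
signatures vary slightly between versions):

  // Sketch of reading a single HFile directly, bypassing the region server.
  FileSystem fs = FileSystem.get(conf);
  Path storeFile = new Path("/hbase/testTable/region/family/storefile");  // placeholder path
  HFile.Reader reader = HFile.createReader(fs, storeFile, new CacheConfig(conf));
  reader.loadFileInfo();
  HFileScanner scanner = reader.getScanner(false, false);  // no block cache, no pread
  long cells = 0;
  if (scanner.seekTo()) {
    do {
      KeyValue kv = scanner.getKeyValue();   // iterate over all KeyValue pairs
      cells++;
    } while (scanner.next());
  }
  reader.close();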

The performance difference is dramatic (almost 8 times faster using
HFileReaderV2).

I want to know why the difference is so big, or whether I didn't configure
HBase properly. From this experiment, HDFS can deliver the data efficiently,
so it is not the bottleneck.

Any help is appreciated!

Jerry


Re: Nosqls schema design

2012-11-08 Thread Jerry Lam
Hi Nick:

Your question is a good and tough one. I haven't found anything that helps
guide schema design in the NoSQL world. There are general concepts, but none
of them comes close to SQL schema design, where you can apply rules to guide
your decision.

The best presentation I have found about the general concepts in HBase schema
design is
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html
(search for Schema Design). From this presentation, you can learn why it is so
difficult to come up with a suggestion for your problem and learn some best
practices to start your own design.

HTH,

Jerry


On Thu, Nov 8, 2012 at 10:17 AM, Nick maillard 
nicolas.maill...@fifty-five.com wrote:

 Thanks for the anwsers.

 I'm trying to really make sense of NoSql and Hbase in particular. The
 software
 part has a lot of loop wholes and I'm still fighting off the compaction
 storm
 issue, so right I would not say hbase is fast when it comes to writing.

 But my post was more nosql schema thoughts, after so long on SQL schemas it
 does take a little time to stop thinking that way in terms of schema but
 also of
 in terms of questions or of interaction if you'd rather.
 So contrary to SQL I cannot think a logical model for data and figure out
 later
 what I'll want out of it.

 In my case I stated 10 TB but this is very likely to grow since it is the
 starting scenario. I do believe having a 30 minutes latency before
 ingesting
 logs is not an issue, however the questions to the Hbase must be anwsered
 in
 real time manner.

 I have been trying to play with my questions and see how they can fit in a
 rowkey and Or columnfamilies but they being different in nature and
 purpose I
 ended supposing they would end up in a number of different hbase tables in
 order to adress the scope of questions. One table for one or three
 questions.
 The questions have joins and filter embedded in them.

 My post was about getting your insight on how you would go about answering
 this
 type of issues, what your schemas might be. Overall how to switch from SQL
 vision to noSQL vision.
 Coprocessor to create a couple of tables on the fly for all questions are
 an
 interesting way. To mapreduce the logs however I am afraid the performance
 would
 be to slow. I was thinking of answering in milliseconds if possible. But
 this
 might be me being new and not evaluating correctly.







Re: How to check if a major_compact is done?

2012-11-08 Thread Jerry Lam
Hi Yun:

Please refer to the HBase metrics documentation:
http://hbase.apache.org/book/hbase_metrics.html
The hbase.regionserver.compactionQueueSize metric seems promising, but I'm not
certain because I have never used it.

Best Regards,

Jerry


On Thu, Nov 8, 2012 at 6:43 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org
 wrote:

 Please someone correct me if I'm wrong, but I think there is some
 information exposed to JMX which give you the duration (and size) of
 the last compaction.

 JM

 2012/11/8, PG pengyunm...@gmail.com:
  Hi, thanks for the comments. One thing is shouldn't web UI comes from
  the hbase API, or can I issue function call to get the progress of
  compaction?.
  Hun
 
  On Nov 8, 2012, at 1:33 AM, ramkrishna vasudevan
  ramkrishna.s.vasude...@gmail.com wrote:
 
  There is no interface which says that the major compaction is completed.
  But you can see that major compaction is in progress from the web UI.
  Sorry if am wrong here.
 
  Regards
  Ram
 
  On Thu, Nov 8, 2012 at 11:38 AM, yun peng pengyunm...@gmail.com
 wrote:
 
  Hi, All,
  I want to measure the duration of a major compaction in HBase. Since
 the
  function call majorCompact is asynchronous, I may need to manually
 check
  when the major compaction is done. Does Hbase (as of version 0.92.4)
  provide an interface to determine completion of major compaction?
  Thanks.
  Yun
 
 



Re: Is HBaseAdmin thread-safe?

2012-11-08 Thread Jerry Lam
Hi HBase users:

I looked at the code; HBaseAdmin depends on HConnection. Is
HConnectionImplementation thread-safe?

Best Regards,

Jerry


On Wed, Nov 7, 2012 at 6:21 PM, Jerry Lam chiling...@gmail.com wrote:

 Hi HBase users:

 Is HBaseAdmin thread-safe?

 Best Regards,

 Jerry



Re: Best technique for doing lookup with Secondary Index

2012-10-26 Thread Jerry Lam
Can we enforce that 2 regions be collocated together as a logical group?

On Fri, Oct 26, 2012 at 6:14 AM, fding hbase fding.hb...@gmail.com wrote:

 https://github.com/danix800/hbase-indexed

 On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:

   AFAIK, RPC cannot be avoided even if Region A and Region B are on same
   RS
   since these two regions are from different table. Am i right?
 
  No... suppose your Region A and Region B of different tables are
 collocated
  on same RS then from the coprocessor environment variable you can get
  access
  to the RS.
  From RS you can get the online regions and from that region object you
 can
  call puts or gets.  This will not involve any RPC with in that RS because
  we
  only deal with Region objects.
 
  Regards
  Ram
 
   -Original Message-
   From: anil gupta [mailto:anilgupt...@gmail.com]
   Sent: Friday, October 26, 2012 12:17 PM
   To: user@hbase.apache.org
   Subject: Re: Best technique for doing lookup with Secondary Index
  
   
Now your main question is lookups right
Now there are some more hooks in the scan flow called
   pre/postScannerOpen,
pre/postScannerNext.
May be you can try using them to do a look up on the secondary table
   and
then use those values and pass it to the main table next().
   
  
   In secondary index its hard to avoid at-least two RPC calls(1 from
   client
   to table B and then from table B to Table A) whether you use coproc or
   not.
   But, i believe using coproc is better than doing RPC calls from client
   since it might be outside the subnet/network of cluster. In this case,
   the
   RPC will be faster when we use coprocs. In my case the client is
   certainly
   not in the same subnet or network zone. I need to provide results of
   query
   in around 100 milliseconds or less so i need to be really frugal. Let
   me
   know your views on this.
  
   Have you implemented queries with Secondary indexes using coproc yet?
   At present i have tried the client side query and i can get the results
   of
   query in around 100 ms. I am enticed to try out the coproc
   implementation.
  
   But this may involve more RPC calls as your regions of A and B may
   be in
different RS.
   
   AFAIK, RPC cannot be avoided even if Region A and Region B are on same
   RS
   since these two regions are from different table. Am i right?
  
  
   Thanks,
   Anil Gupta
  
   On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
 Is it a
 good idea to create Htable instance on B and do put in my mapper?
   I
 might
 try this idea.
Yes you can do this..  May be the same mapper you can do a put for
   table
B.  This was how we have tried loading data to another table by
   using the
main table A
Puts.
   
Now your main question is lookups right
Now there are some more hooks in the scan flow called
   pre/postScannerOpen,
pre/postScannerNext.
May be you can try using them to do a look up on the secondary table
   and
then use those values and pass it to the main table next().
But this may involve more RPC calls as your regions of A and B
   may be
in
different RS.
   
If something is wrong in my understanding of what you said, kindly
   spare
me.
:)
   
Regards
Ram
   
   
 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Friday, October 26, 2012 3:40 AM
 To: user@hbase.apache.org
 Subject: Re: Best technique for doing lookup with Secondary Index

 Anoop:  In prePut hook u call HTable#put()?
 Anil: Yes i call HTable#put() in prePut. Is there better way of
   doing
 it?

 Anoop: Why use the network calls from server side here then?
 Anil: I thought this is a cleaner approach since i am using
   BulkLoader.
 I
 decided not to run two jobs since i am generating a
   UniqueIdentifier at
 runtime in bulkloader.

 Anoop: can not handle it from client alone?
 Anil: I cannot handle it from client since i am using BulkLoader.
   Is it
 a
 good idea to create Htable instance on B and do put in my mapper?
   I
 might
 try this idea.

 Anoop: You can have a look at Lily project.
 Anil: It's little late for us to evaluate Lily now and at present
   we
 dont
 need complex secondary index since our data is immutable.

 Ram: what is rowkey B here?
 Anil: Suppose i am storing customer events in table A. I have two
 requirement for data query:
 1. Query customer events on basis of customer_Id and event_ID.
 2. Query customer events on basis of event_timestamp and
   customer_ID.

 70% of querying is done by query#1, so i will create
 customer_Idevent_ID as row key of Table A.
 Now, in order to support fast results for query#2, i need to create
   a
 secondary index on A. I 

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Jerry Lam
Hi JM:

There was a thread discussing M/R bulk delete vs. coprocessor bulk delete;
the thread subject is Bulk Delete. The suggestion in that thread was to write
an HFile which contains all the delete markers and then use the bulk
incremental load facility to move all the delete markers into the regions at
once. This strategy works for my use case too, because my M/R job generates a
lot of version delete markers.

You might take a look at that thread for additional ways to delete data from
HBase.
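
As a rough sketch of that idea (assuming an HFileOutputFormat-based job; the
row/family/qualifier/timestamp variables are placeholders from my own job):

  // Sketch only: emit a version delete marker as a KeyValue so it ends up in
  // the HFile produced by the job and is applied when the file is bulk loaded.
  KeyValue deleteMarker = new KeyValue(row, family, qualifier, timestamp,
      KeyValue.Type.Delete);                 // Type.Delete = version delete marker
  context.write(new ImmutableBytesWritable(row), deleteMarker);  // inside a mapper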

Best Regards,

Jerry


On Thu, Oct 25, 2012 at 1:13 PM, Anoop John anoop.hb...@gmail.com wrote:

 What I still don’t understand is, since both CP and MR are both
 running on the region side, with is the MR better than the CP?
 For the case bulk delete alone CP (Endpoint) will be better than MR for
 sure..  Considering your over all need people were suggesting better MR..
 U need a scan and move some data into another table too...
 Both MR and CP run on the region side ???  - Well there is difference. The
 CP run within your RS process itself.. So that is why bulk delete using
 Endpoint is efficient..  It is a local read and delete. No n/w calls
 involved at all..  But in case of MR even if the mappers run on the same
 machine as that of the region it is a inter process communication..
 Hope I explained you the diff well...

 -Anoop-

 On Thu, Oct 25, 2012 at 6:31 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi all,
 
  First, sorry about my slowness to reply to this thread, but it went to
  my spam folder and I lost sight of it.
 
  I don’t have good knowledge of RDBMS, and so I don’t have good
  knowledge of triggers too. That’s why I looked at the endpoints too
  because they are pretty new for me.
 
  First, I can’t really use multiple tables. I have one process writing
  to this table barely real-time. Another one is deleting from this
  table too. But some rows are never deleted. They are timing out, and
  need to be moved by the process I’m building here.
 
  I was not aware of the possibility to setup the priority for an MR job
  (any link to show how?). That’s something I will dig into. I was a bit
  scared about the network load if I’m doing deletes lines by lines and
  not bulk.
 
  What I still don’t understand is, since both CP and MR are both
  running on the region side, with is the MR better than the CP? Because
  the hadoop framework is taking care of it and will guarantee that it
  will run on all the regions?
 
  Also, is there some sort of “pre” and “post” methods I can override
  for MR jobs to initially list of puts/deletes and submit them at the
  end? Or should I do that one by one on the map method?
 
  Thanks,
 
  JM
 
 
  2012/10/18, lohit lohit.vijayar...@gmail.com:
   I might be little off here. If rows are moved to another table on
 weekly
  or
   daily basis, why not create per weekly or per day table.
   That way you need to copy and delete. Of course it will not work you
 are
   are selectively filtering between timestamps and clients have to have
   notion of multiple tables.
  
   2012/10/18 Anoop Sam John anoo...@huawei.com
  
   A CP and Endpoints operates at a region level.. Any operation within
 one
   region we can perform using this..  I have seen in below use case that
   along with the delete there was a need for inserting data to some
 other
   table also.. Also this was kind of a periodic action.. I really doubt
  how
   the endpoints alone can be used here.. I also tend towards the MR..
  
 The idea behind the bulk delete CP is simple.  We have a use case of
   deleting a bulk of rows and this need to be online delete. I also have
   seen
   in the mailing list many people ask question regarding that... In all
   people were using scans and get the rowkeys to the client side and
 then
   doing the deletes..  Yes most of the time complaint was the slowness..
   One
   bulk delete performance improvement was done in HBASE-6284..  Still
   thought
   we can do all the operation (scan+delete) in server side and we can
 make
   use of the endpoints here.. This will be much more faster and can be
  used
   for online bulk deletes..
  
   -Anoop-
  
   
   From: Michael Segel [michael_se...@hotmail.com]
   Sent: Thursday, October 18, 2012 11:31 PM
   To: user@hbase.apache.org
   Subject: Re: Coprocessor end point vs MapReduce?
  
   Doug,
  
   One thing that concerns me is that a lot of folks are gravitating to
   Coprocessors and may be using them for the wrong thing.
   Has anyone done any sort of research as to some of the limitations and
   negative impacts on using coprocessors?
  
   While I haven't really toyed with the idea of bulk deletes, periodic
   deletes is probably not a good use of coprocessors however using
  them
   to synchronize tables would be a valid use case.
  
   Thx
  
   -Mike
  
   On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com
 
   wrote:
  
   
To echo what 

Question on Scanner REST API Usage

2012-10-19 Thread Jerry Lam
Hi HBase community:

I have a few questions on the usage of Scanner via REST API:

- From the XML schema (
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#xmlschema),
we can set the maximum number of values to return for each call to next()
by specifying the batch attribute. Is there a way to set the number of rows
for caching that will be passed to scanners (setCaching)?

- Also, I cannot find a way to get all columns of a single row for each
call to next(). Can someone tell me if this is possible? Note that setting
the batch size won't work because, for example, some rows might have 10
columns while other rows might have 5 columns, so setting batch to 10 could
include cells from other rows. I want an API that behaves like the native
Java API, which returns all columns of a row when I call next().
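
For context, this is the kind of scanner definition I mean, in the same XML
form as the schema above (the values are just examples; the row keys are
base64 encoded):

  <!-- batch limits the number of cells returned per next() call -->
  <Scanner batch="10" startRow="cm93MQ==" endRow="cm93OQ=="/>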

Any help is greatly appreciated.

Best Regards,

Jerry


Re: Using filters in REST/stargate returns 204 (No content)

2012-10-19 Thread Jerry Lam
Hi Suresh:

Have you tried to create a scanner without the filter? Does it return
errors as well?

Best Regards,

Jerry

On Fri, Oct 19, 2012 at 1:16 PM, Kumar, Suresh suresh.kum...@emc.comwrote:


 Here is the hbase shell command which works, I am not able to get these
 results using curl/stargate.

 scan 'apachelogs', { COLUMNS => 'mylog:pcol', FILTER =>
   "SingleColumnValueFilter('mylog','pcol', =, 'regexstring: ERROR x.')" }

 Here is the curl command which does not work:

 curl -v -H Content-Type:text/xml -d @args.txt
 http://localhost:8080/apachelogs/scanner

 where args.txt:

 <Scanner>
   <filter>
     {
       "latestVersion": true, "ifMissing": true,
       "qualifier": "cGNvbAo=", "family": "a2V5c3RvbmVsb2cK",
       "op": "EQUAL", "type": "SingleColumnValueFilter",
       "comparator": {"value": "RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=", "type": "RegexStringComparator"}
     }
   </filter>
 </Scanner>

 Thanks,
 Suresh

 -Original Message-
 From: Andrew Purtell [mailto:apurt...@apache.org]
 Sent: Thursday, October 18, 2012 1:19 PM
 To: user@hbase.apache.org
 Subject: Re: Using filters in REST/stargate returns 204 (No content)

 What does the HBase shell return if you try that scan programatically?

 On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh
 suresh.kum...@emc.comwrote:

 
 
  I have a HBase Java client which has a couple of filters and just work
  fine, I get the expected result.
 
  Here is the code:
 
 
 
   HTable table = new HTable(conf, "apachelogs");
   Scan scan = new Scan();
   FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
   RegexStringComparator comp = new RegexStringComparator("ERROR x.");
   SingleColumnValueFilter filter = new SingleColumnValueFilter(
       Bytes.toBytes("mylog"), Bytes.toBytes("pcol"), CompareOp.EQUAL, comp);
   filter.setFilterIfMissing(true);
   list.addFilter(filter);
   scan.setFilter(list);
   ResultScanner scanner = table.getScanner(scan);
 
 
 
  I startup the REST server, and use curl for the above functionality, I
  just base 64 encoded ERROR x.:
 
 
 
  curl -v -H Content-Type:text/xml -d @args.txt
  http://localhost:8080/apachelogs/scanner
 
 
 
  where args.txt is:
 
 
 
   <Scanner>
     <filter>
       {
         "latestVersion": true, "ifMissing": true,
         "qualifier": "pcol", "family": "mylog",
         "op": "EQUAL", "type": "SingleColumnValueFilter",
         "comparator": {"value": "RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=", "type": "RegexStringComparator"}
       }
     </filter>
   </Scanner>
 
 
 
  which returns
 
  * About to connect() to localhost port 8080 (#0)
 
  *   Trying 127.0.0.1... connected
 
   POST /apachelogs/scanner HTTP/1.1
 
   User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
  OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
 
   Host: localhost:8080
 
   Accept: */*
 
   Content-Type:text/xml
 
   Content-Length: 318
 
  
 
  * upload completely sent off: 318out of 318 bytes
 
   HTTP/1.1 201 Created
 
   Location:
  http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6
 
   Content-Length: 0
 
  
 
  * Connection #0 to host localhost left intact
 
  * Closing connection #0
 
 
 
  but  curl -v
  http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6
 
  returns HTTP/1.1 204 No Content
 
 
 
  Any clues?
 
 
 
  Thanks,
 
  Suresh
 
 


 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)



Re: Slow scanning for PrefixFilter on EncodedBlocks

2012-10-15 Thread Jerry Lam
Hi ./zahoor:

I don't think it is the same issue.
Did you provide the Scan object with the startkey = prefix?

something like:
Scan scan = new Scan(prefix);

My understanding is that PrefixFilter does not seek to the first key with the
prefix; therefore the scanner basically starts from the beginning of the table
and applies the prefix filter to each KeyValue. From this perspective,
PrefixFilter might be improved by using a seek hint, though.
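
Concretely, something like this sketch is what I mean (the prefix value and
the table variable are placeholders):

  // Sketch: give the scan a start row so it can seek to the prefix instead of
  // scanning from the beginning of the table.
  byte[] prefix = Bytes.toBytes("row-prefix");     // placeholder prefix
  Scan scan = new Scan(prefix);                    // start the scan at the prefix
  scan.setFilter(new PrefixFilter(prefix));        // only keep rows with the prefix
  ResultScanner scanner = table.getScanner(scan);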

Best Regards,

Jerry

On Mon, Oct 15, 2012 at 1:27 PM, J Mohamed Zahoor jmo...@gmail.com wrote:

 Is this related to HBASE-6757 ?
 I use a filter list with
   - prefix filter
   - filter list of column filters

 /zahoor

 On Monday, October 15, 2012, J Mohamed Zahoor wrote:

  Hi
 
  My scanner performance is very slow when using a Prefix filter on a
  **Encoded Column** ( encoded using FAST_DIFF on both memory and disk).
  I am using 94.1 hbase.
 
  jstack shows that much time is spent on seeking the row.
  Even if i give a exact row key match in the prefix filter it takes about
  two minutes to return a single row.
  Running this multiple times also seems to be redirecting things to disk
  (loadBlock).
 
 
  at
 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
  at
 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
   at
 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
  at
 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
   at
 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
  at
 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
   at
 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
  - locked 0x00059584fab8 (a
  org.apache.hadoop.hbase.regionserver.StoreScanner)
   at
 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
  - locked 0x00059584fab8 (a
  org.apache.hadoop.hbase.regionserver.StoreScanner)
   at
 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
  at
 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
   at
 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
  - locked 0x00059589bb30 (a
  org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
   at
 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
 
  If is set the start and end row as same row in scan ... it come in very
  quick.
 
  Saw this link
 
 http://search-hadoop.com/m/9f0JH1Kz24U1subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
  But it looks like things are fine in 94.1.
 
  Any pointers on why this is slow?
 
 
  Note: the row has not many columns(5 and less than a kb) and lots of
  versions (1500+)
 
  ./zahoor
 
 
 



Re: bulk deletes

2012-10-12 Thread Jerry Lam
Hi Anoop:

In my use case, I use the version delete marker extensively because I need to
delete a specific version of a cell (row key, CF, qualifier, timestamp). I
have a MapReduce job that runs across some regions and, based on some business
rules, deletes some of the cells in the table using version delete markers.
The business rules for deletion are scoped to one column family at a time, so
there is no logical dependency between deletions in different column families.
I also posted the above use case in the HBASE-6942.

Best Regards,

Jerry

On Thu, Oct 11, 2012 at 12:04 AM, Anoop Sam John anoo...@huawei.com wrote:

 You are right Jerry..
 In your use case you want to delete full rows or some cfs/columns only?
  Pls feel free to see the issue HBASE-6942 and give your valuable comments..
 Here I am trying to delete the rows [This is our use case]

 -Anoop-
 
 From: Jerry Lam [chiling...@gmail.com]
 Sent: Wednesday, October 10, 2012 8:37 PM
 To: user@hbase.apache.org
 Subject: Re: bulk deletes

 Hi guys:

 The bulk delete approaches described in this thread are helpful in my case
 as well. If I understood correctly, Paul's approach is useful for offline
 bulk deletes (a.k.a. mapreduce) whereas Anoop's approach is useful for
 online/real-time bulk deletes (a.k.a. co-processor)?

 Best Regards,

 Jerry

 On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles pmack...@adobe.com wrote:

  Very cool Anoop. I can definitely see how that would be useful.
 
  Lars - the bulk deletes do appear to work. I just wasn't sure if there
 was
  something I might be missing since I haven't seen this documented
  elsewhere.
 
  Coprocessors do seem a better fit for this in the long term.
 
  Thanks everyone.
 
  On 10/7/12 11:55 PM, Anoop Sam John anoo...@huawei.com wrote:
 
  We also done an implementation using compaction time deletes(avoid KVs).
  This works very well for us
  As this would delay the deletes to happen till the next major
 compaction,
  we are having an implementation to do the real time bulk delete. [We
 have
  such use case]
  Here I am using an endpoint implementation to do the scan and delete at
  the server side only. Just raised an IA for this [HBASE-6942].  I will
  post a patch based on 0.94 model there...Pls have a look  I have
  noticed big performance improvement over the normal way of  scan() +
  delete(ListDelete) as this avoids several network calls and traffic...
  
  -Anoop-
  
  From: lars hofhansl [lhofha...@yahoo.com]
  Sent: Saturday, October 06, 2012 1:09 AM
  To: user@hbase.apache.org
  Subject: Re: bulk deletes
  
  Does it work? :)
  
  How did you do the deletes before?I assume you used the
  HTable.delete(ListDelete) API?
  
  (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
  into the compactions and simply filter out any KVs you want to have
  removed.
  
  
  -- Lars
  
  
  
  
   From: Paul Mackles pmack...@adobe.com
  To: user@hbase.apache.org user@hbase.apache.org
  Sent: Friday, October 5, 2012 11:17 AM
  Subject: bulk deletes
  
  We need to do deletes pretty regularly and sometimes we could have
  hundreds of millions of cells to delete. TTLs won't work for us because
  we have a fair amount of bizlogic around the deletes.
  
  Given their current implemention  (we are on 0.90.4), this delete
 process
  can take a really long time (half a day or more with 100 or so
 concurrent
  threads). From everything I can tell, the performance issues come down
 to
  each delete being an individual RPC call (even when using the batch
 API).
  In other words, I don't see any thrashing on hbase while this process is
  running ­ just lots of waiting for the RPC calls to return.
  
  The alternative we came up with is to use the standard bulk load
  facilities to handle the deletes. The code turned out to be surpisingly
  simple and appears to work in the small-scale tests we have tried so
 far.
  Is anyone else doing deletes in  this fashion? Are there drawbacks that
 I
  might be missing? Here is a link to the code:
  
  https://gist.github.com/3841437
  
  Pretty simple, eh? I haven't seen much mention of this technique which
 is
  why I am a tad paranoid about it.
  
  Thanks,
  Paul
 
 



Re: Problem with Rest Java Client

2012-10-12 Thread Jerry Lam
Hi Erman:

I think this post sums up deletion in HBase very well (
http://hadoop-hbase.blogspot.ca/2011/12/deletion-in-hbase.html). Please have
a look.

If you need to delete a specific version of a cell, you can use a version
delete marker as described in the post.
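
For Erman's example in particular, a sketch with the native Java client
(assuming a client version that has Result#getColumnLatest, and that you can
look up the latest timestamp first) would be:

  // Sketch: delete only the newest version of row1/fam1:q1 by targeting its timestamp.
  Get get = new Get(Bytes.toBytes("row1"));
  get.addColumn(Bytes.toBytes("fam1"), Bytes.toBytes("q1"));
  Result result = table.get(get);
  long ts = result.getColumnLatest(Bytes.toBytes("fam1"), Bytes.toBytes("q1")).getTimestamp();

  Delete delete = new Delete(Bytes.toBytes("row1"));
  delete.deleteColumn(Bytes.toBytes("fam1"), Bytes.toBytes("q1"), ts);  // version delete marker
  table.delete(delete);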

Best Regards,

Jerry

On Tue, Oct 9, 2012 at 1:24 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org
 wrote:

 Hi Erman,

 It's normal.

 At t=1 you insert val1
 At t=2 you insert val2
 At t=3 you put a marker that row1:farm1:q1 values are deleted.

 When you try to read the values, HBase will hide all that is before
 t=3 because of the marker. Which mean you will not see val2 neither
 you will see val1.

 I think you can still see them if you read ALL the version for the row.

 JM

 2012/10/9, Erman Pattuk ermanpat...@gmail.com:
  Hi,
 
  I have started using HBase Rest Java client as a part of my project. I
  see that it may have a problem with the Delete operation.
  For a given Delete object, if you apply deleteColumn(family, qualifier)
  on it, all matching qualifiers are deleted instead of the latest
  value.
 
  In order to recreate the problem:
 
  1 - Create table tab1, with family fam1.
  2 - Through shell, insert two values, as:
   row1, fam1, q1, val1
   row1, fam1, q1, val2
  3 - Through Rest Java client:
   Delete delItem = new Delete(Bytes.toBytes(row1));
   delItem.deleteColumn(Bytes.toBytes(fam1), Bytes.toBytes(q1));
   table.delete(delItem);
  4 - All q1 values are deleted, instead of the latest q1 value, which is
  val2.
 
  Is that an expected result?
 
  Thanks,
  Erman
 



Re: bulk deletes

2012-10-10 Thread Jerry Lam
Hi guys:

The bulk delete approaches described in this thread are helpful in my case
as well. If I understood correctly, Paul's approach is useful for offline
bulk deletes (a.k.a. mapreduce) whereas Anoop's approach is useful for
online/real-time bulk deletes (a.k.a. co-processor)?

Best Regards,

Jerry

On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles pmack...@adobe.com wrote:

 Very cool Anoop. I can definitely see how that would be useful.

 Lars - the bulk deletes do appear to work. I just wasn't sure if there was
 something I might be missing since I haven't seen this documented
 elsewhere.

 Coprocessors do seem a better fit for this in the long term.

 Thanks everyone.

 On 10/7/12 11:55 PM, Anoop Sam John anoo...@huawei.com wrote:

 We have also done an implementation using compaction-time deletes (avoiding KVs).
 This works very well for us.
 As this would delay the deletes to happen till the next major compaction,
 we are having an implementation to do the real-time bulk delete. [We have
 such a use case]
 Here I am using an endpoint implementation to do the scan and delete at
 the server side only. Just raised a JIRA for this [HBASE-6942]. I will
 post a patch based on the 0.94 model there... Pls have a look. I have
 noticed a big performance improvement over the normal way of scan() +
 delete(List<Delete>) as this avoids several network calls and traffic...
 
 -Anoop-
 
 From: lars hofhansl [lhofha...@yahoo.com]
 Sent: Saturday, October 06, 2012 1:09 AM
 To: user@hbase.apache.org
 Subject: Re: bulk deletes
 
 Does it work? :)
 
 How did you do the deletes before? I assume you used the
 HTable.delete(List<Delete>) API?
 
 (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
 into the compactions and simply filter out any KVs you want to have
 removed.
 
 
 -- Lars
 
 
 
 
  From: Paul Mackles pmack...@adobe.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Friday, October 5, 2012 11:17 AM
 Subject: bulk deletes
 
 We need to do deletes pretty regularly and sometimes we could have
 hundreds of millions of cells to delete. TTLs won't work for us because
 we have a fair amount of bizlogic around the deletes.
 
 Given their current implemention  (we are on 0.90.4), this delete process
 can take a really long time (half a day or more with 100 or so concurrent
 threads). From everything I can tell, the performance issues come down to
 each delete being an individual RPC call (even when using the batch API).
 In other words, I don't see any thrashing on hbase while this process is
 running - just lots of waiting for the RPC calls to return.
 
 The alternative we came up with is to use the standard bulk load
 facilities to handle the deletes. The code turned out to be surprisingly
 simple and appears to work in the small-scale tests we have tried so far.
 Is anyone else doing deletes in  this fashion? Are there drawbacks that I
 might be missing? Here is a link to the code:
 
 https://gist.github.com/3841437
 
 Pretty simple, eh? I haven't seen much mention of this technique which is
 why I am a tad paranoid about it.
 
 Thanks,
 Paul




Re: key design

2012-10-10 Thread Jerry Lam
Hi:

So you are saying you have ~3TB of data stored per day?

Using the second approach, all data for one day will go to only 1
regionserver no matter what you do because HBase doesn't split a single
row.

Using the first approach, data will spread across regionservers, but there
will be hotspotting on each regionserver during writes since this is a
time-series problem.

Best Regards,

Jerry

On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio yutoo.ya...@gmail.com wrote:

 hi
 i have a question about key & column design.
 in my application we have 3,000,000,000 records every day.
 each record contains: user-id, time stamp, content (max 1KB).
 we need to store records for one year; this means we will have about
 1,000,000,000,000 records after 1 year.
 we just search a user-id over a range of time stamps.
 the table can be designed in two ways:
 1.key=userid-timestamp and column:=content
 2.key=userid-MMdd and column:HHmmss=content


 in the first design we have a tall-narrow table but very, very many records; in
 the second design we have a flat-wide table.
 which of them has better performance?

 thanks.



Re: key design

2012-10-10 Thread Jerry Lam
That's true. Then there would be at most 86,400 records per day per userid.
That is about 100MB per day. I don't see much difference between the two
approaches from the storage perspective.
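
To make the two layouts concrete, here is a sketch of how the writes might look for each
design; the family/qualifier names and key separators are illustrative assumptions, not from
the original thread:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyDesignSketch {
  private static final byte[] CF = Bytes.toBytes("d");   // assumed column family name

  // Design 1: tall-narrow -- one row per record, key = userid-timestamp
  static Put tallNarrow(String userId, long ts, byte[] content) {
    byte[] row = Bytes.toBytes(userId + "-" + ts);
    Put p = new Put(row);
    p.add(CF, Bytes.toBytes("c"), content);              // single qualifier per row
    return p;
  }

  // Design 2: flat-wide -- one row per user per day, key = userid-day, qualifier = HHmmss
  static Put flatWide(String userId, String day, String hhmmss, byte[] content) {
    byte[] row = Bytes.toBytes(userId + "-" + day);
    Put p = new Put(row);
    p.add(CF, Bytes.toBytes(hhmmss), content);
    return p;
  }
}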

On Wed, Oct 10, 2012 at 1:09 PM, Doug Meil doug.m...@explorysmedical.com wrote:

 Hi there-

 Given the fact that the userid is in the lead position of the key in both
 approaches, I'm not sure that he'd have a region hotspotting problem
 because the userid should be able to offer some spread.




 On 10/10/12 12:55 PM, Jerry Lam chiling...@gmail.com wrote:

 Hi:
 
 So you are saying you have ~3TB of data stored per day?
 
 Using the second approach, all data for one day will go to only 1
 regionserver no matter what you do because HBase doesn't split a single
 row.
 
 Using the first approach, data will spread across regionservers, but there
 will be hotspotting on each regionserver during writes since this is a
 time-series problem.
 
 Best Regards,
 
 Jerry
 
 On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio yutoo.ya...@gmail.com
 wrote:
 
  hi
  i have a question about key & column design.
  in my application we have 3,000,000,000 record in every day
  each record contain : user-id, time stamp, content(max 1KB).
  we need to store records for one year, this means we will have about
  1,000,000,000,000 after 1 year.
  we just search a user-id over rang of time stamp
  table can design in two way
  1.key=userid-timestamp and column:=content
  2.key=userid-MMdd and column:HHmmss=content
 
 
  in first design we have tall-narrow table but we have very very
 records, in
  second design we have flat-wide table.
  which of them have better performance?
 
  thanks.
 





Re: HBase Key Design : Doubt

2012-10-10 Thread Jerry Lam
correct me if I'm wrong. The version applies to the individual cell (i.e.
row key, column family and column qualifier), not to (row key, column family).
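
A small sketch of that point: versions accumulate per (row, family, qualifier) cell, and a Get
has to ask for more than one of them explicitly. Table, family and qualifier names are made up
for the example:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVersionsDemo {
  // assumes the column family keeps enough VERSIONS to retain both cells
  public static void demo(HTable table) throws IOException {
    byte[] row = Bytes.toBytes("A-B-C-D");
    byte[] cf = Bytes.toBytes("d");
    byte[] q = Bytes.toBytes("seen");

    // two writes to the SAME cell with explicit timestamps -> two versions of one cell
    table.put(new Put(row).add(cf, q, 20121001L, Bytes.toBytes(1L)));
    table.put(new Put(row).add(cf, q, 20121002L, Bytes.toBytes(1L)));

    Get get = new Get(row);
    get.addColumn(cf, q);
    get.setMaxVersions(10);                  // without this only the newest version comes back
    Result r = table.get(get);
    for (KeyValue kv : r.raw()) {
      System.out.println(kv.getTimestamp()); // prints 20121002 and 20121001
    }
  }
}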


On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K knarayana...@gmail.com wrote:

 Hi all,

 I have a usecase wherein I need to find the unique of some things in HBase
 across dates.

 Say, on 1st Oct, A-B-C-D appeared, hence I insert a row with rowkey :
 A-B-C-D.
 On 2nd Oct, I get the same value A-B-C-D and I don't want to redundantly
 store the row again with a new rowkey - A-B-C-D for 2nd Oct
 i.e I will not want to have 20121001-A-B-C-D and 20121002-A-B-C-D as 2
 rowkeys in the table.

 Eg: If I have 1st Oct , 2nd Oct as 2 column families and if number of
 versions are set to 1, only 1 row will be present in for both the dates
 having rowkey A-B-C-D.
 Hence if I need to find unique number of times A-B-C-D appeared during Oct
 1 and Oct 2, I just need to take rowcount of the row A-B-C-D by filtering
 over the 2 column families.
 Similarly, if we have 10  date column families, and I need to scan only for
 2 dates, then it scans only those store files having the specified column
 families. This will make scanning faster.

 But here the design problem is that I cant add more column families to the
 table each day.

 I would need to store data every day and I read that HBase doesnt work well
 with more than 3 column families.

 The other option is to have one single column family and store dates as
 qualifiers : date:d1, date:d2 But here if there are 30 date qualifiers
 under date column family, to scan a single date qualifier or may be range
 of 2-3 dates will have to scan through the entire data of all d1 to d30
 qualifiers in the date column family which would be slower compared to
 having separate column families for the each date..

 Please share your thoughts on this. Also any alternate design suggestions
 you might have.

 Regards,
 Narayanan



Re: How to specify empty value in HBase shell

2012-09-21 Thread Jerry Lam
Hi St.Ack:

I made some dirty changes to the script yesterday to work for me.
Basically, I changed the parse_column_name(column) function to:

def parse_column_name(column)
  split =
org.apache.hadoop.hbase.KeyValue.parseColumn(column.to_java_bytes)
  return split[0], (split.length > 1) ? split[1] :
(org.apache.hadoop.hbase.KeyValue.getDelimiter(column.to_java_bytes, 0,
column.to_java_bytes.length(), 58) > 0) ? ''.to_java_bytes : nil
end

Not sure if it makes sense as the general solution to the problem but at
least it seems to do the job.

The end result is that, if the user specifies COLUMNS without the delimiter, it
is treated as a column family without a qualifier. If there is a delimiter but
the split has only 1 element, then the column qualifier is set to the empty
value.

Best Regards,

Jerry
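
For comparison, the Java client API can address the empty qualifier directly, which is the
behaviour being asked of the shell here. A sketch; the table and family names are assumptions:

import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyQualifierScan {
  public static void scanEmptyQualifier(HTable table) throws IOException {
    Scan scan = new Scan();
    // explicitly ask for the empty (zero-length) qualifier of family "cf"
    scan.addColumn(Bytes.toBytes("cf"), HConstants.EMPTY_BYTE_ARRAY);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        byte[] v = r.getValue(Bytes.toBytes("cf"), HConstants.EMPTY_BYTE_ARRAY);
        System.out.println(Bytes.toString(r.getRow()) + " -> " + Bytes.toString(v));
      }
    } finally {
      scanner.close();
    }
  }
}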


On Fri, Sep 21, 2012 at 12:42 AM, Stack st...@duboce.net wrote:

 On Thu, Sep 20, 2012 at 7:31 AM, Jerry Lam chiling...@gmail.com wrote:
  Hi HBase Community:
 
  I have been struggling to find a way to specify empty value/empty column
  qualifier in the hbase shell, but unsuccessful.
 
  I google it, nothing comes up. I don't know JRuby so that might be why.
 Do
  you know how?
 
  Example:
 
  scan 'Table',  {COLUMNS => 'cf:'} // note that the column family is cf
 and
  the column qualifier is empty (i.e. new byte[0])
 
  The above query will return all columns instead of the empty one.
 

 Sounds like no qualifier means all columns to shell.

 Do you have to use the 'empty qualifier'?  Thats a bit odd.  You
 really need it in your model?

 In the shell we are doing this:


 columns.each do |c|
   family, qualifier = parse_column_name(c.to_s)
   if qualifier
 scan.addColumn(family, qualifier)
   else
 scan.addFamily(family)
   end
 end


 If no qualifier, we think its a scan of the family.

 I don't really have a good answer for you.  In shell, what would you
 suggest we add so we do addColumn rather than addFamily if qualifier
 is empty?

 St.Ack



How to specify empty value in HBase shell

2012-09-20 Thread Jerry Lam
Hi HBase Community:

I have been struggling to find a way to specify empty value/empty column
qualifier in the hbase shell, but unsuccessful.

I googled it, nothing comes up. I don't know JRuby, so that might be why. Do
you know how?

Example:

scan 'Table',  {COLUMNS => 'cf:'} // note that the column family is cf and
the column qualifier is empty (i.e. new byte[0])

The above query will return all columns instead of the empty one.

I need only the values associated with the empty column qualifier. Please
help ~

Jerry


Re: Undelete Rows

2012-09-19 Thread Jerry Lam
Hi Alex:

we have a feature which allows users to delete the data stored in
hbase, but once in a while users call us to undelete certain data that
was deleted an hour/day ago. Since we run major compaction weekly, my
wishful thinking was that the data is still there and can be recovered if
we can just delete the delete marker. It seems it is not as easy as I
initially thought after reading the replies. Lars's suggestion requires reading
the deleted data and writing it back, which can be expensive.

Best Regards,

Jerry
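
A rough sketch of the read-back-and-rewrite approach Lars describes below, assuming
KEEP_DELETED_CELLS (HBASE-4536) is enabled on the family and the approximate delete time is
known. Table layout and method names are made up, and this is untested:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class UndeleteSketch {
  // Re-write cells as they looked just before 'deleteTime'. Requires KEEP_DELETED_CELLS on
  // the column family so the old cells survive until the next major compaction.
  public static void undelete(HTable table, byte[] startRow, byte[] stopRow, long deleteTime)
      throws IOException {
    Scan scan = new Scan(startRow, stopRow);
    scan.setTimeRange(0, deleteTime);       // read the state as of just before the delete
    scan.setMaxVersions(1);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        Put p = new Put(r.getRow());
        for (KeyValue kv : r.raw()) {
          // write the old value back with a fresh (current) timestamp
          p.add(kv.getFamily(), kv.getQualifier(), kv.getValue());
        }
        table.put(p);
      }
    } finally {
      scanner.close();
    }
  }
}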


On Wed, Sep 19, 2012 at 10:07 AM, Alex Baranau alex.barano...@gmail.com wrote:

 Hi Jerry,

 Just out of the curiosity: what is your use-case? Why do you want to do
 that? To gain extra protection from software error or smth else?

 Alex Baranau
 --
 Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

 On Tue, Sep 18, 2012 at 6:32 PM, lars hofhansl lhofha...@yahoo.com
 wrote:

  Can't do it (without low level massing HFiles, which you do not want to
  do).
  The best you can do (if you have HBASE-4536 and enabled
 KEEP_DELETED_CELLS
  for you column family) is to read the deleted rows back and write them
  again with a newer TS.
 
  -- Lars
 
 
 
  
   From: Jerry Lam chiling...@gmail.com
  To: user@hbase.apache.org
  Sent: Tuesday, September 18, 2012 3:06 PM
  Subject: Undelete Rows
 
  Hi HBase Community:
 
  I wonder if it is possible to undelete the rows that have been marked for
  deletion before the major compaction kicks in?
 
  I read about HBASE-4536 but I'm not sure if this can effectively remove
 the
  tombstone marker?
 
  Any input is appreciated.
 
  Best Regards,
 
  Jerry
 



Re: HBase aggregate query

2012-09-11 Thread Jerry Lam
Hi Prabhjot:

Can you implement this using a counter?
That is whenever you insert a row with the month(eventdate) and scene
combination, increment the associated counter by one. Note that if you have
a batch insert of N, you can increment the counter by N.

Then you can simply query the counter whenever you want the aggregated
result.

HTH,

Jerry
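
A minimal sketch of the counter idea; the row-key layout, table and family names are just one
possible choice:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class EventCounters {
  private static final byte[] CF = Bytes.toBytes("agg");       // assumed counter family

  // Called for every inserted event (or once per batch with count = batch size).
  public static void recordEvent(HTable counters, String yyyyMM, String scene,
                                 long count, long timeSpent) throws IOException {
    byte[] row = Bytes.toBytes(yyyyMM + ":" + scene);           // one counter row per month+scene
    counters.incrementColumnValue(row, CF, Bytes.toBytes("count"), count);
    counters.incrementColumnValue(row, CF, Bytes.toBytes("timespent"), timeSpent);
  }

  // Reading the aggregate later is a plain Get on the same row.
  public static long readCount(HTable counters, String yyyyMM, String scene) throws IOException {
    byte[] v = counters.get(new Get(Bytes.toBytes(yyyyMM + ":" + scene)))
        .getValue(CF, Bytes.toBytes("count"));
    return v == null ? 0L : Bytes.toLong(v);   // counter cells are stored as 8-byte longs
  }
}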

On Tue, Sep 11, 2012 at 1:59 PM, lars hofhansl lhofha...@yahoo.com wrote:

 That's when you aggregate along a sorted dimension (prefix of the key),
 though. Right?
 Not sure how smart Hive is here, but if it needs to sort the data it will
 probably be slower than SQL Server for such a small data set.



 - Original Message -
 From: James Taylor jtay...@salesforce.com
 To: user@hbase.apache.org
 Cc:
 Sent: Monday, September 10, 2012 5:49 PM
 Subject: Re: HBase aggregate query

 iwannaplay games funnlearnforkids@... writes:
 
  Hi ,
 
  I want to run query like
 
  select month(eventdate),scene,count(1),sum(timespent) from eventlog
  group by month(eventdate),scene
 
  in hbase. Through hive it's taking a lot of time for 40 million
  records. Do we have any syntax in hbase to find its result? In sql
  server it takes around 9 minutes; how long might it take in hbase??
 
  Regards
  Prabhjot
 
 

 Hi,
 In our internal testing using server-side coprocessors for aggregation,
 we've
 found HBase can process these types of queries very quickly: ~10-12 seconds
 using a four node cluster. You need to chunk up and parallelize the work
 on the
 client side to get this kind of performance, though.
 Regards,

 James



Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Jerry Lam
Hi Ioakim:

Sorry, your hypothesis doesn't make sense. I would suggest you read the
"Learning HBase Internals" talk by Lars Hofhansl at
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
to understand how HBase locking works.

Regarding the issue you are facing, are you sure you configured the job
properly (i.e. requesting the jobtracker to run more than 1 mapper
concurrently)? If you are testing on a single machine, you probably need to
configure the number of map slots per tasktracker as well to see more than 1
mapper executing on a single machine.

my $0.02

Jerry
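
For reference, this is roughly the job setup being referred to, with the scan caching and
cache-blocks settings from the HBase MapReduce examples; the table name and mapper class are
placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class RegionParallelScanJob {
  // identity mapper: each map task receives exactly one region's split
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "region-parallel-scan");
    job.setJarByClass(RegionParallelScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips per mapper
    scan.setCacheBlocks(false);  // don't pollute the block cache from a full scan

    // TableInputFormat creates one split (and hence one map task) per region;
    // how many of them run at once depends on the cluster's map slots, not on HBase.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}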

On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros imper...@gmail.com wrote:

 Hello,

 I would be grateful if someone could shed a light to the following:

 Each M/R map task is reading data from a separate region of a table.
 From the jobtracker 's GUI, at the map completion graph, I notice that
 although data read from mappers are different, they read data sequentially
 - like the table has a lock that permits only one mapper to read data from
 every region at a time.

 Does this lock hypothesis make sense? Is there any way I could avoid
 this useless delay?

 Thanks in advance and regards,
 Ioakim



Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Jerry Lam
Hi Ioakim:

Here is a list of links I would suggest you read (I know it is a lot to
read):
HBase Related:
-
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html
-
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
- make sure to read the examples:
http://hbase.apache.org/book/mapreduce.example.html

Hadoop Related:
- http://wiki.apache.org/hadoop/JobTracker
- http://wiki.apache.org/hadoop/TaskTracker
- http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html
- Some Configurations:
http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html

HTH,

Jerry


On Tue, Sep 4, 2012 at 12:41 PM, Michael Segel michael_se...@hotmail.com wrote:

 I think the issue is that you are misinterpreting what you are seeing and
 what Doug was trying to tell you...

 The short simple answer is that you're getting one split per region. Each
 split is assigned to a specific mapper task and that task will sequentially
 walk through the table finding the rows that match your scan request.

 There is no lock or blocking.

 I think you really should actually read Lars George's book on HBase to get
 a better understanding.

 HTH

 -Mike

 On Sep 4, 2012, at 11:29 AM, Ioakim Perros imper...@gmail.com wrote:

  Thank you very much for your response and for the excellent reference.
 
  The thing is that I am running jobs on a distributed environment and
 beyond the TableMapReduceUtil settings,
 
  I have just set the scan ' s caching to the number of rows I expect to
 retrieve at each map task, and the scan's caching blocks feature to false
 (just as it is indicated at MapReduce examples of HBase's homepage).
 
  I am not aware of such a job configuration (requesting jobtracker to
 execute more than 1 map tasks concurrently). Any other ideas?
 
  Thank you again and regards,
  ioakim
 
 
  On 09/04/2012 06:59 PM, Jerry Lam wrote:
  Hi Ioakim:
 
  Sorry, your hypothesis doesn't make sense. I would suggest you to read
 the
  Learning HBase Internals by Lars Hofhansl at
 
 http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
  to
  understand how HBase locking works.
 
  Regarding to the issue you are facing, are you sure you configure the
 job
  properly (i.e. requesting the jobtracker to have more than 1 mapper to
 execute)? If you are testing on a single machine, you probably need to
  configure the number of tasktracker per node as well to see more than 1
  mapper to execute on a single machine.
 
  my $0.02
 
  Jerry
 
  On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros imper...@gmail.com
 wrote:
 
  Hello,
 
  I would be grateful if someone could shed a light to the following:
 
  Each M/R map task is reading data from a separate region of a table.
  From the jobtracker 's GUI, at the map completion graph, I notice that
  although data read from mappers are different, they read data
 sequentially
  - like the table has a lock that permits only one mapper to read data
 from
  every region at a time.
 
  Does this lock hypothesis make sense? Is there any way I could avoid
  this useless delay?
 
  Thanks in advance and regards,
  Ioakim
 
 
 




Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-29 Thread Jerry Lam
Hi Lars:

Thanks for spending time discussing this with me. I appreciate it.

I tried to implement the setMaxVersions(1) inside the filter as follows:

@Override
public ReturnCode filterKeyValue(KeyValue kv) {

// check if the same qualifier as the one that has been included
previously. If yes, jump to next column
if (previousIncludedQualifier != null &&
Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
previousIncludedQualifier = null;
return ReturnCode.NEXT_COL;
}
// another condition that makes the jump further using HINT
if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
LOG.info("Matched Found.");
return ReturnCode.SEEK_NEXT_USING_HINT;

}
// include this to the result and keep track of the included
qualifier so the next version of the same qualifier will be excluded
previousIncludedQualifier = kv.getQualifier();
return ReturnCode.INCLUDE;
}

Does this look reasonable, or is there a better way to achieve this? It
would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case though.

Best Regards,

Jerry


On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl lhofha...@yahoo.com wrote:

 Hi Jerry,

 my answer will be the same again:
 Some folks will want the max versions set by the client to be before
 filters and some folks will want it to restrict the end result.
 It's not possible to have it both ways. Your filter needs to do the right
 thing.


 There's a lot of discussion around this in HBASE-5104.


 -- Lars



 
  From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
 Sent: Tuesday, August 28, 2012 1:52 PM
 Subject: Re: setTimeRange and setMaxVersions seem to be inefficient

 Hi Lars:

 I see. Please refer to the inline comment below.

 Best Regards,

 Jerry

 On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com
 wrote:

  What I was saying was: It depends. :)
 
  First off, how do you get to 1000 versions? In 0.94++ older version are
  pruned upon flush, so you need 333 flushes (assuming 3 versions on the
 CF)
  to get 1000 versions.
 

 I forgot that the default number of version to keep is 3. If this is what
 people use most of the time, yes you are right for this type of scenarios
 where the number of version per column to keep is small.

 By that time some compactions will have happened and you're back to close
  to 3 versions (maybe 9, 12, or 15 or so, depending on how store files you
  have).
 
  Now, if you have that many version because because you set VERSIONS=1000
  in your CF... Then imagine you have 100 columns with 1000 versions each.
 

 Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the
 versioning myself)

 In your scenario below you'd do 10 comparisons if the filter would be
  evaluated after the version counting. But only 1100 with the current
 code.
  (or at least in that ball park)
 

 This is where I don't quite understand what you mean.

 if the framework counts the number of ReturnCode.INCLUDE and then stops
 feeding the KeyValue into the filterKeyValue method after it reaches the
 count specified in setMaxVersions (i.e. 1 for the case we discussed),
 should then be just 100 comparisons only (at most) instead of 1100
 comparisons? Maybe I don't understand how the current way is doing...



 
  The gist is: One can construct scenarios where one approach is better
 than
  the other. Only one order is possible.
  If you write a custom filter and you care about these things you should
  use the seek hints.
 
  -- Lars
 
 
  - Original Message -
  From: Jerry Lam chiling...@gmail.com
  To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
  Cc:
  Sent: Tuesday, August 28, 2012 7:17 AM
  Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
 
  Hi Lars:
 
  Thanks for the reply.
  I need to understand if I misunderstood the perceived inefficiency
 because
  it seems you don't think quite the same.
 
  Let say, as an example, we have 1 row with 2 columns (col-1 and col-2)
 in a
  table and each column has 1000 versions. Using the following code (the
 code
  might have errors and don't compile):
  /**
  * This is very simple use case of a ColumnPrefixFilter.
  * In fact all other filters that make use of filterKeyValue will see
  similar
  * performance problems that I have concerned with when the number of
  * versions per column could be huge.
 
  Filter filter = new ColumnPrefixFilter(Bytes.toBytes("col-2"));
  Scan scan = new Scan();
  scan.setFilter(filter);
  ResultScanner scanner = table.getScanner(scan);
  for (Result result : scanner) {
  for (KeyValue kv : result.raw()) {
  System.out.println("KV: " + kv + ", Value: " +
  Bytes.toString(kv.getValue()));
  }
  }
  scanner.close();
  */
 
  Implicitly, the number of version per column that is going to return is 1
  (the latest version). User might expect that only 2 comparisons for
 column
  prefix are needed (1 for col-1

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-29 Thread Jerry Lam
Hi Ted:

Sure, will do.
I also implemented the reset method to set previousIncludedQualifier to null
for the next row.

Best Regards,

Jerry
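
For readers of the archive, those two extra pieces might look roughly like this. The hint
target is just one plausible choice (skip past the matched qualifier on the same row), and the
family field is an assumption that is not shown in the snippet above:

@Override
public KeyValue getNextKeyHint(KeyValue currentKV) {
  // Jump to the column immediately after 'this.qualifier' on the same row, skipping the
  // remaining versions of that column. 'this.family' is an assumed field of the filter.
  byte[] nextQualifier = Bytes.add(this.qualifier, new byte[] { 0 });
  return KeyValue.createFirstOnRow(currentKV.getRow(), this.family, nextQualifier);
}

@Override
public void reset() {
  // called at row boundaries; clear the per-row bookkeeping
  previousIncludedQualifier = null;
}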

On Wed, Aug 29, 2012 at 1:47 PM, Ted Yu yuzhih...@gmail.com wrote:

 Jerry:
 Remember to also implement:

 +  @Override
 +  public KeyValue getNextKeyHint(KeyValue currentKV) {

 You can log a JIRA for supporting ReturnCode.INCLUDE_AND_NEXT_COL.

 Cheers

 On Wed, Aug 29, 2012 at 6:59 AM, Jerry Lam chiling...@gmail.com wrote:

  Hi Lars:
 
  Thanks for spending time discussing this with me. I appreciate it.
 
  I tried to implement the setMaxVersions(1) inside the filter as follows:
 
  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
 
  // check if the same qualifier as the one that has been included
  previously. If yes, jump to next column
  if (previousIncludedQualifier != null &&
  Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
  previousIncludedQualifier = null;
  return ReturnCode.NEXT_COL;
  }
  // another condition that makes the jump further using HINT
  if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
  LOG.info("Matched Found.");
  return ReturnCode.SEEK_NEXT_USING_HINT;
 
  }
  // include this to the result and keep track of the included
  qualifier so the next version of the same qualifier will be excluded
  previousIncludedQualifier = kv.getQualifier();
  return ReturnCode.INCLUDE;
  }
 
  Does this look reasonable or there is a better way to achieve this? It
  would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case
 though.
 
  Best Regards,
 
  Jerry
 
 
  On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl lhofha...@yahoo.com
  wrote:
 
   Hi Jerry,
  
   my answer will be the same again:
   Some folks will want the max versions set by the client to be before
   filters and some folks will want it to restrict the end result.
   It's not possible to have it both ways. Your filter needs to do the
 right
   thing.
  
  
   There's a lot of discussion around this in HBASE-5104.
  
  
   -- Lars
  
  
  
   
From: Jerry Lam chiling...@gmail.com
   To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
   Sent: Tuesday, August 28, 2012 1:52 PM
   Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
  
   Hi Lars:
  
   I see. Please refer to the inline comment below.
  
   Best Regards,
  
   Jerry
  
   On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com
   wrote:
  
What I was saying was: It depends. :)
   
First off, how do you get to 1000 versions? In 0.94++ older version
 are
pruned upon flush, so you need 333 flushes (assuming 3 versions on
 the
   CF)
to get 1000 versions.
   
  
   I forgot that the default number of version to keep is 3. If this is
 what
   people use most of the time, yes you are right for this type of
 scenarios
   where the number of version per column to keep is small.
  
   By that time some compactions will have happened and you're back to
 close
to 3 versions (maybe 9, 12, or 15 or so, depending on how store files
  you
have).
   
Now, if you have that many version because because you set
  VERSIONS=1000
in your CF... Then imagine you have 100 columns with 1000 versions
  each.
   
  
   Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the
   versioning myself)
  
   In your scenario below you'd do 10 comparisons if the filter would
 be
evaluated after the version counting. But only 1100 with the current
   code.
(or at least in that ball park)
   
  
   This is where I don't quite understand what you mean.
  
   if the framework counts the number of ReturnCode.INCLUDE and then stops
   feeding the KeyValue into the filterKeyValue method after it reaches
 the
   count specified in setMaxVersions (i.e. 1 for the case we discussed),
   should then be just 100 comparisons only (at most) instead of 1100
   comparisons? Maybe I don't understand how the current way is doing...
  
  
  
   
The gist is: One can construct scenarios where one approach is better
   than
the other. Only one order is possible.
If you write a custom filter and you care about these things you
 should
use the seek hints.
   
-- Lars
   
   
- Original Message -
From: Jerry Lam chiling...@gmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Cc:
Sent: Tuesday, August 28, 2012 7:17 AM
Subject: Re: setTimeRange and setMaxVersions seem to be inefficient
   
Hi Lars:
   
Thanks for the reply.
I need to understand if I misunderstood the perceived inefficiency
   because
it seems you don't think quite the same.
   
Let say, as an example, we have 1 row with 2 columns (col-1 and
 col-2)
   in a
table and each column has 1000 versions. Using the following code
 (the
   code
might have errors and don't compile):
/**
* This is very simple use case

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread Jerry Lam
Hi Lars:

Thanks for the reply.
I need to understand whether I have misunderstood the perceived inefficiency,
because it seems you see it differently.

Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a
table and each column has 1000 versions. Using the following code (the code
might have errors and don't compile):
/**
 * This is very simple use case of a ColumnPrefixFilter.
 * In fact all other filters that make use of filterKeyValue will see
similar
 * performance problems that I have concerned with when the number of
 * versions per column could be huge.

Filter filter = new ColumnPrefixFilter(Bytes.toBytes("col-2"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (KeyValue kv : result.raw()) {
System.out.println("KV: " + kv + ", Value: " +
Bytes.toString(kv.getValue()));
}
}
scanner.close();
*/

Implicitly, the number of versions per column that is going to be returned is 1
(the latest version). A user might expect that only 2 comparisons of the column
prefix are needed (1 for col-1 and 1 for col-2), but in fact the filterKeyValue
method in ColumnPrefixFilter is invoked 1000 times for col-2 (1 per version),
because all versions of the column have the same prefix for obvious reasons.
For col-1, it will skip using SEEK_NEXT_USING_HINT, which should skip the
remaining 999 versions of col-1.

In summary, the 1000 comparisons (5000 byte comparisons) for the column
prefix "col-2" are wasted because only 1 version is returned to the user. Also,
I believe this inefficiency is hidden from the user code, but it affects all
filters that use filterKeyValue as the main execution path for filtering KVs. Do
we have a case to improve HBase to handle this inefficiency? :) It seems
valid unless you prove otherwise.

Best Regards,

Jerry



On Tue, Aug 28, 2012 at 12:54 AM, lars hofhansl lhofha...@yahoo.com wrote:

 First off regarding inefficiency... If version counting would happen
 first and then filter were executed we'd have folks complaining about
 inefficiencies as well:
 (Why does the code have to go through the versioning stuff when my filter
 filters the row/column/version anyway?)  ;-)


 For your problem, you want to make use of seek hints...

 In addition to INCLUDE you can return NEXT_COL, NEXT_ROW, or even
 SEEK_NEXT_USING_HINT from Filter.filterKeyValue(...).

 That way the scanning framework will know to skip ahead to the next
 column, row, or a KV of your choosing. (see Filter.filterKeyValue and
 Filter.getNextKeyHint).

 (as an aside, it would probably be nice if Filters also had
 INCLUDE_AND_NEXT_COL, INCLUDE_AND_NEXT_ROW, internally used by StoreScanner)

 Have a look at ColumnPrefixFilter as an example.
 I also wrote a short post here:
 http://hadoop-hbase.blogspot.com/2012/01/filters-in-hbase-or-intra-row-scanning.html

 Does that help?

 -- Lars


 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Monday, August 27, 2012 5:59 PM
 Subject: Re: setTimeRange and setMaxVersions seem to be inefficient

 Hi Lars:

 Thanks for confirming the inefficiency of the implementation for this
 case. For my case, a column can have more than 10K versions, I need a quick
 way to stop the scan from digging the column once there is a match
 (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that can notify
 the framework to stop and go to next column once the number of versions
 specify in setMaxVersions is met.

 For now, I guess I have to hack it in the custom filter (I.e. I keep the
 count myself)? If you have a better way to achieve this, please share :)

 Best Regards,

 Jerry

 Sent from my iPad (sorry for spelling mistakes)

 On 2012-08-27, at 20:11, lars hofhansl lhofha...@yahoo.com wrote:

  Currently filters are evaluated before we do version counting.
 
  Here's a comment from ScanQueryMatcher.java:
  /**
   * Filters should be checked before checking column trackers. If we
 do
   * otherwise, as was previously being done, ColumnTracker may
 increment its
   * counter for even that KV which may be discarded later on by
 Filter. This
   * would lead to incorrect results in certain cases.
   */
 
 
  So this is by design. (Doesn't mean it's correct or desirable, though.)
 
  -- Lars
 
 
  - Original Message -
  From: Jerry Lam chiling...@gmail.com
  To: user user@hbase.apache.org
  Cc:
  Sent: Monday, August 27, 2012 2:40 PM
  Subject: setTimeRange and setMaxVersions seem to be inefficient
 
  Hi HBase community:
 
  I tried to use setTimeRange and setMaxVersions to limit the number of KVs
  return per column. The behaviour is as I would expect that is
  setTimeRange(0, T + 1) and setMaxVersions(1) will give me ONE version of
 KV
  with timestamp that is less than or equal to T.
  However, I noticed that all versions

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread Jerry Lam
Hi Lars:

I see. Please refer to the inline comment below.

Best Regards,

Jerry

On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com wrote:

 What I was saying was: It depends. :)

 First off, how do you get to 1000 versions? In 0.94++ older version are
 pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF)
 to get 1000 versions.


I forgot that the default number of versions to keep is 3. If this is what
people use most of the time, then yes, you are right for the type of scenario
where the number of versions per column to keep is small.

By that time some compactions will have happened and you're back to close
 to 3 versions (maybe 9, 12, or 15 or so, depending on how store files you
 have).

 Now, if you have that many version because because you set VERSIONS=1000
 in your CF... Then imagine you have 100 columns with 1000 versions each.


Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the
versioning myself)

In your scenario below you'd do 10 comparisons if the filter would be
 evaluated after the version counting. But only 1100 with the current code.
 (or at least in that ball park)


This is where I don't quite understand what you mean.

if the framework counted the number of ReturnCode.INCLUDE results and then stopped
feeding KeyValues into the filterKeyValue method after it reached the
count specified in setMaxVersions (i.e. 1 for the case we discussed),
shouldn't there be just 100 comparisons (at most) instead of 1100
comparisons? Maybe I don't understand how the current implementation works...




 The gist is: One can construct scenarios where one approach is better than
 the other. Only one order is possible.
 If you write a custom filter and you care about these things you should
 use the seek hints.

 -- Lars


 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
 Cc:
 Sent: Tuesday, August 28, 2012 7:17 AM
 Subject: Re: setTimeRange and setMaxVersions seem to be inefficient

 Hi Lars:

 Thanks for the reply.
 I need to understand if I misunderstood the perceived inefficiency because
 it seems you don't think quite the same.

 Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a
 table and each column has 1000 versions. Using the following code (the code
 might have errors and don't compile):
 /**
 * This is very simple use case of a ColumnPrefixFilter.
 * In fact all other filters that make use of filterKeyValue will see
 similar
 * performance problems that I have concerned with when the number of
 * versions per column could be huge.

 Filter filter = new ColumnPrefixFilter(Bytes.toBytes("col-2"));
 Scan scan = new Scan();
 scan.setFilter(filter);
 ResultScanner scanner = table.getScanner(scan);
 for (Result result : scanner) {
 for (KeyValue kv : result.raw()) {
 System.out.println("KV: " + kv + ", Value: " +
 Bytes.toString(kv.getValue()));
 }
 }
 scanner.close();
 */

 Implicitly, the number of version per column that is going to return is 1
 (the latest version). User might expect that only 2 comparisons for column
 prefix are needed (1 for col-1 and 1 for col-2) but in fact, it processes
 the filterKeyValue method in ColumnPrefixFilter 1000 times (1 for col-1 and
 1000 for col-2) for col-2 (1 per version) because all versions of the
 column have the same prefix for obvious reason. For col-1, it will skip
 using SEEK_NEXT_USING_HINT which should skip the 99 versions of col-1.

 In summary, the 1000 comparisons (5000 byte comparisons) for the column
 prefix col-2 is wasted because only 1 version is returned to user. Also,
 I believe this inefficiency is hidden from the user code but it affects all
 filters that use filterKeyValue as the main execution for filtering KVs. Do
 we have a case to improve HBase to handle this inefficiency? :) It seems
 valid unless you prove otherwise.

 Best Regards,

 Jerry



 On Tue, Aug 28, 2012 at 12:54 AM, lars hofhansl lhofha...@yahoo.com
 wrote:

  First off regarding inefficiency... If version counting would happen
  first and then filter were executed we'd have folks complaining about
  inefficiencies as well:
  (Why does the code have to go through the versioning stuff when my
 filter
  filters the row/column/version anyway?)  ;-)
 
 
  For your problem, you want to make use of seek hints...
 
  In addition to INCLUDE you can return NEXT_COL, NEXT_ROW, or even
  SEEK_NEXT_USING_HINT from Filter.filterKeyValue(...).
 
  That way the scanning framework will know to skip ahead to the next
  column, row, or a KV of your choosing. (see Filter.filterKeyValue and
  Filter.getNextKeyHint).
 
  (as an aside, it would probably be nice if Filters also had
  INCLUDE_AND_NEXT_COL, INCLUDE_AND_NEXT_ROW, internally used by
 StoreScanner)
 
  Have a look at ColumnPrefixFilter as an example.
  I also wrote a short post here:
 
 http://hadoop-hbase.blogspot.com/2012/01/filters-in-hbase-or-intra-row-scanning.html

Re: Column Value Reference Timestamp Filter

2012-08-27 Thread Jerry Lam
Hi Alex:

We decided to use setTimeRange and setMaxVersions, and remove the column
with a reference timestamp (i.e. we don't put this column into hbase
anymore). This behavior is what we would like, but it seems very inefficient
because all versions are processed before setMaxVersions takes effect
(I just posted some new findings in another post).

Best Regards,

Jerry
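
Concretely, the read side of that approach is just a time-bounded Get; the reference timestamp
t now comes from wherever the application keeps it:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class AsOfRead {
  // Return the row as it looked at or before time t: at most one version per column,
  // and only versions with timestamp <= t are considered.
  public static Result readAsOf(HTable table, byte[] row, long t) throws IOException {
    Get get = new Get(row);
    get.setTimeRange(0, t + 1);   // the upper bound is exclusive
    get.setMaxVersions(1);
    return table.get(get);
  }
}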

On Mon, Aug 20, 2012 at 4:47 PM, Alex Baranau alex.barano...@gmail.com wrote:

 Hi,

 So, you have row with key rowKeyA and column col1. And it contains two
 values value1 and value2 at timestamp1 and timestamp2 respectively, where
 timestamp1 is most recent. And you want to fetch most recent but one
 values in all columns when doing the scan. I.e. you don't know the
 timestamp1 or timestamp2 exactly you just need to fetch the value which was
 placed before the most recent one. Is that correct?

 Don't think there's some filter that would allow you to do so
 out-of-the-box. You should probably be able to write such filter and use
 scan.setMaxVersions(2). Not sure if keyvalues are fed into filter ordered
 by their timestamp..

 How about returning 2 most recent values to the client and filtering on the
 client-side? Why this doesn't work in your case? (large values in columns
 in size or?).

 Alex Baranau
 --
 Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
 Solr

 On Mon, Aug 20, 2012 at 2:57 PM, Jerry Lam chiling...@gmail.com wrote:

  Hi HBase community:
 
  I have a requirement in which I need to query a row based on the
 timestamp
  stored in the value of a column of a row. For example.
 
  (rowkeyA of col1) -> (value) at timestamp = t1, (value) stores t2. Result
  should return all columns of rowkeyA at timestamp = t2.
 
  Note that t1 > t2 ALWAYS.
 
  Can this sound like something that can be done using Filter? If yes, can
 it
  be done using the existing filters in HBase without customization?
 
  Best Regards,
 
  Jerry
 



Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-27 Thread Jerry Lam
Hi Lars:

Thanks for confirming the inefficiency of the implementation for this case. For 
my case, a column can have more than 10K versions, and I need a quick way to stop
the scan from digging into the column once there is a match (ReturnCode.INCLUDE). It
would be nice to have a ReturnCode that can notify the framework to stop and go
to the next column once the number of versions specified in setMaxVersions is met.

For now, I guess I have to hack it in the custom filter (I.e. I keep the count 
myself)? If you have a better way to achieve this, please share :)

Best Regards,

Jerry

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-27, at 20:11, lars hofhansl lhofha...@yahoo.com wrote:

 Currently filters are evaluated before we do version counting.
 
 Here's a comment from ScanQueryMatcher.java:
 /**
  * Filters should be checked before checking column trackers. If we do
  * otherwise, as was previously being done, ColumnTracker may increment 
 its
  * counter for even that KV which may be discarded later on by Filter. 
 This
  * would lead to incorrect results in certain cases.
  */
 
 
 So this is by design. (Doesn't mean it's correct or desirable, though.)
 
 -- Lars
 
 
 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user user@hbase.apache.org
 Cc: 
 Sent: Monday, August 27, 2012 2:40 PM
 Subject: setTimeRange and setMaxVersions seem to be inefficient
 
 Hi HBase community:
 
 I tried to use setTimeRange and setMaxVersions to limit the number of KVs
 return per column. The behaviour is as I would expect that is
 setTimeRange(0, T + 1) and setMaxVersions(1) will give me ONE version of KV
 with timestamp that is less than or equal to T.
 However, I noticed that all versions of the KeyValue for a particular
 column are processed through a custom filter I implemented even though I
 specify setMaxVersions(1) and setTimeRange(0, T+1). I expected that if ONE
 KV of a particular column has ReturnCode.INCLUDE, the framework will jump
 to the next COL instead of iterating through all versions of the column.
 
 Can someone confirm me if this is the expected behaviour (iterating through
 all versions of a column before setMaxVersions take effect)? If this is an
 expected behaviour, what is your recommendation to speed this up?
 
 Best Regards,
 
 Jerry
 


Column Value Reference Timestamp Filter

2012-08-20 Thread Jerry Lam
Hi HBase community:

I have a requirement in which I need to query a row based on the timestamp
stored in the value of a column of a row. For example.

(rowkeyA of col1) -> (value) at timestamp = t1, (value) stores t2. Result
should return all columns of rowkeyA at timestamp = t2.

Note that t1 > t2 ALWAYS.

Does this sound like something that can be done using a Filter? If yes, can it
be done using the existing filters in HBase without customization?

Best Regards,

Jerry


Re: Disk space usage of HFilev1 vs HFilev2

2012-08-14 Thread Jerry Lam
Hi Anil:

Maybe you can try to compare the two HFile implementation directly? Let say
write 1000 rows into HFile v1 format and then into HFile v2 format. You can
then compare the size of the two directly?

HTH,

Jerry

On Tue, Aug 14, 2012 at 3:36 PM, anil gupta anilgupt...@gmail.com wrote:

 Hi Zahoor,

 Then it seems like i might have missed something when doing hdfs usage
 estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME for
 getting the hdfs usage of a table. Is this the right way? Since i wiped of
 the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is it
 possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
 In this way i can do a fair comparison.

 Thanks,
 Anil Gupta

 On Tue, Aug 14, 2012 at 12:13 PM, jmozah jmo...@gmail.com wrote:

  Hi Anil,
 
  I really doubt that there is 50% drop in file sizes... As far as i know..
  there is no drastic space conserving feature in V2. Just as  an after
  thought.. do a major compact and check the sizes.
 
  ./Zahoor
  http://blog.zahoor.in
 
 
  On 15-Aug-2012, at 12:31 AM, anil gupta anilgupt...@gmail.com wrote:
 
   l
 
 


 --
 Thanks & Regards,
 Anil Gupta



Re: multitable query

2012-08-10 Thread Jerry Lam
Hi Wei:

There is a JIRA, HBASE-3996; does this sound like something you are looking for?

Regards,

Jerry
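
A bare-bones sketch of the scan-one-table-and-Get-the-other approach suggested below; table
names, families and output types are placeholders:

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Mapper side of a join on row key: the job scans "table_a" (set up via
// TableMapReduceUtil.initTableMapperJob) and looks up the matching row in "table_b".
public class JoinOnRowKeyMapper extends TableMapper<Text, Text> {
  private HTable tableB;

  @Override
  protected void setup(Context context) throws IOException {
    tableB = new HTable(context.getConfiguration(), "table_b");
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result aRow, Context context)
      throws IOException, InterruptedException {
    byte[] rowKey = Arrays.copyOfRange(key.get(), key.getOffset(),
        key.getOffset() + key.getLength());
    Result bRow = tableB.get(new Get(rowKey));   // point lookup on the second table
    if (!bRow.isEmpty()) {
      // emit whatever joined representation the job needs; Text is just a placeholder
      context.write(new Text(Bytes.toString(rowKey)), new Text("joined"));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    tableB.close();
  }
}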

On Friday, August 10, 2012, Bryan Beaudreault wrote:

 Use 3 jobs: 1 to scan each table. The third could do a map-side join. Make
 sure to use the same sort and partitions on the first two.

 Sent from iPhone.

 On Aug 10, 2012, at 9:41 AM, Weishung Chung weish...@gmail.com
 wrote:

  but they are in production now
 
  On Fri, Aug 10, 2012 at 6:39 AM, Weishung Chung 
  weish...@gmail.com
 wrote:
 
  Thank you, I am trying to avoid to fetch by gets and would like to do
  something like hadoop MultipleInputs.
  Yes, it would be nice if i could denormalize and remodel the schema.
 
 
  On Fri, Aug 10, 2012 at 6:29 AM, Amandeep Khurana 
   ama...@gmail.com
 wrote:
 
  You can scan over one of the tables (using TableInputFormat) and do
 simple
  gets on the other table for every row that you want to join.
 
  An interesting question to address here would be - why even need a
 join.
  Can you talk more about the data and what you are trying to do? In
 general
  you really want to denormalize and not need joins when working with
 HBase
  (or for that matter most NoSQL stores).
 
  On Fri, Aug 10, 2012 at 6:52 PM, Weishung Chung 
   weish...@gmail.com
 
  wrote:
 
  Basically a join of two data sets on the same row key.
 
  On Fri, Aug 10, 2012 at 6:12 AM, Amandeep Khurana 
   ama...@gmail.com
 
  wrote:
 
  How do you want to use two tables? Can you explain your algo a bit?
 
  On Fri, Aug 10, 2012 at 6:40 PM, Weishung Chung 
   weish...@gmail.com
 
  wrote:
 
  Hi HBase users,
 
  I need to pull data from 2 HBase tables in a mapreduce job. For 1
  table
  input, I use TableMapReduceUtil.initTableMapperJob. Is there another
  method
  for multitable inputs ?
 
  Thank you,
  Wei Shung
 
 
 
 
 
 



Re: CheckAndAppend Feature

2012-08-07 Thread Jerry Lam
Hi Lars:

This helps a lot! Thanks!

Best Regards,

Jerry

Sent from my iPad (sorry for spelling mistakes)
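
For the archives, one way to approximate checkAndAppend without coprocessors is an optimistic
read/modify/checkAndPut loop. Note this compares the whole current value rather than only its
last 32 bytes, so it is only a sketch of the idea:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndAppendSketch {
  // Append 'suffix' to the cell's current value, but only if nobody else changed it
  // in between; retry a few times on contention.
  public static boolean append(HTable table, byte[] row, byte[] cf, byte[] q, byte[] suffix)
      throws IOException {
    for (int attempt = 0; attempt < 5; attempt++) {
      byte[] current = table.get(new Get(row)).getValue(cf, q);
      byte[] updated = (current == null) ? suffix : Bytes.add(current, suffix);
      Put put = new Put(row);
      put.add(cf, q, updated);
      // atomic compare-and-set against the value we just read
      if (table.checkAndPut(row, cf, q, current, put)) {
        return true;
      }
    }
    return false; // too much contention
  }
}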

On 2012-08-07, at 20:30, lars hofhansl lhofha...@yahoo.com wrote:

 I filed HBASE-6522. It is a trivial change to make locks and leases available 
 to coprocessors.
 So checkAndSet type operations can then be implemented via coprocessor 
 endpoints: lock row, check, fail or update, unlock row.
 
 Since the patch is so simple I'll commit that soon (to 0.94.2 and 0.96)
 
 
 -- Lars
 
 
 From: lars hofhansl lhofha...@yahoo.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Tuesday, August 7, 2012 8:55 AM
 Subject: Re: CheckAndAppend Feature
 
 There is no such functionality currently, and there is no good way to 
 simulate that.
 
 Currently that cannot even be done with a coprocessor endpoint, because 
 region coprocessors have no way to create a region lock (just checked the 
 code).
 (That is something we have to change I think - I will create an issue once 
 the Jira system is back from the walk in the park).
 
 -- Lars
 
 
 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user user@hbase.apache.org
 Cc: 
 Sent: Tuesday, August 7, 2012 8:22 AM
 Subject: CheckAndAppend Feature
 
 Hi HBase community:
 
 I checked the HTable API, it has checkAndPut and checkAndDelete but I'm
 looking for checkAndAppend. Is there a way to simulate similarly?
 For instance, I want to check the last 32 bytes of a value (let assume that
 it has 128 bytes in total) in a column before appending atomically some
 values into it.
 
 Thanks!
 
 Jerry


Re: HBaseTestingUtility on windows

2012-08-03 Thread Jerry Lam
Hi Mohit:

You might need to install Cygwin if the tool has a dependency on Linux
commands like bash.

Best Regards,

Jerry
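
For completeness, the usual shape of a mini-cluster test; starting the cluster is the part
that shells out to unix-style commands, which is where Cygwin comes in on Windows:

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniClusterExample {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();                       // spins up in-process ZK, HDFS and HBase
    try {
      HTable table = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("f"));
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(put);
      System.out.println("rows: " + util.countRows(table));
    } finally {
      util.shutdownMiniCluster();
    }
  }
}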

On Friday, August 3, 2012, N Keywal wrote:

 Hi Mohit,

 For simple cases, it works for me for hbase 0.94 at least. But I'm not
 sure it works for all features. I've never tried to run hbase unit
 tests on windows for example.

 N.

 On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia 
  mohitanch...@gmail.com
 wrote:
  I am trying to run mini cluster using HBaseTestingUtility Class from
 hbase
  tests on windows, but I get bash command error. Is it not possible to
 run
  this utility class on windows?
 
  I followed this example:
 
 
 http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/



Re: Filter with State

2012-08-02 Thread Jerry Lam
Hi Lars:

That is useful. I appreciate it. The idea about cross row transaction is an
interesting one.

Can I have an iterator on the client side that gets rows from a coprocessor?
(i.e. filtered rows are streamed into the client application and the client can
access them via an iterator)

Best Regards,

Jerry
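
To make the idea concrete, here is a rough sketch (against the 0.92/0.94 filter API) of a
per-region stateful filter that keeps a row only if a flag column in the previous row had a
given value. The qualifier name is invented, and the behaviour across region boundaries and
internal scanner restarts would need to be verified before relying on it:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class PreviousRowGatedFilter extends FilterBase {
  private static final byte[] FLAG_QUALIFIER = Bytes.toBytes("flag"); // invented for the example

  private byte[] previousRowFlag;  // state carried from the previous row (within one region)
  private byte[] currentRowFlag;

  @Override
  public void reset() {
    // reset() is called at each row boundary: roll the state forward instead of wiping it
    previousRowFlag = currentRowFlag;
    currentRowFlag = null;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (Bytes.equals(kv.getQualifier(), FLAG_QUALIFIER)) {
      currentRowFlag = kv.getValue();
    }
    return ReturnCode.INCLUDE;     // keep KVs; the row-level decision is made in filterRow()
  }

  @Override
  public boolean filterRow() {
    // drop the current row unless the previous row's flag was "1"
    return previousRowFlag == null || !Bytes.equals(previousRowFlag, Bytes.toBytes("1"));
  }

  // no filter configuration to ship to the region server in this sketch
  @Override
  public void write(DataOutput out) throws IOException {
  }

  @Override
  public void readFields(DataInput in) throws IOException {
  }
}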


On Thu, Aug 2, 2012 at 12:13 AM, lars hofhansl lhofha...@yahoo.com wrote:

 The Filter is initialized per Region as part of a RegionScannerImpl.

 So as long as all the rows you are interested are co-located in the same
 region you can keep that state in the Filter instance.

 You can use a custom RegionSplitPolicy to control (to some extend at
 least) how the rows are colocated (KeyPrefixRegionSplitPolicy is an
 example).

 I also blogged about this here (in the context of cross row transactions):
 http://hadoop-hbase.blogspot.com/2012/02/limited-cross-row-transactions-in-hbase.html


 Maybe what you really are looking for are coprocessors?


 -- Lars



 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc:
 Sent: Wednesday, August 1, 2012 7:06 PM
 Subject: Re: Filter with State

 Hi Lars,

 I understand that it is more difficult to carry states across
 regions/servers, how about in a single region? Knowing that the rows in a
 single region have dependencies, can we have filter with state? If filter
 doesn't provide this ability, is there other mechanism in hbase to offer
 this kind of functionalities?

 I think this is a good feature because it allows efficient scanning on
 dependent rows. Instead of fetching each row to the client side and check
 if we should fetch the next row, the filter on the server side handles this
 logic.

 Best Regards,

 Jerry

 Sent from my iPad (sorry for spelling mistakes)

 On 2012-08-01, at 21:52, lars hofhansl lhofha...@yahoo.com wrote:

  The issue here is that different rows can be located in different
 regions or even different region servers, so no local state will carry over
 all rows.
 
 
 
  - Original Message -
  From: Jerry Lam chiling...@gmail.com
  To: user@hbase.apache.org user@hbase.apache.org
  Cc: user@hbase.apache.org user@hbase.apache.org
  Sent: Wednesday, August 1, 2012 5:48 PM
  Subject: Re: Filter with State
 
  Hi St.Ack:
 
  Schema cannot be changed to a single row.
  The API describes Do not rely on filters carrying state across rows;
 its not reliable in current hbase as we have no handlers in place for when
 regions split, close or server crashes. If we manage region splitting
 ourselves, so the split issue doesn't apply. Other failures can be handled
 on the application level. Does each invocation of scanner.next instantiate
 a new filter at the server side even on the same region (I.e. Does scanning
 on the same region use the same filter or different filter depending on the
 scanner.next calls??)
 
  Best Regards,
 
  Jerry
 
  Sent from my iPad (sorry for spelling mistakes)
 
  On 2012-08-01, at 18:44, Stack st...@duboce.net wrote:
 
  On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com
 wrote:
  Hi HBase guru:
 
  From Lars George talk, he mentions that filter has no state. What if I
 need
  to scan rows in which the decision to filter one row or not is based
 on the
  previous row's column values? Any idea how one can implement this type
 of
  logic?
 
  You could try carrying state in the client (but if client dies state
 dies).
 
  You can't have scanners carry state across rows.  It says so in API
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description
  (Whatever about the API, if LarsG says it, it must be so!).
 
  Here is the issue: If row X is in region A on server 1 there is
  nothing to prevent row X+1 from being on region B on server 2.  How do
  you carry the state between such rows reliably?
 
  Can you redo your schema such that the state you need to carry remains
  within a row?
  St.Ack
 




Re: sync on writes

2012-08-01 Thread Jerry Lam
I believe you are talking about enabling the dfs.support.append feature? I
benchmarked the difference (disabled/enabled) previously and did not find
much difference. It would be great if someone else can confirm this.

Best Regards,

Jerry
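
For reference, the two knobs touched on in this thread live in different places:
dfs.support.append is an HDFS/site configuration property, while deferred log flush is a
per-table HBase setting. A sketch only; normally these are set in the XML config and table
schema rather than in code, and both trade durability for latency:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class WalSettingsSketch {
  public static void main(String[] args) throws Exception {
    // hdfs-site.xml / hbase-site.xml property, shown programmatically only for illustration
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("dfs.support.append", true);

    // deferred log flush: edits still go to the WAL, but sync() happens periodically in the
    // background instead of on every edit
    HTableDescriptor desc = new HTableDescriptor("mytable");   // hypothetical table
    desc.addFamily(new HColumnDescriptor("f"));
    desc.setDeferredLogFlush(true);

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc);
    admin.close();
  }
}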

On Wednesday, August 1, 2012, Alex Baranau wrote:

 I believe that this is *not default*, but *current* implementation of
 sync(). I.e. (please correct me if I'm wrong) n-way write approach is not
 available yet.
 You might confuse it with the fact that by default, sync() is called on
 every edit. And you can change it by using deferred log flushing. Either
 way, sync() is going to be a pipelined write.

 There's an explanation of benefits of pipelined and n-way writes there in
 the book (p337), it's not just about which approach provides better
  durability of saved edits. Both of them do. But both can take different
 time to execute and utilize network differently: pipelined *may* be slower
 but can saturate network bandwidth better.

 Alex Baranau
 --
 Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
 Solr

 On Tue, Jul 31, 2012 at 9:09 PM, Mohit Anchlia 
  mohitanch...@gmail.com
 wrote:

  In the HBase book it mentioned that the default behaviour of write is to
  call sync on each node before sending replica copies to the nodes in the
  pipeline. Is there a reason this was kept default because if data is
  getting written on multiple nodes then the likelihood of losing data is
 really
  low since another copy is always there on the replica nodes. Is it ok to
  make this sync async and is it advisable?
 



 --
 Alex Baranau
 --
 Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
 Solr



Re: Query a version of a column efficiently

2012-08-01 Thread Jerry Lam
Thanks Suraj. I looked at the code, but it seems the logic is not
self-contained, particularly the way HBase searches for a
specific version using a TimeRange.

Best Regards,

Jerry

On Mon, Jul 30, 2012 at 12:53 PM, Suraj Varma svarma...@gmail.com wrote:

 You may need to setup your Eclipse workspace and search using
 references etc.To get started, this is one class that uses TimeRange
 based matching ...
 org.apache.hadoop.hbase.regionserver.ScanQueryMatcher
 Also - Get is internally implemented as a Scan over a single row.

 Hope this gets you started.
 --Suraj

 On Thu, Jul 26, 2012 at 4:34 PM, Jerry Lam chiling...@gmail.com wrote:
  Hi St.Ack:
 
  Can you tell me which source code is responsible for the logic. The
 source code in the get and scan doesnt provide an indication of how the
 setTimeRange works.
 
  Best Regards,
 
  Jerry
 
  Sent from my iPad (sorry for spelling mistakes)
 
  On 2012-07-26, at 18:30, Stack st...@duboce.net wrote:
 
  On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam chiling...@gmail.com
 wrote:
  Hi St.Ack:
 
  Let say there are 5 versions for a column A with timestamp = [0, 1, 3,
 6,
  10].
  I want to execute an efficient query that returns one version of the
 column
  that has a timestamp that is equal to 5 or less. So in this case, it
 should
  return the value of the column A with timestamp = 3.
 
  Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my
 guess
  is that it will return the version 6 not version 3. Correct me if I'm
  wrong.
 
 
  What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
  you 6 since that is outside of the timerange (try 0 instead of
  MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
  have to check code).
 
  St.Ack



Re: Filter with State

2012-08-01 Thread Jerry Lam
Hi St.Ack:

Schema cannot be changed to a single row.
The API says "Do not rely on filters carrying state across rows; its not
reliable in current hbase as we have no handlers in place for when regions
split, close or server crashes." If we manage region splitting ourselves,
the split issue doesn't apply. Other failures can be handled on the application
level. Does each invocation of scanner.next instantiate a new filter at the
server side, even on the same region (i.e. does scanning the same region use
the same filter instance, or a different filter depending on the scanner.next calls?)

Best Regards,

Jerry 

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-01, at 18:44, Stack st...@duboce.net wrote:

 On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com wrote:
 Hi HBase guru:
 
 From Lars George talk, he mentions that filter has no state. What if I need
 to scan rows in which the decision to filter one row or not is based on the
 previous row's column values? Any idea how one can implement this type of
 logic?
 
 You could try carrying state in the client (but if client dies state dies).
 
 You can't have scanners carry state across rows.  It says so in API
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description
 (Whatever about the API, if LarsG says it, it must be so!).
 
 Here is the issue: If row X is in region A on server 1 there is
 nothing to prevent row X+1 from being on region B on server 2.  How do
 you carry the state between such rows reliably?
 
 Can you redo your schema such that the state you need to carry remains
 within a row?
 St.Ack
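
As a rough illustration of the carry-state-in-the-client suggestion above, here
is a minimal sketch (hypothetical table "mytable", family "f", qualifier "q";
0.92-era client API) that keeps the previous row's column value on the client
and uses it to decide whether to process the current row:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class StatefulClientScan {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
      ResultScanner scanner = table.getScanner(scan);
      byte[] previous = null;  // state carried across rows, on the client side
      try {
        for (Result row : scanner) {
          byte[] value = row.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
          // The decision for this row depends on the previous row's value.
          if (previous == null || !Bytes.equals(previous, value)) {
            // ... process the row ...
          }
          previous = value;    // remember it for the next row
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }

The obvious trade-off, as noted above, is that every row is shipped to the
client, and the state is lost if the client dies.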


Re: Filter with State

2012-08-01 Thread Jerry Lam
Hi Lars,

I understand that it is more difficult to carry state across regions/servers,
but how about within a single region? Knowing that the rows in a single region
have dependencies, can we have a filter with state? If filters don't provide
this ability, is there another mechanism in HBase that offers this kind of
functionality?

I think this would be a good feature because it allows efficient scanning over
dependent rows. Instead of fetching each row to the client side and checking
whether we should fetch the next row, a filter on the server side would handle
this logic.

Best Regards,

Jerry 

Sent from my iPad (sorry for spelling mistakes)

On 2012-08-01, at 21:52, lars hofhansl lhofha...@yahoo.com wrote:

 The issue here is that different rows can be located in different regions or 
 even different region servers, so no local state will carry over all rows.
 
 
 
 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, August 1, 2012 5:48 PM
 Subject: Re: Filter with State
 
 Hi St.Ack:
 
 Schema cannot be changed to a single row.
 The API describes Do not rely on filters carrying state across rows; its not 
 reliable in current hbase as we have no handlers in place for when regions 
 split, close or server crashes. If we manage region splitting ourselves, so 
 the split issue doesn't apply. Other failures can be handled on the 
 application level. Does each invocation of scanner.next instantiate a new 
 filter at the server side even on the same region (I.e. Does scanning on the 
 same region use the same filter or different filter depending on the 
 scanner.next calls??)
 
 Best Regards,
 
 Jerry 
 
 Sent from my iPad (sorry for spelling mistakes)
 
 On 2012-08-01, at 18:44, Stack st...@duboce.net wrote:
 
 On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com wrote:
 Hi HBase guru:
 
 From Lars George talk, he mentions that filter has no state. What if I need
 to scan rows in which the decision to filter one row or not is based on the
 previous row's column values? Any idea how one can implement this type of
 logic?
 
 You could try carrying state in the client (but if client dies state dies).
 
 You can't have scanners carry state across rows.  It says so in API
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description
 (Whatever about the API, if LarsG says it, it must be so!).
 
 Here is the issue: If row X is in region A on server 1 there is
 nothing to prevent row X+1 from being on region B on server 2.  How do
 you carry the state between such rows reliably?
 
 Can you redo your schema such that the state you need to carry remains
 within a row?
 St.Ack
 


Re: How to query by rowKey-infix

2012-07-31 Thread Jerry Lam
Hi Chris:

I'm thinking about building a secondary index for primary-key lookups, and then
querying using the primary keys in parallel.

I'm interested to see if there are other options too.

Best Regards,

Jerry

On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer syrious3...@yahoo.dewrote:

 Hello there,

 I designed a row key for queries that need best performance (~100 ms)
 which looks like this:

 userId-date-sessionId

 These queries(scans) are always based on a userId and sometimes
 additionally on a date, too.
 That's no problem with the key above.

 However, another kind of query shall be based on a given time range,
 where the leftmost userId is not given or known.
 In this case I need to get all rows covering the given time range with
 their date to create a daily report.

 As I can't set wildcards at the beginning of a left-based index for the
 scan, I only see the possibility of scanning the whole table to collect the
 rowKeys that fall inside the time range I'm interested in.

 Is there a more elegant way to collect rows within time range X?
 (Unfortunately, the date attribute is not equal to the timestamp that is
 stored by hbase automatically.)

 Could/should one maybe leverage some kind of row key caching to accelerate
 the collection process?
 Is that covered by the block cache?

 Thanks in advance for any advice.

 regards
 Chris
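
A minimal sketch of the secondary-index idea mentioned in the reply above.
Everything here is illustrative rather than an established schema: a
hypothetical index table "report_index" whose row key is the date followed by
the main table's row key, so a date-prefixed scan returns the main-table keys
for that day.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DateIndexSketch {
    // Write an index row alongside each main-table row: key = date + "|" + mainRowKey.
    static void indexRow(HTable indexTable, String date, String mainRowKey)
        throws IOException {
      Put put = new Put(Bytes.toBytes(date + "|" + mainRowKey));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("k"), Bytes.toBytes(mainRowKey));
      indexTable.put(put);
    }

    // Collect the main-table row keys for one day with a prefix-style scan
    // ("}" is the ASCII character after "|", so it bounds the prefix).
    static void lookupDay(HTable indexTable, String date) throws IOException {
      Scan scan = new Scan(Bytes.toBytes(date + "|"), Bytes.toBytes(date + "}"));
      ResultScanner scanner = indexTable.getScanner(scan);
      try {
        for (Result r : scanner) {
          byte[] mainKey = r.getValue(Bytes.toBytes("f"), Bytes.toBytes("k"));
          // ... fetch the main row (possibly in parallel) using mainKey ...
        }
      } finally {
        scanner.close();
      }
    }

    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable indexTable = new HTable(conf, "report_index");
      indexRow(indexTable, "2012-07-31", "user42-2012-07-31-sessionA");
      lookupDay(indexTable, "2012-07-31");
      indexTable.close();
    }
  }

Keeping the index row in sync with the main row (and handling failures between
the two writes) is left to the application in this sketch.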



Query a version of a column efficiently

2012-07-26 Thread Jerry Lam
Hi HBase guru:

I need some advice on a problem that I'm facing using HBase. How can I
efficiently query a version of a column when I don't know exactly the
version I'm looking for?
For instance, I want to query a column with a timestamp that is less than or
equal to N: if version = N is available, return it to me. Otherwise, I want the
version that is closest to N (ordered by descending timestamp). That is, if
version = N - 1 exists, I want it to be returned.

I looked into the TimeRange query, but it doesn't seem to provide this semantic
naturally. Note that I don't know which version is closest to N, so I would
have to use setTimeRange(0, N+1). Do I need to implement a filter to do that,
or is it already available?

Any help will be appreciated.

Best Regards,

Jerry


Re: Query a version of a column efficiently

2012-07-26 Thread Jerry Lam
Hi St.Ack:

Let's say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
10].
I want to execute an efficient query that returns one version of the column
that has a timestamp that is equal to 5 or less. So in this case, it should
return the value of the column A with timestamp = 3.

Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
is that it will return the version 6 not version 3. Correct me if I'm
wrong.

Best Regards,

Jerry

On Thu, Jul 26, 2012 at 5:13 PM, Stack st...@duboce.net wrote:

 On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam chiling...@gmail.com wrote:
  I need some advises on a problem that I'm facing using HBase. How can I
  efficiently query a version of a column when I don't know exactly the
  version I'm looking for?
  For instance, I want to query a column with timestamp that is less or
 equal
  to N, if version = N is available, return it to me. Otherwise, I want the
  version that is closest to the version N (order by descending of
  timestamp). That is if version = N - 1 exists, I want it to be returned.
 

 Have you tried a timerange w/ minStamp of N and maxStamp of
 HConstants#LATEST_TIMESTAMP (Long.MAX_VALUE), returning one version only
 (setMaxVersions(1))?

 St.Ack



Re: Query a version of a column efficiently

2012-07-26 Thread Jerry Lam
Hi St.Ack:

Can you tell me which source code is responsible for the logic? The source code 
in Get and Scan doesn't provide an indication of how setTimeRange works.

Best Regards,

Jerry 

Sent from my iPad (sorry for spelling mistakes)

On 2012-07-26, at 18:30, Stack st...@duboce.net wrote:

 On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam chiling...@gmail.com wrote:
 Hi St.Ack:
 
 Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6,
 10].
 I want to execute an efficient query that returns one version of the column
 that has a timestamp that is equal to 5 or less. So in this case, it should
 return the value of the column A with timestamp = 3.
 
 Using the setTimeRange(5,  Long.MAX_VALUE) with setMaxVersion = 1, my guess
 is that it will return the version 6 not version 3. Correct me if I'm
 wrong.
 
 
 What Tom says, try it.  IIUC, it'll give you your 3.  It won't give
 you 6 since that is outside of the timerange (try 0 instead of
 MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would
 have to check code).
 
 St.Ack


Re: Scanning columns

2012-07-18 Thread Jerry Lam
Hi,

This sounds like you are looking for ColumnRangeFilter?

Best Regards,
Jerry

On Wednesday, July 18, 2012, Mohit Anchlia wrote:

 I am designing a HBase schema as a timeseries model. Taking advice from the
 definitive guide and tsdb I am planning to use my row key as
 metricname:Long.MAX_VALUE - basetimestamp. And the column names would be
 timestamp-base timestamp. My col names would then look like 1,2,3,4,5
 .. for instance. I am looking at Java API to see if I can do a range scan
 of columns, can I say fetch me columns starting at 1 and stop at 4? I see a
 scanner class for row scans but wondering if columns are sorted before
 storing and if I can do a range scan on them too.
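
A rough sketch of the ColumnRangeFilter suggestion, assuming a hypothetical
table and family. The constructor takes the min and max qualifiers plus
inclusive/exclusive flags; the comparison is on raw bytes, so numeric
qualifiers should be encoded so that they sort correctly (e.g. fixed-width).

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ColumnRangeScanSketch {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "metrics");   // hypothetical table name
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("f"));           // hypothetical family
      // Return only columns whose qualifiers fall between "1" and "4", inclusive.
      scan.setFilter(new ColumnRangeFilter(
          Bytes.toBytes("1"), true, Bytes.toBytes("4"), true));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // ... each Result now holds only the qualifiers inside the range ...
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }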



Re: WAL corruption

2012-07-02 Thread Jerry Lam
My understanding is that the WAL is used for replication as well.
If all your data has been persisted to disk (i.e. all the data in the memstores
has been flushed) and replication is disabled, I believe you can delete the WAL
without data loss.

just my 2 cents 

On 2012-07-02, at 1:37 PM, Bryan Keller wrote:

 During an upgrade of my cluster to 0.90 to 0.92 over the weekend, the WAL 
 (files in the /hbase/.logs directory) was corrupted and it prevented HBase 
 from starting up. The exact exception was java.io.IOException: Could not 
 obtain the last block locations on the WAL files.
 
 I was able to recover by deleting the /hbase/.logs directory. My question is, 
 if HBase had no pending updates, i.e. nothing writing to it, is there any 
 risk of data loss by deleting the WAL directory? For example, does 
 rebalancing, flushing, or compaction use the WAL or is the WAL used only for 
 inserts/updates/deletes?



Re: Recovering corrupt HLog files

2012-06-30 Thread Jerry Lam
This is interesting because I saw this happen in the past. Can WALPlayer be 
backported to 0.90.x?

Best Regards,

Jerry 

Sent from my iPad

On 2012-06-30, at 16:34, Li Pi l...@idle.li wrote:

 Nope. It came out in 0.94 otoh.
 
 On Sat, Jun 30, 2012 at 12:29 PM, Bryan Beaudreault 
 bbeaudrea...@hubspot.com wrote:
 
 I should have mentioned in my initial email that I am operating on HBase
 0.90.4.  Is WALPlayer available in this version?  I am having trouble
 finding it or anything similar.
 
 On Sat, Jun 30, 2012 at 1:14 PM, Li Pi l...@idle.li wrote:
 
 WALPlayer will look at the timestamp. Replaying an older edit that has
 since been overwritten shouldn't change anything.
 
 On Sat, Jun 30, 2012 at 9:49 AM, Bryan Beaudreault 
 bbeaudrea...@hubspot.com
 wrote:
 
 They are all pretty large, around 40+mb.  Will the walplayer be smart
 enough to only write edits that still look relevant (i.e. based on
 timestamps of the edits vs timestamps of the versions in hbase)?
 Writes
 have been coming in since we recovered.
 
 On Sat, Jun 30, 2012 at 11:05 AM, Stack st...@duboce.net wrote:
 
 On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault
 bbeaudrea...@hubspot.com wrote:
 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
 
 
 
 
 hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874.
 Marking as corrupted
 
 
 What size do these logs have?
 
 We are back to stable operating now, and in trying to research
 this I
 found
 the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory.
 There
 are
 20
 files listed there.
 
 
 Ditto.
 
 What are our options for tracking down and potentially recovering
 any
 data
 that was lost.  Or how can we even tell what was lost, if any?
 Does
 the
 existence of these files pretty much guarantee data lost? There
 doesn't
 seem to be much documentation on this.  From reading it seems like
 it
 might
 be possible that part of each of these files was recovered.
 
 
 If size  0, could try walplaying them:
 http://hbase.apache.org/book.html#walplayer
 
 St.Ack
 
 
 
 


Re: direct Hfile Read and Writes

2012-06-27 Thread Jerry Lam
Hi Samar:

I have used LoadIncrementalHFiles successfully in the past. Basically, once
you have written the HFiles yourself, you can use LoadIncrementalHFiles to
merge them with the HFiles currently managed by HBase. Once they are loaded
into HBase, the records in the incremental HFiles are accessible to clients.

HTH,

Jerry

On Wed, Jun 27, 2012 at 10:33 AM, shixing paradise...@gmail.com wrote:

  1. Since the data we might need would be distributed across regions how
  would direct reading of Hfile be helpful.

 You can read HFilePrettyPrinter; it shows how to create an HFile.Reader
 and use it to read an HFile.
 Or you can run ./hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f
 hdfs:///xxx/hfile to print some info and have a look.

  2. Any use-case for direct writes of Hfiles. If we write Hfiles will
  that data be accessible to the hbase shell.

 You can read HFileOutputFormat; it shows how to create an HFile.Writer
 and use it to write KVs directly to the HFile.
 If you want to read the data from the hbase shell, you should first load the
 HFile into the regionservers; details on bulk load are at
 http://hbase.apache.org/book.html#arch.bulk.load .


 On Wed, Jun 27, 2012 at 6:49 PM, samar kumar samar.opensou...@gmail.com
 wrote:

  Hi Hbase Users,
   I have seen APIs supporting direct HFile reads and writes. I do understand
   that it would create HFiles in the location specified, and it should be much
   faster since we would skip all the lookups to ZK, the catalog table, and the
   RS, but can anyone point me to a particular case when we would like to
   read/write directly?
 
 
1. Since the data we might need would be distributed across regions how
would direct reading of Hfile be helpful.
2. Any use-case for direct writes of Hfiles. If we write Hfiles will
that data be accessible to the hbase shell.
 
 
  Regards,
  Samar
 



 --
 Best wishes!
 My Friend~
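
A rough sketch of the incremental (bulk) load flow described above; the
bulk-load class in the 0.92/0.94-era API is
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and the directory and
table names below are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Directory of HFiles written beforehand (e.g. by HFileOutputFormat),
      // laid out as one subdirectory per column family. Path is hypothetical.
      Path hfileDir = new Path("/user/me/bulkload-output");
      HTable table = new HTable(conf, "mytable");   // hypothetical table name
      // Moves the HFiles into the regions that own their key ranges; after
      // this the rows are visible to normal clients and the hbase shell.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
      table.close();
    }
  }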



Re: HFile Performance

2012-06-25 Thread Jerry Lam
Hi Elliott:

Great! I will look into it ~

Best Regards,

Jerry

On Thu, Jun 21, 2012 at 6:24 PM, Elliott Clark ecl...@stumbleupon.comwrote:

 HFilePerformanceEvaluation is in the source tree hbase-server/src/test.  I
 haven't played with it myself but it might help you.

 On Thu, Jun 21, 2012 at 3:13 PM, Jerry Lam chiling...@gmail.com wrote:

  Hi HBase guru,
 
  I would like to benchmark HFile performance without other components in
  HBase. I know that I can use HFile as any other file format in Hadoop
 IO. I
  wonder if there is a HFile benchmark available so I don't end up
  reinventing the wheel.
 
  Best Regards,
 
  Jerry
 



Re: Hbase Replication

2012-06-22 Thread Jerry Lam
Hi Mohammad:

The current HBase replication (as far as I understand) is designed to
replicate data from one data center to another data center. The client
application has no knowledge of the zookeeper ensemble in the slave cluster,
and therefore it cannot switch to the slave cluster in case of a system
failure. You might want to implement a smart client that can switch to the
slave cluster when it detects a failure. Be aware that the data might not be
immediately available in the slave cluster (i.e. the master cluster might
have more up-to-date data than the slave cluster even with replication
configured).

HTH,

Jerry

On Fri, Jun 22, 2012 at 5:18 PM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

  I was going through the HBase replication documentation (at
 http://hbase.apache.org/replication.html) to get myself clear on the
 concepts. One thing which I could not find is whether it is
 possible to configure the replication in such a way that if my master
 cluster goes down the slave cluster will automatically take its
 place. Need some advice/comments from the experts. Many thanks.

 Regards,
 Mohammad Tariq
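
A very rough sketch of the smart-client idea from the reply above. Nothing like
this is built into the client; the quorum addresses and the failover policy are
hypothetical, and any read served from the slave may be stale because
replication is asynchronous.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FailoverAwareClient {
    // Hypothetical ZooKeeper quorum addresses for the two data centers.
    private static final String MASTER_QUORUM = "zk-a1,zk-a2,zk-a3";
    private static final String SLAVE_QUORUM  = "zk-b1,zk-b2,zk-b3";

    // Sketch only: real code would cache configurations/tables per cluster
    // instead of creating them on every call.
    private static Result getFrom(String quorum, String tableName, byte[] row)
        throws IOException {
      Configuration conf = HBaseConfiguration.create();
      conf.set("hbase.zookeeper.quorum", quorum);
      HTable table = new HTable(conf, tableName);
      try {
        return table.get(new Get(row));
      } finally {
        table.close();
      }
    }

    public static Result read(String tableName, byte[] row) throws IOException {
      try {
        return getFrom(MASTER_QUORUM, tableName, row);
      } catch (IOException e) {
        // Master cluster unreachable: fall back to the slave cluster.
        return getFrom(SLAVE_QUORUM, tableName, row);
      }
    }

    public static void main(String[] args) throws IOException {
      System.out.println(read("mytable", Bytes.toBytes("row1")));
    }
  }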



HFile Performance

2012-06-21 Thread Jerry Lam
Hi HBase guru,

I would like to benchmark HFile performance without other components in
HBase. I know that I can use HFile as any other file format in Hadoop IO. I
wonder if there is a HFile benchmark available so I don't end up
reinventing the wheel.

Best Regards,

Jerry


Re: Isolation level

2012-06-15 Thread Jerry Lam
Hi Cristina:

My understanding of HBase is that the isolation level for read ops is Read
Committed. There is only a write lock, which protects the data from being
modified by other requests; there is no effective read lock (it exists but it
doesn't have any effect). Since put ops are atomic, a put can succeed or fail
but not partially, so clients can only read the data once the write op
succeeds.

HTH,

Jerry


On Fri, Jun 15, 2012 at 7:59 AM, Cristina cristi_...@hotmail.com wrote:

 Hi,

  I have read that HBase has read committed as its isolation level, but I have
  some doubts.
  Is it possible to change this level, for instance to read uncommitted? How
  could I do this?
  Another question: is this isolation level based on locks? I have doubts
  because HBase has multiversion concurrency control, so it may implement read
  committed snapshot or snapshot isolation.

 Thanks,
Cristina




Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Jerry Lam
Hi Himanshu:

Thanks for following up! I did look at the logs and there were some
exceptions. I'm not sure if those exceptions contributed to the problem I saw
a week ago.
I am aware of the latency between the time the master says Nothing to
replicate and the time it takes to actually replicate to the slave. I waited
12 hours for the replication to finish (i.e. started the test before leaving
the office and checked the result the next day) and the data was still not
fully replicated.

By the way, is your test running with master-slave replication or master-master 
replication?

I will resume this again; I was busy with something else for the past week or so.

Best Regards,

Jerry

On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote:

 Hello Jerry,
 
 Did you try this again.
 
 Whenever you try next, can you please share the logs somehow.
 
 I tried replicating your scenario today, but no luck. I used the same
 workload you have copied here; master cluster has 5 nodes and slave
 has just 2 nodes; and made tiny regions of 8MB (memstore flushing at
 8mb too), so that I have around 1200+ regions even for 200k rows; ran
 the workload with 16, 24 and 32 client threads, but the verifyrep
 mapreduce job says its good.
 Yes, I ran the verifyrep command after seeing there is nothing to
 replicate message on all the regionservers; sometimes it was a bit
 slow.
 
 
 Thanks,
 Himanshu
 
 On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans
 jdcry...@apache.org wrote:
 I will try your suggestion today with a master-slave replication enabled 
 from Cluster A - Cluster B.
 
 Please do.
 
 Last Friday, I tried to limit the variability/the moving part of the 
 replication components. I reduced the size of Cluster B to have only 1 
 regionserver and having Cluster A to replicate data from one region only 
 without region splitting (therefore I have 1-to-1 region replication 
 setup). During the benchmark, I moved the region between different 
 regionservers in Cluster A (note there are still 3 regionservers in Cluster 
 A). I ran this test for 5 times and no data were lost. Does it mean 
 something? My feeling is there are some glitches/corner cases that have not 
 been covered in the cyclic replication (or hbase replication in general). 
 Note that, this happens only when the load is high.
 
 And have you looked at the logs? Any obvious exceptions coming up?
 Replication uses the normal HBase client to insert the data on the
 other cluster and this is what handles regions moving around.
 
 
 By the way, why do we need to have a zookeeper not handled by hbase for the 
 replication to work (it is described in the hbase documentation)?
 
 It says you *should* do it, not you *need* to do it :)
 
 But basically replication is zk-heavy and getting a better
 understanding of it starts with handling it yourself.
 
 J-D



Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Jerry Lam
Hi Himanshu:

My team is particularly interested in cyclic replication, so I have enabled
master-master replication (each cluster has the other cluster as its
replication peer), although the replication was in one direction only (from
cluster A to cluster B) in this test. I didn't run stop_replication on the
other cluster, if that is what you mean by disabling the replication.

Thanks!

Jerry

On 2012-05-01, at 10:08 PM, Himanshu Vashishtha wrote:

 Yeah, I should have mentioned that: its master-master, and on cdh4b1.
 But, replication on that specific slave table is disabled (so,
 effectively its master-slave for this test).
 
 Is this same as yours (replication config wise), or shall I enable
 replication on the destination table too?
 
 Thanks,
 Himanshu
 
 On Tue, May 1, 2012 at 8:01 PM, Jerry Lam chiling...@gmail.com wrote:
 Hi Himanshu:
 
 Thanks for following up! I did looked up the log and there were some 
 exceptions. I'm not sure if those exceptions contribute to the problem I've 
 seen a week ago.
 I did aware of the latency between the time that the master said Nothing to 
 replicate and the actual time it takes to actually replicate on the slave. 
 I remember I wait 12 hours for the replication to finish (i.e. start the 
 test before leaving office and check the result the next day) and data still 
 not fully replicated.
 
 By the way, is your test running with master-slave replication or 
 master-master replication?
 
 I will resume this again. I was busy on something else for the past week or 
 so.
 
 Best Regards,
 
 Jerry
 
 On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote:
 
 Hello Jerry,
 
 Did you try this again.
 
 Whenever you try next, can you please share the logs somehow.
 
 I tried replicating your scenario today, but no luck. I used the same
 workload you have copied here; master cluster has 5 nodes and slave
 has just 2 nodes; and made tiny regions of 8MB (memstore flushing at
 8mb too), so that I have around 1200+ regions even for 200k rows; ran
 the workload with 16, 24 and 32 client threads, but the verifyrep
 mapreduce job says its good.
 Yes, I ran the verifyrep command after seeing there is nothing to
 replicate message on all the regionservers; sometimes it was a bit
 slow.
 
 
 Thanks,
 Himanshu
 
 On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans
 jdcry...@apache.org wrote:
 I will try your suggestion today with a master-slave replication enabled 
 from Cluster A - Cluster B.
 
 Please do.
 
 Last Friday, I tried to limit the variability/the moving part of the 
 replication components. I reduced the size of Cluster B to have only 1 
 regionserver and having Cluster A to replicate data from one region only 
 without region splitting (therefore I have 1-to-1 region replication 
 setup). During the benchmark, I moved the region between different 
 regionservers in Cluster A (note there are still 3 regionservers in 
 Cluster A). I ran this test for 5 times and no data were lost. Does it 
 mean something? My feeling is there are some glitches/corner cases that 
 have not been covered in the cyclic replication (or hbase replication in 
 general). Note that, this happens only when the load is high.
 
 And have you looked at the logs? Any obvious exceptions coming up?
 Replication uses the normal HBase client to insert the data on the
 other cluster and this is what handles regions moving around.
 
 
 By the way, why do we need to have a zookeeper not handled by hbase for 
 the replication to work (it is described in the hbase documentation)?
 
 It says you *should* do it, not you *need* to do it :)
 
 But basically replication is zk-heavy and getting a better
 understanding of it starts with handling it yourself.
 
 J-D
 



Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-04-23 Thread Jerry Lam
Hi Lars:

I will try your suggestion today with master-slave replication enabled from
Cluster A to Cluster B.
Last Friday, I tried to limit the variability/moving parts of the replication
components. I reduced Cluster B to only 1 regionserver and had Cluster A
replicate data from one region only, without region splitting (so I have a
1-to-1 region replication setup). During the benchmark, I moved the region
between different regionservers in Cluster A (note there are still 3
regionservers in Cluster A). I ran this test 5 times and no data were lost.
Does that mean something? My feeling is that there are some glitches/corner
cases that have not been covered in cyclic replication (or HBase replication in
general). Note that this happens only when the load is high.

By the way, why do we need to have a ZooKeeper ensemble not managed by HBase for
the replication to work (it is described in the HBase documentation)?

Best Regards,

Jerry

On 2012-04-20, at 7:08 PM, lars hofhansl wrote:

 I see.
 Does this only happen when cyclic replication is enabled in this way (i.e. 
 master - master replication).
 The replication back does take some overhead, as the replicator needs to
 filter edits from being replicated back to the originator, but I would not
 have thought that would cause any issues.
 
 Could you run the same test once with replication only enabled from ClusterA 
 - ClusterB?
 
 Thanks.
 
 
 -- Lars
 
 
 
 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: 
 Sent: Friday, April 20, 2012 3:43 PM
 Subject: Re: HBase Cyclic Replication Issue: some data are missing in the 
 replication for intensive write
 
 Hi Himanshu:
 
 I'm using hbase 0.92.1 and hadoop 1.0.1 migrating from hbase 0.90.4 and 
 Hadoop 0.20 with append feature.
 
 It is a one side replication (cluster A to cluster B) with cyclic replication 
 enabled (i.e. add_peer of the other cluster configured). 
 
 Best Regards,
 
 Jerry
 
 Sent from my iPad
 
 On 2012-04-20, at 10:23, Himanshu Vashishtha hvash...@cs.ualberta.ca wrote:
 
 Hello Jerry,
 
 Which HBase version?
 
 You are not using cyclic replication? Its simple one side replication, 
 right?
 
 Thanks,
 Himanshu
 
 On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam chiling...@gmail.com wrote:
 Hi HBase community:
 
 We have been testing cyclic replication for 1 week. The basic functionality 
 seems to work as described in the document however when we started to 
 increase the write workload, the replication starts to miss data (i.e. some 
 data are not replicated to the other cluster). We have narrowed down to a 
 scenario that we can reproduce the problem quite consistently and here it 
 is:
 
 -
 Setup:
 - We have setup 2 clusters (cluster A and cluster B)with identical size in 
 terms of number of nodes and configuration, 3 regionservers sit on top of 3 
 datanodes.
 - Cyclic replication is enabled.
 
 - We use YCSB to generate load to hbase the workload is very similar to 
 workloada:
 
 recordcount=20
 operationcount=20
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 fieldcount=1
 fieldlength=25000
 
 readallfields=true
 writeallfields=true
 
 readproportion=0
 updateproportion=1
 scanproportion=0
 insertproportion=0
 
 requestdistribution=uniform
 
 - Records are inserted into Cluster A. After the benchmark is done and wait 
 until all data are replicated to Cluster B, we used verifyrep mapreduce job 
 for validation.
 - Data are deleted from both table (truncate 'tablename') before a new 
 experiment is started.
 
 Scenario:
 when we increase the number of threads until it max out the throughput of 
 the cluster, we saw some data are missing in Cluster B (total count != 
 20) although cluster A clearly has them all. This happens even though 
 we disabled region splitting in both clusters (it happens more often when 
 region splits occur). To further having more control of what is happening, 
 we then decided to disable the load balancer so the region (which is 
 responsible for the replicating data) will not relocate to other 
 regionserver during the benchmark. The situation improves a lot. We don't 
 see any missing data in 5 continuous runs. Finally, we decided to move the 
 region around from a regionserver to another regionserver during the 
 benchmark to see if the problem will reappear and it did.
 
 We believe that the issue could be related to region splitting and load 
 balancing during intensive write, the hbase replication strategy hasn't yet 
 cover those corner cases.
 
 Can someone take a look of it and suggest some ways to workaround this?
 
 Thanks~
 
 Jerry
 



HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-04-20 Thread Jerry Lam
Hi HBase community:

We have been testing cyclic replication for 1 week. The basic functionality
seems to work as described in the documentation; however, when we started to
increase the write workload, the replication started to miss data (i.e. some
data were not replicated to the other cluster). We have narrowed it down to a
scenario in which we can reproduce the problem quite consistently:

-
Setup:
- We have set up 2 clusters (cluster A and cluster B) with identical size in
terms of number of nodes and configuration; 3 regionservers sit on top of 3
datanodes.
- Cyclic replication is enabled.

- We use YCSB to generate load to HBase; the workload is very similar to
workloada:

recordcount=20
operationcount=20
workload=com.yahoo.ycsb.workloads.CoreWorkload
fieldcount=1
fieldlength=25000
 
readallfields=true
writeallfields=true
 
readproportion=0
updateproportion=1
scanproportion=0
insertproportion=0
 
requestdistribution=uniform
 
- Records are inserted into Cluster A. After the benchmark is done and we have
waited until all data are replicated to Cluster B, we use the verifyrep
mapreduce job for validation.
- Data are deleted from both tables (truncate 'tablename') before a new
experiment is started.

Scenario:
When we increase the number of threads until it maxes out the throughput of the
cluster, we see some data missing in Cluster B (total count != 20) although
cluster A clearly has it all. This happens even though we disabled region
splitting in both clusters (it happens more often when region splits occur). To
have more control over what is happening, we then decided to disable the load
balancer so the region (which is responsible for the data being replicated)
would not relocate to another regionserver during the benchmark. The situation
improved a lot: we didn't see any missing data in 5 consecutive runs. Finally,
we decided to move the region from one regionserver to another during the
benchmark to see if the problem would reappear, and it did.

We believe that the issue could be related to region splitting and load
balancing during intensive writes; the HBase replication strategy hasn't yet
covered those corner cases.

Can someone take a look at it and suggest some ways to work around this?

Thanks~

Jerry

Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-04-20 Thread Jerry Lam
Hi Himanshu:

I'm using HBase 0.92.1 and Hadoop 1.0.1, migrating from HBase 0.90.4 and Hadoop
0.20 with the append feature.

It is one-way replication (cluster A to cluster B) with cyclic replication
enabled (i.e. add_peer of the other cluster configured).

Best Regards,

Jerry

Sent from my iPad

On 2012-04-20, at 10:23, Himanshu Vashishtha hvash...@cs.ualberta.ca wrote:

 Hello Jerry,
 
 Which HBase version?
 
 You are not using cyclic replication? Its simple one side replication, 
 right?
 
 Thanks,
 Himanshu
 
 On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam chiling...@gmail.com wrote:
 Hi HBase community:
 
 We have been testing cyclic replication for 1 week. The basic functionality 
 seems to work as described in the document however when we started to 
 increase the write workload, the replication starts to miss data (i.e. some 
 data are not replicated to the other cluster). We have narrowed down to a 
 scenario that we can reproduce the problem quite consistently and here it is:
 
 -
 Setup:
 - We have setup 2 clusters (cluster A and cluster B)with identical size in 
 terms of number of nodes and configuration, 3 regionservers sit on top of 3 
 datanodes.
 - Cyclic replication is enabled.
 
 - We use YCSB to generate load to hbase the workload is very similar to 
 workloada:
 
 recordcount=20
 operationcount=20
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 fieldcount=1
 fieldlength=25000
 
 readallfields=true
 writeallfields=true
 
 readproportion=0
 updateproportion=1
 scanproportion=0
 insertproportion=0
 
 requestdistribution=uniform
 
 - Records are inserted into Cluster A. After the benchmark is done and wait 
 until all data are replicated to Cluster B, we used verifyrep mapreduce job 
 for validation.
 - Data are deleted from both table (truncate 'tablename') before a new 
 experiment is started.
 
 Scenario:
 when we increase the number of threads until it max out the throughput of 
 the cluster, we saw some data are missing in Cluster B (total count != 
 20) although cluster A clearly has them all. This happens even though we 
 disabled region splitting in both clusters (it happens more often when 
 region splits occur). To further having more control of what is happening, 
 we then decided to disable the load balancer so the region (which is 
 responsible for the replicating data) will not relocate to other 
 regionserver during the benchmark. The situation improves a lot. We don't 
 see any missing data in 5 continuous runs. Finally, we decided to move the 
 region around from a regionserver to another regionserver during the 
 benchmark to see if the problem will reappear and it did.
 
 We believe that the issue could be related to region splitting and load 
 balancing during intensive write, the hbase replication strategy hasn't yet 
 cover those corner cases.
 
 Can someone take a look of it and suggest some ways to workaround this?
 
 Thanks~
 
 Jerry


Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-04-20 Thread Jerry Lam
Hi Lars:

I'm using HBase 0.92.1 and Hadoop 1.0.1.
Yes, you are right. I'm replicating from cluster A to cluster B only, with
cyclic replication configured. Eventually I will test replicating from cluster A
to cluster B and vice versa with a high-intensity write workload, but if the
replication doesn't work one way, we need to think about other solutions.

No data loss in cluster A for sure.

Best Regards,

Jerry 

Sent from my iPad

On 2012-04-20, at 15:34, lars hofhansl lhofha...@yahoo.com wrote:

 Hi Jerry,
 
 which version of HBase are you using?
 
 You are not using cyclic backup, that needs 2 clusters. I assume you're just 
 replicating from one cluster to another, right?
 
 There is never data loss in Cluster A?
 
 -- Lars
 
 
 - Original Message -
 From: Jerry Lam chiling...@gmail.com
 To: user@hbase.apache.org
 Cc: 
 Sent: Friday, April 20, 2012 5:38 AM
 Subject: HBase Cyclic Replication Issue: some data are missing in the 
 replication for intensive write
 
 Hi HBase community:
 
 We have been testing cyclic replication for 1 week. The basic functionality 
 seems to work as described in the document however when we started to 
 increase the write workload, the replication starts to miss data (i.e. some 
 data are not replicated to the other cluster). We have narrowed down to a 
 scenario that we can reproduce the problem quite consistently and here it is:
 
 -
 Setup:
 - We have setup 2 clusters (cluster A and cluster B)with identical size in 
 terms of number of nodes and configuration, 3 regionservers sit on top of 3 
 datanodes. 
 - Cyclic replication is enabled.
 
 - We use YCSB to generate load to hbase the workload is very similar to 
 workloada:
 
 recordcount=20
 operationcount=20
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 fieldcount=1
 fieldlength=25000
 
 readallfields=true
 writeallfields=true
 
 readproportion=0
 updateproportion=1
 scanproportion=0
 insertproportion=0
 
 requestdistribution=uniform
 
 - Records are inserted into Cluster A. After the benchmark is done and wait 
 until all data are replicated to Cluster B, we used verifyrep mapreduce job 
 for validation.
 - Data are deleted from both table (truncate 'tablename') before a new 
 experiment is started.
 
 Scenario:
 when we increase the number of threads until it max out the throughput of the 
 cluster, we saw some data are missing in Cluster B (total count != 20) 
 although cluster A clearly has them all. This happens even though we disabled 
 region splitting in both clusters (it happens more often when region splits 
 occur). To further having more control of what is happening, we then decided 
 to disable the load balancer so the region (which is responsible for the 
 replicating data) will not relocate to other regionserver during the 
 benchmark. The situation improves a lot. We don't see any missing data in 5 
 continuous runs. Finally, we decided to move the region around from a 
 regionserver to another regionserver during the benchmark to see if the 
 problem will reappear and it did. 
 
 We believe that the issue could be related to region splitting and load 
 balancing during intensive write, the hbase replication strategy hasn't yet 
 cover those corner cases. 
 
 Can someone take a look of it and suggest some ways to workaround this? 
 
 Thanks~
 
 Jerry