Re: Slow scanning for PrefixFilter on EncodedBlocks

2012-10-18 Thread J Mohamed Zahoor
+1 for making PrefixFilter seek instead of using a startRow explicitly.

./zahoor

On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl lhofha...@yahoo.com wrote:

 Oh yeah, I meant that one should always set the startrow as a matter of
 practice - if possible - and never rely on the filter alone.



 
  From: anil gupta anilgupt...@gmail.com
 To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
 Sent: Wednesday, October 17, 2012 12:25 PM
 Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks


 Hi Lars,

 There is a specific use case for this:

 Table: Suppose I have a rowkey: <customer_id><event_timestamp><uid>

 Use case: I would like to get all the events of customer_id=123.
 Case 1: If I only use startRow=123 then I will get events of other
 customers having customer_id > 123, since the scanner will keep on
 fetching rows until the end of the table.
 Case 2: If I use prefixFilter=123 and startRow=123 then I will get the
 correct result.
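
 A minimal sketch of Case 2 in the Java client, assuming a hypothetical table
 named "events" and the prefix "123" (the names are illustrative, not from this
 thread):

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.filter.PrefixFilter;
 import org.apache.hadoop.hbase.util.Bytes;

 public class PrefixScanSketch {
   public static void main(String[] args) throws Exception {
     Configuration conf = HBaseConfiguration.create();
     HTable table = new HTable(conf, "events");      // hypothetical table name
     byte[] prefix = Bytes.toBytes("123");
     Scan scan = new Scan();
     scan.setStartRow(prefix);                       // start the scan at the first row of the prefix
     scan.setFilter(new PrefixFilter(prefix));       // and stop returning rows once past it
     ResultScanner scanner = table.getScanner(scan);
     try {
       for (Result r : scanner) {
         System.out.println(Bytes.toString(r.getRow()));
       }
     } finally {
       scanner.close();
       table.close();
     }
   }
 }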

 IMHO, adding the feature of smartly adding the startRow in PrefixFilter
 won't hurt any existing functionality. Use of StartRow and PrefixFilter will
 still be different.

 Thanks,
 Anil Gupta



 On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl lhofha...@yahoo.com
 wrote:

 That is a good point. There is no reason why prefix filter cannot issue a
 seek to the first KV for that prefix.
 Although it could lead to a practice where people would use the prefix filter
 when they in fact should just set the start row.
 
 
 
 
 
 - Original Message -
 From: anil gupta anilgupt...@gmail.com
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, October 17, 2012 9:41 AM
 Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
 
 Hi Zahoor,
 
 I heavily use the prefix filter. Every time I have to explicitly define the
 startRow. So, that's the current behavior. However, initially this behavior
 was confusing to me also.
 I think that when a Prefix filter is defined then internally the
 startRow=prefix can be set. User defined StartRow takes precedence over
 the
 prefixFilter startRow. If the current prefixFilter can be modified in that
 way then it will eradicate this confusion regarding performance of prefix
 filter.
 
 Thanks,
 Anil Gupta
 
 On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor jmo...@gmail.com
 wrote:
 
  First I upgraded my cluster to 0.94.2.. even then the problem persisted..
  Then I moved to using startRow instead of prefix filter..
 
 
  ,/zahoor
 
  On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor jmo...@gmail.com
  wrote:
 
   Sorry for the delay.
  
   It looks like the problem is because of PrefixFilter...
   I assumed that it does a seek...
  
   If I use startRow instead.. it works fine.. But is it the correct
  approach?
  
   ./zahoor
  
  
   On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl lhofha...@yahoo.com
  wrote:
  
   I reopened HBASE-6577
  
  
  
   - Original Message -
   From: lars hofhansl lhofha...@yahoo.com
   To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl 
   lhofha...@yahoo.com
   Cc:
   Sent: Tuesday, October 16, 2012 2:39 PM
   Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
  
   Looks like this is exactly the scenario I was trying to optimize with
   HBASE-6577. Hmm...
   
   From: lars hofhansl lhofha...@yahoo.com
   To: user@hbase.apache.org user@hbase.apache.org
   Sent: Tuesday, October 16, 2012 12:21 AM
   Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
  
   PrefixFilter does not do any seeking by itself, so I doubt this is
   related to HBASE-6757.
   Does this only happen with FAST_DIFF compression?
  
  
   If you can create an isolated test program (that sets up the scenario
  and
   then runs a scan with the filter such that it is very slow), I'm
 happy
  to
   take a look.
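   
    For anyone putting together such a test, a minimal table-setup sketch for
    the FAST_DIFF-on-memory-and-disk case (table and family names are
    placeholders, not from this thread):
   
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
   
    public class FastDiffTableSketch {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor htd = new HTableDescriptor("prefixtest");   // hypothetical table name
        HColumnDescriptor hcd = new HColumnDescriptor("f");          // hypothetical family name
        hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);       // encode blocks in the cache
        hcd.setEncodeOnDisk(true);                                   // ...and in the HFiles on disk
        htd.addFamily(hcd);
        admin.createTable(htd);
        admin.close();
        // Next: load rows, flush, then time a scan that uses setStartRow plus PrefixFilter.
      }
    }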
  
   -- Lars
  
  
  
   - Original Message -
   From: J Mohamed Zahoor jmo...@gmail.com
   To: user@hbase.apache.org user@hbase.apache.org
   Cc:
   Sent: Monday, October 15, 2012 10:27 AM
   Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
  
   Is this related to HBASE-6757 ?
   I use a filter list with
 - prefix filter
 - filter list of column filters
  
   /zahoor
  
   On Monday, October 15, 2012, J Mohamed Zahoor wrote:
  
Hi
   
 My scanner performance is very slow when using a Prefix filter on an
 **Encoded Column** (encoded using FAST_DIFF on both memory and
 disk).
 I am using HBase 0.94.1.
    
 jstack shows that much time is spent on seeking the row.
 Even if I give an exact row key match in the prefix filter it takes about
 two minutes to return a single row.
Running this multiple times also seems to be redirecting things to
  disk
(loadBlock).
   
   
 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
 at

Re: crafting your key - scan vs. get

2012-10-18 Thread Michael Segel
Neil, 

I've pointed you in the right direction. 
The rest of the exercise is left to the student. :-) 

While you used the comment about having fun, your question is boring. *^1
The fun part is for you now to play and see why I may have suggested the 
importance of column order.

Sorry, but that really is the fun part of your question... figuring out the 
rest of the answer on your own. 

From your response, you clearly understand it, but you need to spend more time 
wrapping your head around the solution and taking ownership of it. 

Have fun, 

-Mike


*^1  The reason I say that the question is boring is that once you fully 
understand the problem and the solution, you can easily apply it to other 
problems. The fun is in actually taking the time to experiment and work through 
the problem on your own. Seriously, that *is* the fun part.


On Oct 17, 2012, at 10:53 PM, Neil Yalowitz neilyalow...@gmail.com wrote:

 This is a helpful response, thanks.  Our use case fits the "Show me the
 most recent events by user A" you described.
 
 So using the first example, a table populated with events of user ID AA.
 
 ROW                COLUMN+CELL
 AA                 column=data:event9999, timestamp=1350420705459, value=myeventval1
 AA                 column=data:event9998, timestamp=1350420704490, value=myeventval2
 AA                 column=data:event9997, timestamp=1350420704567, value=myeventval3
 
 NOTE1: I replaced the TS stuff with ...9997 for brevity, and the
 example user ID "AA" would actually be hashed to avoid hotspotting
 NOTE2: I assume I should shorten the chosen column family and qualifier
 before writing it to a large production table (for instance, "d" instead of
 "data" and "e" instead of "event")
 
 I hope I have that right.  Thanks for the response!
 
 As for including enough description for the question to be not-boring,
 I'm never quite sure when an email will grow so long that no one will read
 it.  :)  So to give more background: Each event is about 1KB of data.  The
 frequency is highly variable... over any given period of time, some users
 may only log one event and no more, some users may log a few events (10 to
 100), in some rare cases a user may log many events (1000+).  The width of
 the column is some concern for the users with many events, but I'm thinking
 a few rare rows with 1KB x 1000+ width shouldn't kill us.
 
 If I may ask a couple of followup question about your comments:
 
 Then store each event in a separate column where the column name is
 something like "event" + (max Long - Time Stamp).
 
 This will place the most recent event first.
 
 Although I know row keys are sorted, I'm not sure what this means for a
 qualifier.  The scan result can depend on what cf:qual is used?  ...and
 that determines which column value is first?  Is this related to using
 setMaxResultsPerColumnFamily(1)?  (ie-- only return one column value, so
 sort on qualifier and return the first val found)
 
 The reason I say event + the long, is that you may want to place user
 specific information in a column and you would want to make sure it was in
 front of the event data.
 
 Same question as above, I'm not sure what would place a column in front.
 Am I missing something?
 
 In the first case, you can use get(); while still a scan, it's a very
 efficient fetch.
 In the second, you will always need to do a scan.
 
 This is the core of my original question.  My anecdotal tests in hbase
 shell showed a Get executing about 3x faster than a Scan with
 start/stoprow, but I don't trust my crude testing much and hoped someone
 could describe the performance trade-off between Scan vs. Get.
 
 
 Thanks again for anyone who read this far.
 
 
 Neil Yalowitz
 neilyalow...@gmail.com
 
 On Wed, Oct 17, 2012 at 10:45 AM, Michael Segel
 michael_se...@hotmail.comwrote:
 
 Neil,
 
 
 Since you asked
 Actually your question is kind of a boring question. ;-) [Note I will
 probably get flamed for saying it, even if it is the truth!]
 
 Having said that...
 Boring as it is, it's an important topic that many still seem to trivialize
 in terms of its impact on performance.
 
 Before answering your question, let's take a step back and ask a more
 important question...
 What data do you want to capture and store in HBase?
 and then ask yourself...
 How do I plan on accessing the data?
 
 From what I can tell, you want to track certain events made by a user.
 So you're recording at Time X, user A did something.
 
 Then the question is how do you want to access the data.
 
 Do you primarily say "Show me all the events in the past 15 minutes and
 organize them by user"?
 Or do you say "Show me the most recent events by user A"?
 
 Here's the issue.
 
 If you are more interested in, and will frequently ask, the question "Show
 me the most recent events by user A",
 
 Then you would want to do the following:
 Key = User ID (hashed if necessary)
 Column Family: Data (For lack of a better name)
 
 Then store each event in a 
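
 A minimal sketch of the column-per-event layout described above ("event" +
 (max Long - Time Stamp)), assuming a hypothetical table "users" with column
 family "d":

 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class EventColumnSketch {
   public static void main(String[] args) throws Exception {
     HTable table = new HTable(HBaseConfiguration.create(), "users");  // hypothetical table
     long ts = System.currentTimeMillis();
     // "event" + (Long.MAX_VALUE - ts): a later event gets a smaller suffix,
     // so the most recent event sorts first among the event columns.
     byte[] qualifier = Bytes.toBytes("event" + (Long.MAX_VALUE - ts));
     Put put = new Put(Bytes.toBytes("AA"));                           // user id (hashed in practice)
     put.add(Bytes.toBytes("d"), qualifier, Bytes.toBytes("myeventval1"));
     table.put(put);
     table.close();
   }
 }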

RE: Checking major compaction

2012-10-18 Thread Ramkrishna.S.Vasudevan
Hi 

Yes Kiran, you can go through the logs also.  

You will see some logs like 
'Start major compaction for ..
'Compacting file  
'Compacting file 
And finally 'Completed major/minor compaction.'

I just don't have some exact logs with me right now.  But you can see log msgs,
but they all come in debug mode.  So ensure you enable debug mode for your logs.

A simple test would be to just right some 10 rows. In between do some 4 to 5
flushes.

Just give major_compact(tableName) from the shell.   You can see the logs.
:)
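
The same request can also be issued from the Java client; a minimal sketch
(the call is asynchronous, and as noted above the progress only shows up in
the DEBUG-level region server logs):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    admin.majorCompact("tableName");   // asynchronous: compaction requests are queued on the region servers
    admin.close();
  }
}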

Regards
Ram

 -Original Message-
 From: kiran [mailto:kiran.sarvabho...@gmail.com]
 Sent: Thursday, October 18, 2012 12:03 PM
 To: user@hbase.apache.org
 Subject: Re: Checking major compaction
 
 Thanks ram,
 
 Is there a way can I check it through region server logs. If it is
 possible
 what are the statements that I need to look for ??
 
 Thanks
 Kiran
 
 On Thu, Oct 18, 2012 at 11:55 AM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:
 
  HBASE-6033 does the work that you ask for.  It is currently in Trunk
  version
  of HBase.
 
  Regards
  Ram
 
   -Original Message-
   From: kiran [mailto:kiran.sarvabho...@gmail.com]
   Sent: Thursday, October 18, 2012 11:43 AM
   To: user@hbase.apache.org
   Subject: Checking major compaction
  
   Hi all,
  
   Is there a way to check if major compaction is running or not on a
   table.
  
   --
   Thank you
   Kiran Sarvabhotla
  
   -Even a correct decision is wrong when it is taken late
 
 
 
 
 --
 Thank you
 Kiran Sarvabhotla
 
 -Even a correct decision is wrong when it is taken late



Re: hbase deployment using VMs for data nodes and SAN for data storage

2012-10-18 Thread Michael Segel
Lars, 

I think we need to clarify what we think of as a SAN. 
It's possible to have a SAN where the disks appear as attached storage, while 
the traditional view is that the disks are detached. 

There are some design considerations like cluster density where one would want 
to use a SAN like NetApp to effectively create a storage half to a cluster and 
then a compute half that requires a fraction of the space and energy of a 
commodity built cluster. 

When we start to see clusters at PB scale, we have to consider the size of the 
footprint and the cost of operating them in terms of both energy efficiency and 
physical footprint in a data center. 

HBase can run in such configurations with the right tuning. 

I for one would love to have a data center where I can drop in different 
configurations and be able to tune and validate cluster designs, but alas 
that's something only MapR, Cloudera, or Hortonworks can do, since they have the 
deep pockets and the necessity to actually work through this for their customers. 


On Oct 15, 2012, at 11:43 PM, lars hofhansl lhofha...@yahoo.com wrote:

 If you have a SAN, why would you want to use HBase?
 
 -- Lars
 
 
 From: Pamecha, Abhishek apame...@x.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Monday, October 15, 2012 3:00 PM
 Subject: hbase deployment using VMs for data nodes and SAN for data storage
 
 Hi
 
 We are deciding between using local disks for bare metal hosts Vs VMs using 
 SAN for data storage. I was wondering if anyone has contrasted performance, 
 availability and scalability between these two options?
 
 IMO, This is kinda similar to a typical  AWS or another cloud deployment.
 
 Thanks,
 Abhishek
 



RE: Checking major compaction

2012-10-18 Thread Ramkrishna.S.Vasudevan
A simple test would be to just right some 10 rows
I meant to say write some 10 rows.(not right)

Regards
Ram

 -Original Message-
 From: Ramkrishna.S.Vasudevan [mailto:ramkrishna.vasude...@huawei.com]
 Sent: Thursday, October 18, 2012 2:05 PM
 To: user@hbase.apache.org
 Subject: RE: Checking major compaction
 
 Hi
 
  Yes Kiran, you can go through the logs also.
 
 You will see some logs like
 'Start major compaction for ..
 'Compacting file  
 'Compacting file 
 And finally 'Completed major/minor compaction.'
 
  I just don't have some exact logs with me right now.  But you can see log
  msgs,
  but they all come in debug mode.  So ensure you enable debug mode for your
  logs.
 
 A simple test would be to just right some 10 rows. In between do some 4
 to 5
 flushes.
 
 Just give major_compact(tableName) from the shell.   You can see the
 logs.
 :)
 
 Regards
 Ram
 
  -Original Message-
  From: kiran [mailto:kiran.sarvabho...@gmail.com]
  Sent: Thursday, October 18, 2012 12:03 PM
  To: user@hbase.apache.org
  Subject: Re: Checking major compaction
 
  Thanks ram,
 
  Is there a way can I check it through region server logs. If it is
  possible
  what are the statements that I need to look for ??
 
  Thanks
  Kiran
 
  On Thu, Oct 18, 2012 at 11:55 AM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
   HBASE-6033 does the work that you ask for.  It is currently in
 Trunk
   version
   of HBase.
  
   Regards
   Ram
  
-Original Message-
From: kiran [mailto:kiran.sarvabho...@gmail.com]
Sent: Thursday, October 18, 2012 11:43 AM
To: user@hbase.apache.org
Subject: Checking major compaction
   
Hi all,
   
Is there a way to check if major compaction is running or not on
 a
table.
   
--
Thank you
Kiran Sarvabhotla
   
-Even a correct decision is wrong when it is taken late
  
  
 
 
  --
  Thank you
  Kiran Sarvabhotla
 
  -Even a correct decision is wrong when it is taken late



one RegionServer crashed and the whole cluster was blocked

2012-10-18 Thread 张磊
Hi, All

  One of the RegionServers in our company’s cluster crashed. At this
time, I found:

1.   All the RegionServers stopped handling requests from the client
side (requestsPerSecond=0 at the master-status UI page).

2.   It takes about 12-15 minutes to recover.

3.   I have set hbase.regionserver.restart.on.zk.expire to true, but it
does not work.

  For 1, I know the cluster began to split logs and recover the data on the
crashed RegionServer; will the recovery operation block all the requests
from the client side?

  For 2, is there any solution to reduce the recovery time?

  For 3, I checked the log and found a “session is timeout” exception, maybe
because a full GC caused the session to time out. But why does
hbase.regionserver.restart.on.zk.expire not work? My HBase version is
0.94.0.

 

  Thanks for any suggestions and feedback!

 

Fowler Zhang

 



Re: Comparison of hbase/hadoop with sql server

2012-10-18 Thread Harsh J
What is the difference between HBase and Hadoop+HBase? HBase runs on
top of Hadoop components.

Also, first answer us this question, before we answer yours: Will your
SQL Server scale linearly as you add more machines? Can it easily
scale horizontally and vertically?

Seems to me like you're comparing the wrong elements in deciding what
platform to base your application on. If you could explain what you
wish to do, and what data sizes you expect to work with, we can
provide a better answer.

On Thu, Oct 18, 2012 at 5:06 PM, iwannaplay games
funnlearnfork...@gmail.com wrote:
 Hi,

 Can anyone give a clear idea about these comparisons on the same hardware &
 software configuration?

                                   Sql server   hbase   hadoop+hbase
  Data compression                 ?            ?       ?     (yes/no; if all yes, where is it more effective)
  Online back ups                  ?            ?       ?
  Security                         ?            ?       ?     (which is more secure and more controllable)
  Batch queries execution time     ?            ?       ?     (where will time consumption be more for aggregates)


 Let me know if I need to consider any benefit of hadoop/hbase over sql
 server

 Thanks & Regards
 Prabhjot



-- 
Harsh J


RE: one RegionServer crashed and the whole cluster was blocked

2012-10-18 Thread Ramkrishna.S.Vasudevan
   For 1, I knew the cluster began to split log and recover the data on
 the
 crashed RegionServer, will the recovery operation block all the
 requests
 from the client side?


Ideally it should not.  But if your client was generating data for the regions
that were dead at that time, then client requests will not be served till the
regions are online again after
log splitting on some other region server.
Any client requests going to other region servers should ideally be working.
Did you see the threaddumps at that time on the other RS? That should give
some clue.

   For 2, Is there any solution to reduce the recovery time?
The recovery time depends on the amount of data and particularly on the size
of the HLog file.  By default every HLog file is of size 256MB.
In 0.94.0 a good number of changes have gone in to make the recovery faster
in terms of HLog splitting.


 3.   I have set hbase.regionserver.restart.on.zk.expire to true,
 but it
 does not work.
I am not very sure how the code works with this property.  Will check this
part.

Regards
Ram



 -Original Message-
 From: 张磊 [mailto:zhang...@youku.com]
 Sent: Thursday, October 18, 2012 5:01 PM
 To: user@hbase.apache.org
 Subject: one RegionServer crashed and the whole cluster was blocked
 
 Hi, All
 
   One of the RegionServer of our company’s cluster was crashed. At this
 time, I found:
 
 1.   All the RegionServer stopped handling the requests from the
 client
 side( requestsPerSecond=0 at the master-status UI page).
 
 2.   It takes about 12-15 minutes to recovery.
 
 3.   I have set hbase.regionserver.restart.on.zk.expire to true,
 but it
 does not work.
 
   For 1, I knew the cluster began to split log and recover the data on
 the
 crashed RegionServer, will the recovery operation block all the
 requests
 from the client side?
 
   For 2, Is there any solution to reduce the recovery time?
 
   For 3, I checked the log, found “session is timeout” exception, maybe
 for full gc and the session was timeout. But why the
 hbase.regionserver.restart.on.zk.expire does not work? My HBase version
 is
 0.94.0.
 
 
 
   Thanks for any suggestions and feedback!
 
 
 
 Fowler Zhang
 
 




Re: Coprocessor end point vs MapReduce?

2012-10-18 Thread Doug Meil

To echo what Mike said about KISS, would you use triggers for a large
time-sensitive batch job in an RDBMS?  It's possible, but probably not.
Then you might want to think twice about using co-processors for such a
purpose with HBase.





On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:

Run your weekly job in a low priority fair scheduler/capacity scheduler
queue. 

Maybe it's just me, but I look at Coprocessors as a similar structure to
RDBMS triggers and stored procedures.
You need to show restraint and use them sparingly, otherwise you end up creating
performance issues.

Just IMHO.

-Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:

 I don't have any concern about the time it's taking. It's more about
 the load it's putting on the cluster. I have other jobs that I need to
 run (secondary index, data processing, etc.). So the more time this
 new job is taking, the less CPU the others will have.
 
 I tried the M/R and I really liked the way it's done. So my only
 concern will really be the performance of the delete part.
 
 That's why I'm wondering what's the best practice to move a row to
 another table.
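 
 As a rough illustration of the map-only route (an assumption, not something
 proposed in this thread): a sketch that copies each row in a time range to a
 second table and deletes it from the source via MultiTableOutputFormat. The
 table names "source" and "archive" are placeholders.
 
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.KeyValue;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
 import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
 import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
 import org.apache.hadoop.hbase.mapreduce.TableMapper;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapreduce.Job;
 
 public class MoveRowsSketch {
   static final ImmutableBytesWritable SOURCE = new ImmutableBytesWritable(Bytes.toBytes("source"));
   static final ImmutableBytesWritable ARCHIVE = new ImmutableBytesWritable(Bytes.toBytes("archive"));
 
   static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
     @Override
     protected void map(ImmutableBytesWritable row, Result result, Context context)
         throws IOException, InterruptedException {
       Put put = new Put(row.get());
       for (KeyValue kv : result.raw()) {
         put.add(kv);                                // copy every cell into the archive table
       }
       context.write(ARCHIVE, put);
       context.write(SOURCE, new Delete(row.get())); // then delete the whole row from the source
     }
   }
 
   public static void main(String[] args) throws Exception {
     Configuration conf = HBaseConfiguration.create();
     Job job = new Job(conf, "move-rows");
     job.setJarByClass(MoveRowsSketch.class);
     Scan scan = new Scan();
     scan.setTimeRange(Long.parseLong(args[0]), Long.parseLong(args[1])); // the two timestamps
     scan.setCaching(500);
     scan.setCacheBlocks(false);
     TableMapReduceUtil.initTableMapperJob("source", scan, MoveMapper.class,
         ImmutableBytesWritable.class, Writable.class, job);
     job.setOutputFormatClass(MultiTableOutputFormat.class);
     job.setNumReduceTasks(0);                       // map-only
     System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
 }
 
 The deletes here still go through the normal client write path, one batch per
 region server, which is the part the HBASE-6942 endpoint is meant to speed up.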
 
 2012/10/17, Michael Segel michael_se...@hotmail.com:
 If you're going to be running this weekly, I would suggest that you
stick
 with the M/R job.
 
 Is there any reason why you need to be worried about the time it takes
to do
 the deletes?
 
 
 On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org
 wrote:
 
 Hi Mike,
 
 I'm expecting to run the job weekly. I initially thought about using
 end points because I found HBASE-6942 which was a good example for my
 needs.
 
 I'm fine with the Put part for the Map/Reduce, but I'm not sure about
 the delete. That's why I look at coprocessors. Then I figure that I
 also can do the Put on the coprocessor side.
 
 On a M/R, can I delete the row I'm dealing with based on some criteria
 like timestamp? If I do that, I will not do bulk deletes, but I will
 delete the rows one by one, right? Which might be very slow.
 
 If in the future I want to run the job daily, might that be an issue?
 
 Or should I go with the initial idea of doing the Put with the M/R job
 and the delete with HBASE-6942?
 
 Thanks,
 
 JM
 
 
 2012/10/17, Michael Segel michael_se...@hotmail.com:
 Hi,
 
 I'm a firm believer in KISS (Keep It Simple, Stupid)
 
 The Map/Reduce (map job only) is the simplest and least prone to
 failure.
 
 Not sure why you would want to do this using coprocessors.
 
 How often are you running this job? It sounds like its going to be
 sporadic.
 
 -Mike
 
 On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org
 wrote:
 
 Hi,
 
 Can someone please help me to understand the pros and cons between
 those 2 options for the following usecase?
 
 I need to transfer all the rows between 2 timestamps to another
table.
 
 My first idea was to run a MapReduce to map the rows and store them
on
 another table, and then delete them using an end point coprocessor.
 But the more I look into it, the more I think the MapReduce is not a
 good idea and I should use a coprocessor instead.
 
 BUT... The MapReduce framework guarantee me that it will run against
 all the regions. I tried to stop a regionserver while the job was
 running. The region moved, and the MapReduce restarted the job from
 the new location. Will the coprocessor do the same thing?
 
 Also, I found the webconsole for the MapReduce with the number of
 jobs, the status, etc. Is there the same thing with the
coprocessors?
 
 Are all coprocessors running at the same time on all regions, which
 mean we can have 100 of them running on a regionserver at a time? Or
 are they running like the MapReduce jobs based on some configured
 values?
 
 Thanks,
 
 JM
 
 
 
 
 
 
 






Re: one RegionServer crashed and the whole cluster was blocked

2012-10-18 Thread Nicolas Liochon
Hi,

Some stuff below:

On Thu, Oct 18, 2012 at 1:30 PM, 张磊 zhang...@youku.com wrote:

 Hi, All

   One of the RegionServer of our company’s cluster was crashed. At this
 time, I found:

 1.   All the RegionServer stopped handling the requests from the client
 side( requestsPerSecond=0 at the master-status UI page).

 2.   It takes about 12-15 minutes to recovery.

 3.   I have set hbase.regionserver.restart.on.zk.expire to true, but it
 does not work.

   For 1, I knew the cluster began to split log and recover the data on the
 crashed RegionServer, will the recovery operation block all the requests
 from the client side?


No. But it's worth checking that the region server that died was not the one
handling the .meta. region. If that's the case, it could be an explanation
(clients do have a cache, but for first-time access to a region they go to
the .meta. region first).


   For 2, Is there any solution to reduce the recovery time?


12 minutes for a single region server crash (i.e. the datanode is still
there, the cluster is ok) seems huge.
You need to look at:
- a possible root cause: if the region server got disconnected, it may be
because the network or ZooKeeper was in bad shape anyway. So the
recovery is slow because the cause of the crash is still there.
- how is your cluster? Do you have a lot of regions to recover? Did you
have a lot of writes on this region server?


   For 3, I checked the log, found “session is timeout” exception, maybe
 for full gc and the session was timeout. But why the
 hbase.regionserver.restart.on.zk.expire does not work? My HBase version is
 0.94.0.


I'm not sure it's still in the code base. To be checked. As well, you can
have a root cause that makes the server stop.
But there are two sides to a ZK disconnect anyway:
1) the region server: if it's disconnected but actually still there, it
may decide to kill itself, or not.
2) the cluster: after the timeout, the timed-out regionserver is considered
dead and the recovery starts. This happens whatever the outcome of 1). So
whatever happens in 1) does not change much from an MTTR point of view,
except if your cluster is small, or if you're losing multiple nodes.

There is an autorestart option in the 0.96 scripts. It changes nothing in
the MTTR itself, but covers more cases of regionserver crashes. See the release
notes in HBASE-5939.

Good luck,

Nicolas


Re: ANN: HBase 0.94.2 is available for download

2012-10-18 Thread Amit Sela
+1 on pushing to maven repo.

Thanks.

On Wed, Oct 17, 2012 at 1:49 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Thanks Jean for your update.

 Regards
 Ram

  -Original Message-
  From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
  Sent: Wednesday, October 17, 2012 5:14 PM
  To: user@hbase.apache.org
  Subject: Re: ANN: HBase 0.94.2 is available for download
 
  Thanks. I tried to call some MR using the 0.94.2 jar on a 0.94.0
  cluster and it's working fine.
  To install it on the cluster I have done a full install on the nodes
  and re-started them one by one. Seems it worked fine.
 
  The only issue was with the master since I don't have a secondary
  master. I was on 0.94.0 so HBASE-6710 had no impact to me.
 
  2012/10/17, Stack st...@duboce.net:
   On Tue, Oct 16, 2012 at 7:00 AM, Jean-Marc Spaggiari
   jean-m...@spaggiari.org wrote:
   Hi St.Atck,
  
    Is the rolling upgrade process documented anywhere? I looked at the
    book but only found the upgrade from 0.90 to 0.92. Can you point me to
    something? If there is no documentation yet, can someone draft the
    steps here so I can propose an update to the online book?
  
  
   Thanks Jean-Marc.
  
    You should be able to do a rolling restart from 0.92.x to 0.94.x.
    It's a bug if you can't.  There is no entry in the reference guide but
    there should be, if only to say this... You might want to also call out
    https://issues.apache.org/jira/browse/HBASE-6710.  Folks should be
    conscious of its implications when upgrading.
  
   Thanks boss,
   St.Ack
  




remote connection using HBase Java client

2012-10-18 Thread Erman Pattuk

Hi,

I have a standalone HBase 0.94.1 server running on my desktop. In the 
hbase-site.xml file, I just set hbase.rootdir.

From my laptop, I want to connect to the HBase server on my desktop.

What should I change in my client and server HBase configuration files? 
Also, what should I change in the /etc/hosts file for the client and server?
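
A minimal client-side sketch of the usual approach (the hostname and table
name are placeholders); the same two properties can equally go into the
laptop's hbase-site.xml, and the hostname has to resolve to the desktop's LAN
address on both machines, not to 127.0.0.1:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class RemoteClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point the client at the ZooKeeper instance embedded in the standalone server.
    conf.set("hbase.zookeeper.quorum", "desktop-host");        // placeholder hostname
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "mytable");                // hypothetical table
    System.out.println(table.getTableDescriptor());
    table.close();
  }
}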


thank you,

Erman


Re: ANN: HBase 0.94.2 is available for download

2012-10-18 Thread lars hofhansl
I'm on it. :)



- Original Message -
From: Amit Sela am...@infolinks.com
To: user@hbase.apache.org
Cc: 
Sent: Thursday, October 18, 2012 8:25 AM
Subject: Re: ANN: HBase 0.94.2 is available for download

+1 on pushing to maven repo.

Thanks.

On Wed, Oct 17, 2012 at 1:49 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Thanks Jean for your update.

 Regards
 Ram

  -Original Message-
  From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
  Sent: Wednesday, October 17, 2012 5:14 PM
  To: user@hbase.apache.org
  Subject: Re: ANN: HBase 0.94.2 is available for download
 
  Thanks. I tried to call some MR using the 0.94.2 jar on a 0.94.0
  cluster and it's working fine.
  To install it on the cluster I have done a full install on the nodes
  and re-started them one by one. Seems it worked fine.
 
  The only issue was with the master since I don't have a secondary
  master. I was on 0.94.0 so HBASE-6710 had no impact to me.
 
  2012/10/17, Stack st...@duboce.net:
   On Tue, Oct 16, 2012 at 7:00 AM, Jean-Marc Spaggiari
   jean-m...@spaggiari.org wrote:
   Hi St.Atck,
  
    Is the rolling upgrade process documented anywhere? I looked at the
    book but only found the upgrade from 0.90 to 0.92. Can you point me to
    something? If there is no documentation yet, can someone draft the
    steps here so I can propose an update to the online book?
  
  
   Thanks Jean-Marc.
  
    You should be able to do a rolling restart from 0.92.x to 0.94.x.
    It's a bug if you can't.  There is no entry in the reference guide but
    there should be, if only to say this... You might want to also call out
    https://issues.apache.org/jira/browse/HBASE-6710.  Folks should be
    conscious of its implications when upgrading.
  
   Thanks boss,
   St.Ack
  





High IPC Latency

2012-10-18 Thread Yousuf Ahmad
Hello,

We are seeing slow times for read operations in our experiments. We are
hoping that you guys can help us figure out what's going wrong.

Here are some details:

   - We are running a read-only benchmark on our HBase cluster.
   -
   - There are 10 regionservers, each co-located with a datanode. HDFS
   replication is 3x.
   - All the data read by the experiment is already in the block cache and
   the hit ratio is 99%.
   -
   - We have 10 clients, each with around 400 threads making a mix of
   read-only requests involving multi-gets and scans.
   -
   - We settled on the default client pool type/size (roundrobin/1) and a
   regionserver handler count of 100 after testing various combinations to see
   what setting worked best.
   -
   - Our scans are short, fetching around 10 rows on average. Scanner
   caching is set to 50.
   - An average row in a scan has either around 10 columns (small row) or
   around 200 columns (big row).
   -
   - Our multi-gets fetch around 200 rows on average.
   - An average row in a multi-get has around 10 columns.
   - Each column holds an integer (encoded into bytes).
   -
   - None of the machines involved reach CPU, memory, or IO saturation. In
   fact resource utilization stays quite low.
   -
   - Our statistics show that the average time for a scan, measured
   starting from the first scanner.next() call to the last one which returns a
   null, is around 2-3 seconds.
   - Since we use scanner caching, the major portion of this time (around 2
   seconds) is spent on the first call to next(), while the remaining calls
   take a negligible amount of time.
   - Similarly, we see that a multi-get on average takes around 2 seconds.
   - A single get on average takes around 1 second.

We are not sure what the bottleneck is or where it lies. We thought we
should look deeper into what is going on at the regionservers. We monitored
the IPC calls during one of the experiments. Here is a sample of one
regionserver log:

2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115483; Served: HRegionInterface#get queueTime=0 processingTime=1
contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115487; Served: HRegionInterface#get queueTime=0 processingTime=0
contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115489; Served: HRegionInterface#get queueTime=0 processingTime=0
contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#111421; Served: HRegionInterface#get queueTime=0 processingTime=0
contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115497; Served: HRegionInterface#multi queueTime=0 processingTime=9
contents=200 Gets
2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115499; Served: HRegionInterface#openScanner queueTime=0 processingTime=0
contents=1 Scan, 63 bytes
2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#115503; Served: HRegionInterface#get queueTime=0 processingTime=0
contents=1 Get, 75 bytes
2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#103230; Served: HRegionInterface#next queueTime=0 processingTime=0
contents=1 Long, 1 Integer
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#103234; Served: HRegionInterface#close queueTime=0 processingTime=0
contents=1 Long
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
#103232; Served: HRegionInterface#next queueTime=0 processingTime=0
contents=1 Long, 1 Integer

I have attached a larger chunk of the logs we collected for this experiment
in case that helps.

From the logs, we saw that the next() operation at the regionserver takes 1
millisecond or less; and a multi-get takes 10 ms on average.
Yet the corresponding times we see at the client are orders of magnitude
higher.
Ping times between the machines are at most 1ms and we are not saturating
the network.
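
One way to narrow this down is to time a single multi-get outside the
benchmark harness; a minimal sketch (the table name and row keys are
placeholders), which should show whether the extra latency is in the HBase
client path itself or in the surrounding 400-thread client:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetTimingSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "benchtable"); // hypothetical table
    List<Get> gets = new ArrayList<Get>();
    for (int i = 0; i < 200; i++) {                  // ~200 rows, as in the benchmark
      gets.add(new Get(Bytes.toBytes("row-" + i)));  // placeholder keys
    }
    long start = System.nanoTime();
    Result[] results = table.get(gets);              // gets are grouped per region server under the hood
    long micros = (System.nanoTime() - start) / 1000;
    System.out.println(results.length + " rows in " + micros + " us");
    table.close();
  }
}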

We would really appreciate some insights from you guys on this.
Where do you suggest we focus our efforts in order to hunt down this
bottleneck/contention?

Thanks!
Yousuf


Re: High IPC Latency

2012-10-18 Thread lars hofhansl
Also, what version of HBase/HDFS is this using?




- Original Message -
From: Pamecha, Abhishek apame...@x.com
To: user@hbase.apache.org user@hbase.apache.org
Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt
Sent: Thursday, October 18, 2012 11:38 AM
Subject: RE: High IPC Latency

Is it sustained for the same client hitting the same region server OR does it 
get better for the same client-RS combination when run for longer duration?  
Trying to eliminate Zookeeper from this.

Thanks,
Abhishek

From: Yousuf Ahmad [mailto:myahm...@gmail.com]
Sent: Thursday, October 18, 2012 11:26 AM
To: user@hbase.apache.org
Cc: Ivan Brondino; Ricardo Vilaça
Subject: High IPC Latency

Hello,

We are seeing slow times for read operations in our experiments. We are hoping 
that you guys can help us figure out what's going wrong.

Here are some details:

  *   We are running a read-only benchmark on our HBase cluster.
  *
  *   There are 10 regionservers, each co-located with a datanode. HDFS 
replication is 3x.
  *   All the data read by the experiment is already in the block cache and the 
hit ratio is 99%.
  *
  *   We have 10 clients, each with around 400 threads making a mix of 
read-only requests involving multi-gets and scans.
  *
  *   We settled on the default client pool type/size (roundrobin/1) and a 
regionserver handler count of 100 after testing various combinations to see 
what setting worked best.
  *
  *   Our scans are short, fetching around 10 rows on average. Scanner caching 
is set to 50.
  *   An average row in a scan has either around 10 columns (small row) or 
around 200 columns (big row).
  *
  *   Our multi-gets fetch around 200 rows on average.
  *   An average row in a multi-get has around 10 columns.
  *   Each column holds an integer (encoded into bytes).
  *
  *   None of the machines involved reach CPU, memory, or IO saturation. In 
fact resource utilization stays quite low.
  *
  *   Our statistics show that the average time for a scan, measured starting 
from the first scanner.next() call to the last one which returns a null, is 
around 2-3 seconds.
  *   Since we use scanner caching, the major portion of this time (around 2 
seconds) is spent on the first call to next(), while the remaining calls take a 
negligible amount of time.
  *   Similarly, we see that a multi-get on average takes around 2 seconds.
  *   A single get on average takes around 1 second.
We are not sure what the bottleneck is or where it lies. We thought we should 
look deeper into what is going on at the regionservers. We monitored the IPC 
calls during one of the experiments. Here is a sample of one regionserver log:

2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115483; Served: HRegionInterface#get queueTime=0 processingTime=1 contents=1 
Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115487; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 
Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115489; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 
Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#111421; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 
Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115497; Served: HRegionInterface#multi queueTime=0 processingTime=9 
contents=200 Gets
2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115499; Served: HRegionInterface#openScanner queueTime=0 processingTime=0 
contents=1 Scan, 63 bytes
2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#115503; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 
Get, 75 bytes
2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#103230; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 
Long, 1 Integer
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#103234; Served: HRegionInterface#close queueTime=0 processingTime=0 contents=1 
Long
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call 
#103232; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 
Long, 1 Integer

I have attached a larger chunk of the logs we collected for this experiment in 
case that helps.

From the logs, we saw that the next() operation at the regionserver takes 1 
millisecond or less; and a multi-get takes 10 ms on average.
Yet the corresponding times we see at the client are orders of magnitude higher.
Ping times between the machines are at most 1ms and we are not saturating the 
network.

We would really appreciate some insights from you guys on this.
Where do you suggest we focus our efforts in order to hunt down this 
bottleneck/contention?

Thanks!
Yousuf


Re: Coprocessor end point vs MapReduce?

2012-10-18 Thread Doug Meil

I agree with the concern and there isn't a ton of guidance on this area
yet. 



On 10/18/12 2:01 PM, Michael Segel michael_se...@hotmail.com wrote:

Doug, 

One thing that concerns me is that a lot of folks are gravitating to
Coprocessors and may be using them for the wrong thing.
Has anyone done any sort of research as to some of the limitations and
negative impacts on using coprocessors?

While I haven't really toyed with the idea of bulk deletes, periodic
deletes are probably not a good use of coprocessors; however, using them
to synchronize tables would be a valid use case.

Thx

-Mike

On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com
wrote:

 
 To echo what Mike said about KISS, would you use triggers for a large
 time-sensitive batch job in an RDBMS?  It's possible, but probably not.
 Then you might want to think twice about using co-processors for such a
 purpose with HBase.
 
 
 
 
 
 On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:
 
 Run your weekly job in a low priority fair scheduler/capacity scheduler
 queue. 
 
 Maybe its just me, but I look at Coprocessors as a similar structure to
 RDBMS triggers and stored procedures.
 You need to restrain and use them sparingly otherwise you end up
creating
 performance issues.
 
 Just IMHO.
 
 -Mike
 
 On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:
 
 I don't have any concern about the time it's taking. It's more about
 the load it's putting on the cluster. I have other jobs that I need to
 run (secondary index, data processing, etc.). So the more time this
 new job is taking, the less CPU the others will have.
 
 I tried the M/R and I really liked the way it's done. So my only
 concern will really be the performance of the delete part.
 
 That's why I'm wondering what's the best practice to move a row to
 another table.
 
 2012/10/17, Michael Segel michael_se...@hotmail.com:
 If you're going to be running this weekly, I would suggest that you
 stick
 with the M/R job.
 
 Is there any reason why you need to be worried about the time it
takes
 to do
 the deletes?
 
 
 On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org
 wrote:
 
 Hi Mike,
 
 I'm expecting to run the job weekly. I initially thought about using
 end points because I found HBASE-6942 which was a good example for
my
 needs.
 
 I'm fine with the Put part for the Map/Reduce, but I'm not sure
about
 the delete. That's why I look at coprocessors. Then I figure that I
 also can do the Put on the coprocessor side.
 
 On a M/R, can I delete the row I'm dealing with based on some
criteria
 like timestamp? If I do that, I will not do bulk deletes, but I will
 delete the rows one by one, right? Which might be very slow.
 
 If in the future I want to run the job daily, might that be an
issue?
 
 Or should I go with the initial idea of doing the Put with the M/R
job
 and the delete with HBASE-6942?
 
 Thanks,
 
 JM
 
 
 2012/10/17, Michael Segel michael_se...@hotmail.com:
 Hi,
 
 I'm a firm believer in KISS (Keep It Simple, Stupid)
 
 The Map/Reduce (map job only) is the simplest and least prone to
 failure.
 
 Not sure why you would want to do this using coprocessors.
 
 How often are you running this job? It sounds like its going to be
 sporadic.
 
 -Mike
 
 On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org
 wrote:
 
 Hi,
 
 Can someone please help me to understand the pros and cons between
 those 2 options for the following usecase?
 
 I need to transfer all the rows between 2 timestamps to another
 table.
 
 My first idea was to run a MapReduce to map the rows and store
them
 on
 another table, and then delete them using an end point
coprocessor.
 But the more I look into it, the more I think the MapReduce is
not a
 good idea and I should use a coprocessor instead.
 
 BUT... The MapReduce framework guarantee me that it will run
against
 all the regions. I tried to stop a regionserver while the job was
 running. The region moved, and the MapReduce restarted the job
from
 the new location. Will the coprocessor do the same thing?
 
 Also, I found the webconsole for the MapReduce with the number of
 jobs, the status, etc. Is there the same thing with the
 coprocessors?
 
 Are all coprocessors running at the same time on all regions,
which
 mean we can have 100 of them running on a regionserver at a time?
Or
 are they running like the MapReduce jobs based on some configured
 values?
 
 Thanks,
 
 JM
 
 
 
 
 
 
 
 
 
 
 
 






Re: High IPC Latency

2012-10-18 Thread Yousuf Ahmad
Hi,

Thank you for your questions guys.

We are using HBase 0.92 with HDFS 1.0.1.

The experiment lasts 15 minutes. The measurements stabilize in the first
two minutes of the run.

The data is distributed almost evenly across the regionservers so each
client hits most of them over the course of the experiment. However, for
the data we have, any given multi-get or scan should touch only one or at
most two regions.

The client caches the locations of the regionservers, so after a couple of
minutes of the experiment running, it wouldn't need to re-visit ZooKeeper,
I believe. Correct me if I am wrong please.

Regards,
Yousuf


On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl lhofha...@yahoo.com wrote:

 Also, what version of HBase/HDFS is this using?




 - Original Message -
 From: Pamecha, Abhishek apame...@x.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça 
 rmvil...@di.uminho.pt
 Sent: Thursday, October 18, 2012 11:38 AM
 Subject: RE: High IPC Latency

 Is it sustained for the same client hitting the same region server OR does
 it get better for the same client-RS combination when run for longer
 duration?  Trying to eliminate Zookeeper from this.

 Thanks,
 Abhishek

 From: Yousuf Ahmad [mailto:myahm...@gmail.com]
 Sent: Thursday, October 18, 2012 11:26 AM
 To: user@hbase.apache.org
 Cc: Ivan Brondino; Ricardo Vilaça
 Subject: High IPC Latency

 Hello,

 We are seeing slow times for read operations in our experiments. We are
 hoping that you guys can help us figure out what's going wrong.

 Here are some details:

   *   We are running a read-only benchmark on our HBase cluster.
   *
   *   There are 10 regionservers, each co-located with a datanode. HDFS
 replication is 3x.
   *   All the data read by the experiment is already in the block cache
 and the hit ratio is 99%.
   *
   *   We have 10 clients, each with around 400 threads making a mix of
 read-only requests involving multi-gets and scans.
   *
   *   We settled on the default client pool type/size (roundrobin/1) and a
 regionserver handler count of 100 after testing various combinations to see
 what setting worked best.
   *
   *   Our scans are short, fetching around 10 rows on average. Scanner
 caching is set to 50.
   *   An average row in a scan has either around 10 columns (small row) or
 around 200 columns (big row).
   *
   *   Our multi-gets fetch around 200 rows on average.
   *   An average row in a multi-get has around 10 columns.
   *   Each column holds an integer (encoded into bytes).
   *
   *   None of the machines involved reach CPU, memory, or IO saturation.
 In fact resource utilization stays quite low.
   *
   *   Our statistics show that the average time for a scan, measured
 starting from the first scanner.next() call to the last one which returns a
 null, is around 2-3 seconds.
   *   Since we use scanner caching, the major portion of this time (around
 2 seconds) is spent on the first call to next(), while the remaining calls
 take a negligible amount of time.
   *   Similarly, we see that a multi-get on average takes around 2 seconds.
   *   A single get on average takes around 1 second.
 We are not sure what the bottleneck is or where it lies. We thought we
 should look deeper into what is going on at the regionservers. We monitored
 the IPC calls during one of the experiments. Here is a sample of one
 regionserver log:

 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9
 contents=200 Gets
 2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115499; Served: HRegionInterface#openScanner queueTime=0
 processingTime=0 contents=1 Scan, 63 bytes
 2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115503; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #103230; Served: HRegionInterface#next queueTime=0 processingTime=0
 contents=1 Long, 1 Integer
 2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #103234; Served: HRegionInterface#close queueTime=0 processingTime=0
 contents=1 Long
 2012-10-18 17:00:09,994 

Re: Using filters in REST/stargate returns 204 (No content)

2012-10-18 Thread Andrew Purtell
What does the HBase shell return if you try that scan programmatically?

On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh suresh.kum...@emc.comwrote:



 I have an HBase Java client which has a couple of filters and works just
 fine; I get the expected result.

 Here is the code:



  HTable table = new HTable(conf, "apachelogs");

  Scan scan = new Scan();

  FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

  RegexStringComparator comp = new RegexStringComparator("ERROR x.");

  SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("mylog"),
      Bytes.toBytes("pcol"), CompareOp.EQUAL, comp);

  filter.setFilterIfMissing(true);

  list.addFilter(filter);

  scan.setFilter(list);

  ResultScanner scanner = table.getScanner(scan);



  I start up the REST server, and use curl for the above functionality; I
  just base64-encoded "ERROR x.":



  curl -v -H "Content-Type: text/xml" -d @args.txt
  http://localhost:8080/apachelogs/scanner



 where args.txt is:



  <Scanner>

  <filter>

  {

  "latestVersion":true, "ifMissing":true,

  "qualifier":"pcol", "family":"mylog",

  "op":"EQUAL", "type":"SingleColumnValueFilter",

  "comparator":{"value":"RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=","type":"RegexStringComparator"}

  }

  </filter>

  </Scanner>



 which returns

 * About to connect() to localhost port 8080 (#0)

 *   Trying 127.0.0.1... connected

  POST /apachelogs/scanner HTTP/1.1

  User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3

  Host: localhost:8080

  Accept: */*

  Content-Type:text/xml

  Content-Length: 318

 

 * upload completely sent off: 318out of 318 bytes

  HTTP/1.1 201 Created

  Location:
 http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

  Content-Length: 0

 

 * Connection #0 to host localhost left intact

 * Closing connection #0



 but  curl -v
 http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

 returns HTTP/1.1 204 No Content



 Any clues?



 Thanks,

 Suresh




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: crafting your key - scan vs. get

2012-10-18 Thread Ian Varley
Hi Neil,

Mike summed it up well, as usual. :) Your choices of where to describe this 
dimension of your data (a one-to-many between users and events) are:

 - one row per event
 - one row per user, with events as columns
 - one row per user, with events as versions on a single cell

The first two are the best choices, since the third is sort of a perversion of 
the time dimension (it isn't one thing that's changing, it's many things over 
time), and might make things counter-intuitive when combined with deletes, 
compaction, etc. You can do it, but caveat emptor. :)

Since you have in the 100s or 1000s of events per user, it's reasonable to use 
the 2nd (columns). And with 1k cell sizes, even extreme cases (thousands of 
events) won't kill you.

That said, the main plus you get out of using columns over rows is ACID 
properties; you could get & set all the stuff for a single user atomically if 
it's columns in a single row, but not if its separate rows. That's nice, but 
I'm guessing you probably don't need to do that, and instead would write out 
the events as they happen (i.e., you would rarely be doing PUTs for multiple 
events for the same user at the same time, right?).

In theory, tall tables (the row-wise model) should have a slight performance 
advantage over wide tables (the column-wise model), all other things being 
equal; the shape of the data is nearly the same, but the row-wise version 
doesn't have to do any work preserving consistency. Your informal tests about 
GET vs SCAN perf seem a little suspect, since a GET is actually implemented as 
a one-row SCAN; but the devil's in the details, so if you see that happening 
repeatably with data that's otherwise identical, raise it up to the dev list 
and people should look at it.

The key thing is to try it for yourself and see. :)
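
For concreteness, the two access paths side by side in the Java API (a sketch; the 
table name and the stop row are illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVsScanSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");  // hypothetical table

    // Wide-table (column-wise) model: one row per user, fetched with a Get.
    Result user = table.get(new Get(Bytes.toBytes("AA")));
    System.out.println(user.size() + " cells for user AA");

    // Tall-table (row-wise) model: one row per event, fetched with a short Scan
    // bounded by start/stop row; server-side this runs through the same scanner machinery.
    Scan scan = new Scan(Bytes.toBytes("AA"), Bytes.toBytes("AB"));     // stop row just past the prefix
    ResultScanner scanner = table.getScanner(scan);
    for (Result event : scanner) {
      System.out.println(Bytes.toString(event.getRow()));
    }
    scanner.close();
    table.close();
  }
}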

Ian

ps - Sorry Mike was rude to you in his response. Your question was well-phrased 
and not at all boring. Mike, you can explain all you want, but saying "Your 
question is boring" is straight up rude; please don't do that.


From: Neil Yalowitz neilyalow...@gmail.commailto:neilyalow...@gmail.com
Date: Tue, Oct 16, 2012 at 2:53 PM
Subject: crafting your key - scan vs. get
To: user@hbase.apache.orgmailto:user@hbase.apache.org


Hopefully this is a fun question.  :)

Assume you could architect an HBase table from scratch and you were
choosing between the following two key structures.

1)

The first structure creates a unique row key for each PUT.  The rows are
events related to a user ID.  There may be up to several hundred events for
each user ID (probably not thousands, an average of perhaps ~100 events per
user).  Each key would be made unique with a reverse-order-timestamp or
perhaps just random characters (we don't particularly care about using ROT
for sorting newest here).

key

AA + some-unique-chars

The table will look like this:

key   vals  cf:mycfts
---
AA... myval1 1350345600
AA... myval2 1350259200
AA... myval3 1350172800


Retrieving these values will use a Scan with startRow and stopRow.  In
hbase shell, it would look like:

$ scan 'mytable',{STARTROW=>'AA', ENDROW=>'AA_'}


2)

The second structure choice uses only the user ID as the key and relies on
row versions to store all the events.  For example:

key   vals   cf:mycf ts
-
AAmyval1   1350345600
AAmyval2   1350259200
AAmyval3   1350172800

Retrieving these values will use a Get with VERSIONS = somebignumber.  In
hbase shell, it would look like:

$ get 'mytable','AA',{COLUMN=>'cf:mycf', VERSIONS=>999}

...although this probably violates a comment in the HBase documentation:

It is not recommended setting the number of max versions to an exceedingly
high level (e.g., hundreds or more) unless those old values are very dear
to you because this will greatly increase StoreFile size.

...found here: http://hbase.apache.org/book/schema.versions.html


So, are there any performance considerations between Scan vs. Get in this
use case?  Which choice would you go for?



Neil Yalowitz
neilyalow...@gmail.commailto:neilyalow...@gmail.com



RE: Using filters in REST/stargate returns 204 (No content)

2012-10-18 Thread Kumar, Suresh

When I run the Java code, it returns the valid rows which match the
regex.
I base64-encoded the qualifier and family fields as well; still an empty
result.

Suresh


-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org] 
Sent: Thursday, October 18, 2012 1:19 PM
To: user@hbase.apache.org
Subject: Re: Using filters in REST/stargate returns 204 (No content)

What does the HBase shell return if you try that scan programmatically?

On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh
suresh.kum...@emc.comwrote:



 I have an HBase Java client which has a couple of filters and works just
 fine; I get the expected result.

 Here is the code:



  HTable table = new HTable(conf, "apachelogs");

  Scan scan = new Scan();

  FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

  RegexStringComparator comp = new RegexStringComparator("ERROR x.");

  SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("mylog"),
      Bytes.toBytes("pcol"), CompareOp.EQUAL, comp);

  filter.setFilterIfMissing(true);

  list.addFilter(filter);

  scan.setFilter(list);

  ResultScanner scanner = table.getScanner(scan);



  I start up the REST server, and use curl for the above functionality; I
  just base64-encoded "ERROR x.":



  curl -v -H "Content-Type: text/xml" -d @args.txt
  http://localhost:8080/apachelogs/scanner



 where args.txt is:



  <Scanner>

  <filter>

  {

  "latestVersion":true, "ifMissing":true,

  "qualifier":"pcol", "family":"mylog",

  "op":"EQUAL", "type":"SingleColumnValueFilter",


  "comparator":{"value":"RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=","type":"RegexStringComparator"}

  }

  </filter>

  </Scanner>



 which returns

 * About to connect() to localhost port 8080 (#0)

 *   Trying 127.0.0.1... connected

 > POST /apachelogs/scanner HTTP/1.1

 > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3

 > Host: localhost:8080

 > Accept: */*

 > Content-Type:text/xml

 > Content-Length: 318

 >

 * upload completely sent off: 318 out of 318 bytes

 < HTTP/1.1 201 Created

 < Location:
 http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

 < Content-Length: 0

 <

 * Connection #0 to host localhost left intact

 * Closing connection #0



 but  curl -v
 http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

 returns HTTP/1.1 204 No Content



 Any clues?



 Thanks,

 Suresh




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Unable to add co-processor to table through HBase api

2012-10-18 Thread anil gupta
Hi Folks,

Still, I am unable to add the co-processors through the HBase client API. This
time I tried loading the coprocessor by providing the jar path along with
parameters, but it failed.
I was able to add the same coprocessor to the table through the HBase shell.
I also don't see any logs about adding coprocessors in the regionservers
when I try to add the co-processor through the API. I strongly feel that the
HBase client API for adding coprocessors is broken. Please let me know if
the code below seems problematic.

Here is the code I used to add the coprocessor through the HBase API:
private static void modifyTable() throws IOException {
  Configuration conf = HBaseConfiguration.create();
  HBaseAdmin hAdmin = new HBaseAdmin(conf);
  String tableName = "txn";
  hAdmin.disableTable(tableName);
  if (!hAdmin.isTableEnabled(tableName)) {
    System.out.println("Trying to add coproc to table"); // using err so that it's easy to read this on the Eclipse console.
    HashMap<String, String> map = new HashMap<String, String>();
    map.put("arg1", "batchdate");
    String className =
        "com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver";
    hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
        new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"), Coprocessor.PRIORITY_USER, map);

    if (hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(className)) {
      System.err.println("YIPIE!!!");
    }
    hAdmin.enableTable(tableName);
  }
  hAdmin.close();
}

Thanks,
Anil Gupta

On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Do let me know if you are stuck up.  May be I did not get your actual
 problem.

 All the best.

 Regards
 Ram

  -Original Message-
  From: anil gupta [mailto:anilgupt...@gmail.com]
  Sent: Wednesday, October 17, 2012 11:34 PM
  To: user@hbase.apache.org
  Subject: Re: Unable to add co-processor to table through HBase api
 
  Hi Ram,
 
  The table exists and I don't get any error while running the program(i
  would get an error if the table did not exist). I am running a
  distributed
  cluster.
 
  Tried following additional ways also:
 
 1. I tried loading the AggregationImplementation coproc.
 2. I also tried adding the coprocs while the table is enabled.
 
 
  Also had a look at the JUnit test cases and could not find any
  difference.
 
  I am going to try adding the coproc along with jar in Hdfs and see what
  happens.
 
  Thanks,
  Anil Gupta
 
  On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
   I tried out a sample test class.  It is working properly.  I just
  have a
   doubt whether you are doing the
   Htd.addCoprocessor() step before creating the table?  Try that way
  hope it
   should work.
  
   Regards
   Ram
  
-Original Message-
From: anil gupta [mailto:anilgupt...@gmail.com]
Sent: Wednesday, October 17, 2012 4:05 AM
To: user@hbase.apache.org
Subject: Unable to add co-processor to table through HBase api
   
Hi All,
   
I would like to add a RegionObserver to a HBase table through HBase
api. I
don't want to put this RegionObserver as a user or system co-
  processor
in
hbase-site.xml since this is specific to a table. So, option of
  using
hbase
properties is out. I have already copied the jar file in the
  classpath
of
region server and restarted the cluster.
   
Can any one point out the problem in following code for adding the
co-processor to the table:
private void modifyTable(String name) throws IOException
{
Configuration conf = HBaseConfiguration.create();
HBaseAdmin hAdmin = new HBaseAdmin(conf);
hAdmin.disableTable(txn_subset);
if(!hAdmin.isTableEnabled(txn_subset))
{
  System.err.println(Trying to add coproc to table); // using
  err
so
that it's easy to read this on eclipse console.
   
   
  hAdmin.getTableDescriptor(Bytes.toBytes(txn_subset)).addCoprocessor(
com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver);
  if(
   
  hAdmin.getTableDescriptor(Bytes.toBytes(txn_subset)).hasCoprocessor(
com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver)
)
  {
System.err.println(YIPIE!!!);
  }
  hAdmin.enableTable(ihub_txn_subset);
}
hAdmin.close();
}*
*
--
Thanks  Regards,
Anil Gupta
  
  
 
 
  --
  Thanks  Regards,
  Anil Gupta




-- 
Thanks  Regards,
Anil Gupta


Re: error when open hbase shell

2012-10-18 Thread Stack
On Wed, Oct 17, 2012 at 2:42 AM, hua xiang adam_...@yahoo.com wrote:
 Hi,
when I open the hbase shell as the hdfs user, there is an error,

   but the root user can.
below is the error:
  [root@hadoop2 ~]# su - hdfs
 [hdfs@hadoop2 ~]$ id
 uid=494(hdfs) gid=502(hadoop) groups=502(hadoop)
 [hdfs@hadoop2 ~]$ hbase shell
 Error: Could not find or load main class org.jruby.Main
 [hdfs@hadoop2 ~]$


maybe a profile problem?


Is it in your CLASSPATH?  Is HBase built?

St.Ack


Thrift Python client with regex

2012-10-18 Thread Kumar, Suresh
I am using Thrift (0.8.0) to get scan column values from a table.

This code returns all the values.

 

columns = ['mylog']

scanner = client.scannerOpen('apachelogs','', columns)

result = client.scannerGet(scanner)

while result:

  printRow(result[0])

  result = client.scannerGet(scanner)

  print "Scanner finished"

client.scannerClose(scanner)

 

The scannerOpen Python API says you can pass a regex in the column

qualifier, so if I send:

 

columns = ['mylog:suresh'], it should return all the values whose

qualifier matches the string suresh, right? But I don't get any result.

 

Thanks,
Suresh



Re: Thrift Python client with regex

2012-10-18 Thread Norbert Burger
We had the same question earlier.  Unfortunately the documentation is
wrong on this account; scannerOpen resolves to either a call to
scan.addFamily or scan.addColumn, and neither directly supports regex
matching.

Regex pattern matching against colquals is definitely supported on the
Java side, so Thrift2 (0.94.0) is a possible solution, if you can
upgrade.  Another approach, depending on how large your rows are,
would be to grab the full list of cols, filter via regex on the client
side, and then specify explicitly in scannerOpen().

Norbert
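
To illustrate the Java-side regex support mentioned above, here is a minimal
sketch using QualifierFilter with RegexStringComparator (the table and family
names are the ones from this thread; the Thrift 0.8 client does not expose this):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class QualifierRegexScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "apachelogs");

    // Keep only columns in family 'mylog' whose qualifier matches the regex.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("mylog"));
    scan.setFilter(new QualifierFilter(CompareOp.EQUAL,
        new RegexStringComparator("suresh")));

    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // each Result now carries only the matching qualifiers
    }
    scanner.close();
    table.close();
  }
}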

On Thu, Oct 18, 2012 at 7:48 PM, Kumar, Suresh suresh.kum...@emc.com wrote:
 I am using Thrift (0.8.0) to get scan column values from a table.

 This code returns all the values.



 columns = ['mylog']

 scanner = client.scannerOpen('apachelogs','', columns)

 result = client.scannerGet(scanner)

 while result:

   printRow(result[0])

   result = client.scannerGet(scanner)

   print Scanner finished

 client.scannerClose(scanner)



 The scannerOpen Python API says you can pass a regex in the column

 qualifier, so if I send:



 columns = ['mylog:suresh'], it should return all the values which has
 the

 string suresh right? I don't get any result.



 Thanks,
 Suresh



Re: WAL.Hlog vs. Hlog

2012-10-18 Thread Stack
On Thu, Oct 18, 2012 at 7:35 PM, Maoke fib...@gmail.com wrote:
 hi Stack and all,

 i noticed that the regionserver.Hlog is obsoleted by regionserver.wal.Hlog,
 from version 0.20.6 to 0.90+. what is the major difference between the two,
 in principle? what we should pay attention to when using the WAL.Hlog?


Your best bet is reviewing the release notes for 0.90 and the issue
that moved WAL, HBASE-1756 Refactor HLog.  Going by the issue, the
motivation was cleanup.

St.Ack


Re: WAL.Hlog vs. Hlog

2012-10-18 Thread Maoke
2012/10/19 Stack st...@duboce.net

 On Thu, Oct 18, 2012 at 7:35 PM, Maoke fib...@gmail.com wrote:
  hi Stack and all,
 
  i noticed that the regionserver.Hlog is obsoleted by
 regionserver.wal.Hlog,
  from version 0.20.6 to 0.90+. what is the major difference between the
 two,
  in principle? what we should pay attention to when using the WAL.Hlog?
 

 Your best bet is reviewing the release notes for 0.90 and the issue
 that moved WAL, HBASE-1756 Refactor HLog.  Going by the issue, the
 motivation was cleanup.


thanks a lot! i will read that ASAP. - maoke



 St.Ack



RE: hbase.client.scanner.timeout.period not being respected

2012-10-18 Thread 谢良
Did you bounce (restart) your server cluster?
Per HregionServer.java code :
this.scannerLeaseTimeoutPeriod = 
conf.getInt(HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD,
  HConstants.DEFAULT_HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD);
it seems this parameter is used on the server side as well.

I am not an expert on it; hope this is helpful for you :)

Best,
Liang
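
If that is the case, the value would also need to go into the server-side
hbase-site.xml (followed by a region server restart); a minimal sketch, assuming
a 300000 ms (5 minute) timeout is what is wanted:

<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>300000</value>
</property>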

From: Bai Shen [baishen.li...@gmail.com]
Sent: October 18, 2012 23:25
To: user@hbase.apache.org
Subject: hbase.client.scanner.timeout.period not being respected

I've set hbase.client.scanner.timeout.period on my client to 30, but
I'm still getting errors showing that hbase is using the default value of
6.

Any ideas why this is?

Thanks.


RE: Coprocessor end point vs MapReduce?

2012-10-18 Thread Anoop Sam John
A CP and endpoints operate at the region level; any operation within one region
can be performed with them. In the use case below, along with the delete there was
a need to insert data into some other table as well, and it was a kind of periodic
action, so I really doubt that endpoints alone can be used here. I also tend
towards the MR.

  The idea behind the bulk delete CP is simple. We have a use case of deleting a
bulk of rows, and this needs to be an online delete. I have also seen many people
on the mailing list ask questions about that; in all cases people were scanning,
fetching the row keys to the client side, and then doing the deletes. Most of the
time the complaint was slowness. One bulk delete performance improvement was done
in HBASE-6284. Still, we thought we could do the whole operation (scan + delete)
on the server side and make use of the endpoints here. This will be much faster
and can be used for online bulk deletes.

-Anoop-
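
For readers weighing the MapReduce route discussed here, a rough, hypothetical
sketch of a map-only job that copies rows in a timestamp window to another table
and deletes them from the source (the table and family names are invented, and it
assumes MultiTableOutputFormat is available in your HBase version):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {

  static final ImmutableBytesWritable SOURCE =
      new ImmutableBytesWritable(Bytes.toBytes("source_table"));
  static final ImmutableBytesWritable ARCHIVE =
      new ImmutableBytesWritable(Bytes.toBytes("archive_table"));

  static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Copy every cell returned for this row to the destination table...
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      context.write(ARCHIVE, put);
      // ...then delete the whole row from the source table (a coarse choice;
      // a timestamp-bounded Delete would be more careful).
      context.write(SOURCE, new Delete(row.get()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "move-rows-between-timestamps");
    job.setJarByClass(MoveRowsJob.class);

    // Only cells inside the timestamp window are scanned (example values).
    Scan scan = new Scan();
    scan.setTimeRange(1350000000000L, 1350600000000L);
    scan.setCaching(500);
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob("source_table", scan, MoveMapper.class,
        ImmutableBytesWritable.class, Writable.class, job);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0); // map-only, as discussed in the thread
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Whether the delete half should instead go through a bulk-delete endpoint (as in
HBASE-6284 / HBASE-6942) is exactly the trade-off being debated below.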


From: Michael Segel [michael_se...@hotmail.com]
Sent: Thursday, October 18, 2012 11:31 PM
To: user@hbase.apache.org
Subject: Re: Coprocessor end point vs MapReduce?

Doug,

One thing that concerns me is that a lot of folks are gravitating to 
Coprocessors and may be using them for the wrong thing.
Has anyone done any sort of research as to some of the limitations and negative 
impacts on using coprocessors?

While I haven't really toyed with the idea of bulk deletes, periodic deletes are
probably not a good use of coprocessors; however, using them to synchronize
tables would be a valid use case.

Thx

-Mike

On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com wrote:


 To echo what Mike said about KISS, would you use triggers for a large
 time-sensitive batch job in an RDBMS?  It's possible, but probably not.
 Then you might want to think twice about using co-processors for such a
 purpose with HBase.





 On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:

 Run your weekly job in a low priority fair scheduler/capacity scheduler
 queue.

 Maybe its just me, but I look at Coprocessors as a similar structure to
 RDBMS triggers and stored procedures.
 You need to restrain and use them sparingly otherwise you end up creating
 performance issues.

 Just IMHO.

 -Mike

 On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:

 I don't have any concern about the time it's taking. It's more about
 the load it's putting on the cluster. I have other jobs that I need to
 run (secondary index, data processing, etc.). So the more time this
 new job is taking, the less CPU the others will have.

 I tried the M/R and I really liked the way it's done. So my only
 concern will really be the performance of the delete part.

 That's why I'm wondering what's the best practice to move a row to
 another table.

 2012/10/17, Michael Segel michael_se...@hotmail.com:
 If you're going to be running this weekly, I would suggest that you
 stick
 with the M/R job.

 Is there any reason why you need to be worried about the time it takes
 to do
 the deletes?


 On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org
 wrote:

 Hi Mike,

 I'm expecting to run the job weekly. I initially thought about using
 end points because I found HBASE-6942 which was a good example for my
 needs.

 I'm fine with the Put part for the Map/Reduce, but I'm not sure about
 the delete. That's why I look at coprocessors. Then I figure that I
 also can do the Put on the coprocessor side.

 On a M/R, can I delete the row I'm dealing with based on some criteria
 like timestamp? If I do that, I will not do bulk deletes, but I will
 delete the rows one by one, right? Which might be very slow.

 If in the future I want to run the job daily, might that be an issue?

 Or should I go with the initial idea of doing the Put with the M/R job
 and the delete with HBASE-6942?

 Thanks,

 JM


 2012/10/17, Michael Segel michael_se...@hotmail.com:
 Hi,

 I'm a firm believer in KISS (Keep It Simple, Stupid)

 The Map/Reduce (map job only) is the simplest and least prone to
 failure.

 Not sure why you would want to do this using coprocessors.

 How often are you running this job? It sounds like its going to be
 sporadic.

 -Mike

 On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org
 wrote:

 Hi,

 Can someone please help me to understand the pros and cons between
 those 2 options for the following usecase?

 I need to transfer all the rows between 2 timestamps to another
 table.

 My first idea was to run a MapReduce to map the rows and store them
 on
 another table, and then delete them using an end point coprocessor.
 But the more I look into it, the more I think the MapReduce is not a
 good idea and I should use a coprocessor instead.

 BUT... The MapReduce framework guarantee me that it will run against
 all the regions. I tried to stop a regionserver 

Re: Coprocessor end point vs MapReduce?

2012-10-18 Thread lohit
I might be a little off here. If rows are moved to another table on a weekly or
daily basis, why not create a per-week or per-day table? That way you do not need
to copy and delete. Of course, it will not work if you are selectively filtering
between timestamps, and clients would have to have a notion of multiple tables.

2012/10/18 Anoop Sam John anoo...@huawei.com

 A CP and Endpoints operates at a region level.. Any operation within one
 region we can perform using this..  I have seen in below use case that
 along with the delete there was a need for inserting data to some other
 table also.. Also this was kind of a periodic action.. I really doubt how
 the endpoints alone can be used here.. I also tend towards the MR..

   The idea behind the bulk delete CP is simple.  We have a use case of
 deleting a bulk of rows and this need to be online delete. I also have seen
 in the mailing list many people ask question regarding that... In all
 people were using scans and get the rowkeys to the client side and then
 doing the deletes..  Yes most of the time complaint was the slowness..  One
 bulk delete performance improvement was done in HBASE-6284..  Still thought
 we can do all the operation (scan+delete) in server side and we can make
 use of the endpoints here.. This will be much more faster and can be used
 for online bulk deletes..

 -Anoop-

 
 From: Michael Segel [michael_se...@hotmail.com]
 Sent: Thursday, October 18, 2012 11:31 PM
 To: user@hbase.apache.org
 Subject: Re: Coprocessor end point vs MapReduce?

 Doug,

 One thing that concerns me is that a lot of folks are gravitating to
 Coprocessors and may be using them for the wrong thing.
 Has anyone done any sort of research as to some of the limitations and
 negative impacts on using coprocessors?

 While I haven't really toyed with the idea of bulk deletes, periodic
 deletes is probably not a good use of coprocessors however using them
 to synchronize tables would be a valid use case.

 Thx

 -Mike

 On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com
 wrote:

 
  To echo what Mike said about KISS, would you use triggers for a large
  time-sensitive batch job in an RDBMS?  It's possible, but probably not.
  Then you might want to think twice about using co-processors for such a
  purpose with HBase.
 
 
 
 
 
  On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:
 
  Run your weekly job in a low priority fair scheduler/capacity scheduler
  queue.
 
  Maybe its just me, but I look at Coprocessors as a similar structure to
  RDBMS triggers and stored procedures.
  You need to restrain and use them sparingly otherwise you end up
 creating
  performance issues.
 
  Just IMHO.
 
  -Mike
 
  On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org wrote:
 
  I don't have any concern about the time it's taking. It's more about
  the load it's putting on the cluster. I have other jobs that I need to
  run (secondary index, data processing, etc.). So the more time this
  new job is taking, the less CPU the others will have.
 
  I tried the M/R and I really liked the way it's done. So my only
  concern will really be the performance of the delete part.
 
  That's why I'm wondering what's the best practice to move a row to
  another table.
 
  2012/10/17, Michael Segel michael_se...@hotmail.com:
  If you're going to be running this weekly, I would suggest that you
  stick
  with the M/R job.
 
  Is there any reason why you need to be worried about the time it takes
  to do
  the deletes?
 
 
  On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org
  wrote:
 
  Hi Mike,
 
  I'm expecting to run the job weekly. I initially thought about using
  end points because I found HBASE-6942 which was a good example for my
  needs.
 
  I'm fine with the Put part for the Map/Reduce, but I'm not sure about
  the delete. That's why I look at coprocessors. Then I figure that I
  also can do the Put on the coprocessor side.
 
  On a M/R, can I delete the row I'm dealing with based on some
 criteria
  like timestamp? If I do that, I will not do bulk deletes, but I will
  delete the rows one by one, right? Which might be very slow.
 
  If in the future I want to run the job daily, might that be an issue?
 
  Or should I go with the initial idea of doing the Put with the M/R
 job
  and the delete with HBASE-6942?
 
  Thanks,
 
  JM
 
 
  2012/10/17, Michael Segel michael_se...@hotmail.com:
  Hi,
 
  I'm a firm believer in KISS (Keep It Simple, Stupid)
 
  The Map/Reduce (map job only) is the simplest and least prone to
  failure.
 
  Not sure why you would want to do this using coprocessors.
 
  How often are you running this job? It sounds like its going to be
  sporadic.
 
  -Mike
 
  On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org
  wrote:
 
  Hi,
 
  Can someone please help me to understand the pros and cons between
  those 2 options for the following 

RE: High IPC Latency

2012-10-18 Thread Ramkrishna.S.Vasudevan
Hi Yousuf

 The client caches the locations of the regionservers, so after a couple
 of
 minutes of the experiment running, it wouldn't need to re-visit
 ZooKeeper,
 I believe. Correct me if I am wrong please.
Yes you are right.

Regards
Ram

 -Original Message-
 From: Yousuf Ahmad [mailto:myahm...@gmail.com]
 Sent: Friday, October 19, 2012 1:30 AM
 To: user@hbase.apache.org; lars hofhansl
 Cc: Ivan Brondino; Ricardo Vilaça
 Subject: Re: High IPC Latency
 
 Hi,
 
 Thank you for your questions guys.
 
 We are using HBase 0.92 with HDFS 1.0.1.
 
 The experiment lasts 15 minutes. The measurements stabilize in the
 first
 two minutes of the run.
 
 The data is distributed almost evenly across the regionservers so each
 client hits most of them over the course of the experiment. However,
 for
 the data we have, any given multi-get or scan should touch only one or
 at
 most two regions.
 
 The client caches the locations of the regionservers, so after a couple
 of
 minutes of the experiment running, it wouldn't need to re-visit
 ZooKeeper,
 I believe. Correct me if I am wrong please.
 
 Regards,
 Yousuf
 
 
 On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl lhofha...@yahoo.com
 wrote:
 
  Also, what version of HBase/HDFS is this using?
 
 
 
 
  - Original Message -
  From: Pamecha, Abhishek apame...@x.com
  To: user@hbase.apache.org user@hbase.apache.org
  Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça 
  rmvil...@di.uminho.pt
  Sent: Thursday, October 18, 2012 11:38 AM
  Subject: RE: High IPC Latency
 
  Is it sustained for the same client hitting the same region server OR
 does
  it get better for the same client-RS combination when run for longer
  duration?  Trying to eliminate Zookeeper from this.
 
  Thanks,
  Abhishek
 
  From: Yousuf Ahmad [mailto:myahm...@gmail.com]
  Sent: Thursday, October 18, 2012 11:26 AM
  To: user@hbase.apache.org
  Cc: Ivan Brondino; Ricardo Vilaça
  Subject: High IPC Latency
 
  Hello,
 
  We are seeing slow times for read operations in our experiments. We
 are
  hoping that you guys can help us figure out what's going wrong.
 
  Here are some details:
 
*   We are running a read-only benchmark on our HBase cluster.
*
*   There are 10 regionservers, each co-located with a datanode.
 HDFS
  replication is 3x.
*   All the data read by the experiment is already in the block
 cache
  and the hit ratio is 99%.
*
*   We have 10 clients, each with around 400 threads making a mix
 of
  read-only requests involving multi-gets and scans.
*
*   We settled on the default client pool type/size (roundrobin/1)
 and a
  regionserver handler count of 100 after testing various combinations
 to see
  what setting worked best.
*
*   Our scans are short, fetching around 10 rows on average.
 Scanner
  caching is set to 50.
*   An average row in a scan has either around 10 columns (small
 row) or
  around 200 columns (big row).
*
*   Our multi-gets fetch around 200 rows on average.
*   An average row in a multi-get has around 10 columns.
*   Each column holds an integer (encoded into bytes).
*
*   None of the machines involved reach CPU, memory, or IO
 saturation.
  In fact resource utilization stays quite low.
*
*   Our statistics show that the average time for a scan, measured
  starting from the first scanner.next() call to the last one which
 returns a
  null, is around 2-3 seconds.
*   Since we use scanner caching, the major portion of this time
 (around
  2 seconds) is spent on the first call to next(), while the remaining
 calls
  take a negligible amount of time.
*   Similarly, we see that a multi-get on average takes around 2
 seconds.
*   A single get on average takes around 1 second.
  We are not sure what the bottleneck is or where it lies. We thought
 we
  should look deeper into what is going on at the regionservers. We
 monitored
  the IPC calls during one of the experiments. Here is a sample of one
  regionserver log:
 
  2012-10-18 17:00:09,969 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #115483; Served: HRegionInterface#get queueTime=0
 processingTime=1
  contents=1 Get, 75 bytes
  2012-10-18 17:00:09,969 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #115487; Served: HRegionInterface#get queueTime=0
 processingTime=0
  contents=1 Get, 75 bytes
  2012-10-18 17:00:09,969 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #115489; Served: HRegionInterface#get queueTime=0
 processingTime=0
  contents=1 Get, 75 bytes
  2012-10-18 17:00:09,982 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #111421; Served: HRegionInterface#get queueTime=0
 processingTime=0
  contents=1 Get, 75 bytes
  2012-10-18 17:00:09,982 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #115497; Served: HRegionInterface#multi queueTime=0
 processingTime=9
  contents=200 Gets
  2012-10-18 17:00:09,984 DEBUG
 org.apache.hadoop.ipc.HBaseServer.trace:
  Call #115499; Served: 

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Anoop Sam John

hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
  new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
Coprocessor.PRIORITY_USER,map);

Anil,

Don't you have to modify the table by calling the Admin API? I am not seeing that
code here...

-Anoop-


From: anil gupta [anilgupt...@gmail.com]
Sent: Friday, October 19, 2012 2:46 AM
To: user@hbase.apache.org
Subject: Re: Unable to add co-processor to table through HBase api

Hi Folks,

Still, i am unable to add the co-processors through HBase client api. This
time i tried loading the coprocessor by providing the jar path along with
parameters. But, it failed.
I was able to add the same coprocessor to the table through HBase shell.
I also dont see any logs regarding adding coprocessors in regionservers
when i try to add the co-processor through api.I strongly feel that HBase
client api for adding coprocessor seems to be broken. Please let me know if
the code below seems to be problematic.

Here is the code i used to add the coprocessor through HBase api:
private static void modifyTable() throws IOException
{
Configuration conf = HBaseConfiguration.create();
HBaseAdmin hAdmin = new HBaseAdmin(conf);
String tableName = txn;
hAdmin.disableTable(tableName);
if(!hAdmin.isTableEnabled(tableName))
{
  System.out.println(Trying to add coproc to table); // using err so
that it's easy to read this on eclipse console.
  HashMapString, String map = new HashMapString,String();
  map.put(arg1, batchdate);
  String className =
com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver;

hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
  new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
Coprocessor.PRIORITY_USER,map);

  if(
hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(className)
  )
  {
System.err.println(YIPIE!!!);
  }
  hAdmin.enableTable(tableName);

}
hAdmin.close();
   }

Thanks,
Anil Gupta

On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Do let me know if you are stuck up.  May be I did not get your actual
 problem.

 All the best.

 Regards
 Ram

  -Original Message-
  From: anil gupta [mailto:anilgupt...@gmail.com]
  Sent: Wednesday, October 17, 2012 11:34 PM
  To: user@hbase.apache.org
  Subject: Re: Unable to add co-processor to table through HBase api
 
  Hi Ram,
 
  The table exists and I don't get any error while running the program(i
  would get an error if the table did not exist). I am running a
  distributed
  cluster.
 
  Tried following additional ways also:
 
 1. I tried loading the AggregationImplementation coproc.
 2. I also tried adding the coprocs while the table is enabled.
 
 
  Also had a look at the JUnit test cases and could not find any
  difference.
 
  I am going to try adding the coproc along with jar in Hdfs and see what
  happens.
 
  Thanks,
  Anil Gupta
 
  On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
   I tried out a sample test class.  It is working properly.  I just
  have a
   doubt whether you are doing the
   Htd.addCoprocessor() step before creating the table?  Try that way
  hope it
   should work.
  
   Regards
   Ram
  
-Original Message-
From: anil gupta [mailto:anilgupt...@gmail.com]
Sent: Wednesday, October 17, 2012 4:05 AM
To: user@hbase.apache.org
Subject: Unable to add co-processor to table through HBase api
   
Hi All,
   
I would like to add a RegionObserver to a HBase table through HBase
api. I
don't want to put this RegionObserver as a user or system co-
  processor
in
hbase-site.xml since this is specific to a table. So, option of
  using
hbase
properties is out. I have already copied the jar file in the
  classpath
of
region server and restarted the cluster.
   
Can any one point out the problem in following code for adding the
co-processor to the table:
private void modifyTable(String name) throws IOException
{
Configuration conf = HBaseConfiguration.create();
HBaseAdmin hAdmin = new HBaseAdmin(conf);
hAdmin.disableTable(txn_subset);
if(!hAdmin.isTableEnabled(txn_subset))
{
  System.err.println(Trying to add coproc to table); // using
  err
so
that it's easy to read this on eclipse console.
   
   
  hAdmin.getTableDescriptor(Bytes.toBytes(txn_subset)).addCoprocessor(
com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver);
  if(
   
  hAdmin.getTableDescriptor(Bytes.toBytes(txn_subset)).hasCoprocessor(
com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver)
)
  {
System.err.println(YIPIE!!!);
  }
  

Re: Unable to add co-processor to table through HBase api

2012-10-18 Thread anil gupta
Hi Anoop,

Sorry, I am unable to understand what you mean by "have to modify the table by
calling the Admin API". Am I missing some other calls in my code?

Thanks,
Anil Gupta

On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com wrote:



 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
   new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
 Coprocessor.PRIORITY_USER,map);

 Anil,

 Don't you have to modify the table calling Admin API??  !  Not seeing
 that code here...

 -Anoop-

 
 From: anil gupta [anilgupt...@gmail.com]
 Sent: Friday, October 19, 2012 2:46 AM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase api

 Hi Folks,

 Still, i am unable to add the co-processors through HBase client api. This
 time i tried loading the coprocessor by providing the jar path along with
 parameters. But, it failed.
 I was able to add the same coprocessor to the table through HBase shell.
 I also dont see any logs regarding adding coprocessors in regionservers
 when i try to add the co-processor through api.I strongly feel that HBase
 client api for adding coprocessor seems to be broken. Please let me know if
 the code below seems to be problematic.

 Here is the code i used to add the coprocessor through HBase api:
 private static void modifyTable() throws IOException
 {
 Configuration conf = HBaseConfiguration.create();
 HBaseAdmin hAdmin = new HBaseAdmin(conf);
 String tableName = txn;
 hAdmin.disableTable(tableName);
 if(!hAdmin.isTableEnabled(tableName))
 {
   System.out.println(Trying to add coproc to table); // using err so
 that it's easy to read this on eclipse console.
   HashMapString, String map = new HashMapString,String();
   map.put(arg1, batchdate);
   String className =
 com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver;


 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
   new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
 Coprocessor.PRIORITY_USER,map);

   if(

 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(className)
   )
   {
 System.err.println(YIPIE!!!);
   }
   hAdmin.enableTable(tableName);

 }
 hAdmin.close();
}

 Thanks,
 Anil Gupta

 On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:

  Do let me know if you are stuck up.  May be I did not get your actual
  problem.
 
  All the best.
 
  Regards
  Ram
 
   -Original Message-
   From: anil gupta [mailto:anilgupt...@gmail.com]
   Sent: Wednesday, October 17, 2012 11:34 PM
   To: user@hbase.apache.org
   Subject: Re: Unable to add co-processor to table through HBase api
  
   Hi Ram,
  
   The table exists and I don't get any error while running the program(i
   would get an error if the table did not exist). I am running a
   distributed
   cluster.
  
   Tried following additional ways also:
  
  1. I tried loading the AggregationImplementation coproc.
  2. I also tried adding the coprocs while the table is enabled.
  
  
   Also had a look at the JUnit test cases and could not find any
   difference.
  
   I am going to try adding the coproc along with jar in Hdfs and see what
   happens.
  
   Thanks,
   Anil Gupta
  
   On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
I tried out a sample test class.  It is working properly.  I just
   have a
doubt whether you are doing the
Htd.addCoprocessor() step before creating the table?  Try that way
   hope it
should work.
   
Regards
Ram
   
 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Wednesday, October 17, 2012 4:05 AM
 To: user@hbase.apache.org
 Subject: Unable to add co-processor to table through HBase api

 Hi All,

 I would like to add a RegionObserver to a HBase table through HBase
 api. I
 don't want to put this RegionObserver as a user or system co-
   processor
 in
 hbase-site.xml since this is specific to a table. So, option of
   using
 hbase
 properties is out. I have already copied the jar file in the
   classpath
 of
 region server and restarted the cluster.

 Can any one point out the problem in following code for adding the
 co-processor to the table:
 private void modifyTable(String name) throws IOException
 {
 Configuration conf = HBaseConfiguration.create();
 HBaseAdmin hAdmin = new HBaseAdmin(conf);
 hAdmin.disableTable(txn_subset);
 if(!hAdmin.isTableEnabled(txn_subset))
 {
   System.err.println(Trying to add coproc to table); // using
   err
 so
 that it's easy to read this on eclipse console.


   

Re: High IPC Latency

2012-10-18 Thread lars hofhansl
Can you reproduce this against a single, local region server?
Any chance that you can try with the just released 0.94.2?


I would love to debug this. It would be a tremendous help if you had a little
test program that reproduces this against a single server, so that I can see 
what is going on.

Thanks.

-- Lars
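
For anyone putting together such a reproducer, a minimal, hypothetical sketch of
the two call types being measured (a ~200-row multi-get and a short scan with
caching 50); the table and row key names are invented:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class LatencyProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "benchtable");

    // Multi-get of ~200 rows, timed around the single table.get(gets) call.
    List<Get> gets = new ArrayList<Get>();
    for (int i = 0; i < 200; i++) {
      gets.add(new Get(Bytes.toBytes("row-" + i)));
    }
    long t0 = System.currentTimeMillis();
    Result[] results = table.get(gets);
    System.out.println("multi-get: " + (System.currentTimeMillis() - t0) + " ms");

    // Short scan (~10 rows) with scanner caching 50; per the thread, most of
    // the cost shows up on the first next() call.
    Scan scan = new Scan(Bytes.toBytes("row-0"));
    scan.setCaching(50);
    t0 = System.currentTimeMillis();
    ResultScanner scanner = table.getScanner(scan);
    int rows = 0;
    for (Result r = scanner.next(); r != null && rows < 10; r = scanner.next()) {
      rows++;
    }
    scanner.close();
    System.out.println("scan: " + (System.currentTimeMillis() - t0) + " ms");

    table.close();
  }
}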



- Original Message -
From: Yousuf Ahmad myahm...@gmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt
Sent: Thursday, October 18, 2012 12:59 PM
Subject: Re: High IPC Latency

Hi,

Thank you for your questions guys.

We are using HBase 0.92 with HDFS 1.0.1.

The experiment lasts 15 minutes. The measurements stabilize in the first
two minutes of the run.

The data is distributed almost evenly across the regionservers so each
client hits most of them over the course of the experiment. However, for
the data we have, any given multi-get or scan should touch only one or at
most two regions.

The client caches the locations of the regionservers, so after a couple of
minutes of the experiment running, it wouldn't need to re-visit ZooKeeper,
I believe. Correct me if I am wrong please.

Regards,
Yousuf


On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl lhofha...@yahoo.com wrote:

 Also, what version of HBase/HDFS is this using?




 - Original Message -
 From: Pamecha, Abhishek apame...@x.com
 To: user@hbase.apache.org user@hbase.apache.org
 Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça 
 rmvil...@di.uminho.pt
 Sent: Thursday, October 18, 2012 11:38 AM
 Subject: RE: High IPC Latency

 Is it sustained for the same client hitting the same region server OR does
 it get better for the same client-RS combination when run for longer
 duration?  Trying to eliminate Zookeeper from this.

 Thanks,
 Abhishek

 From: Yousuf Ahmad [mailto:myahm...@gmail.com]
 Sent: Thursday, October 18, 2012 11:26 AM
 To: user@hbase.apache.org
 Cc: Ivan Brondino; Ricardo Vilaça
 Subject: High IPC Latency

 Hello,

 We are seeing slow times for read operations in our experiments. We are
 hoping that you guys can help us figure out what's going wrong.

 Here are some details:

   *   We are running a read-only benchmark on our HBase cluster.
   *
   *   There are 10 regionservers, each co-located with a datanode. HDFS
 replication is 3x.
   *   All the data read by the experiment is already in the block cache
 and the hit ratio is 99%.
   *
   *   We have 10 clients, each with around 400 threads making a mix of
 read-only requests involving multi-gets and scans.
   *
   *   We settled on the default client pool type/size (roundrobin/1) and a
 regionserver handler count of 100 after testing various combinations to see
 what setting worked best.
   *
   *   Our scans are short, fetching around 10 rows on average. Scanner
 caching is set to 50.
   *   An average row in a scan has either around 10 columns (small row) or
 around 200 columns (big row).
   *
   *   Our multi-gets fetch around 200 rows on average.
   *   An average row in a multi-get has around 10 columns.
   *   Each column holds an integer (encoded into bytes).
   *
   *   None of the machines involved reach CPU, memory, or IO saturation.
 In fact resource utilization stays quite low.
   *
   *   Our statistics show that the average time for a scan, measured
 starting from the first scanner.next() call to the last one which returns a
 null, is around 2-3 seconds.
   *   Since we use scanner caching, the major portion of this time (around
 2 seconds) is spent on the first call to next(), while the remaining calls
 take a negligible amount of time.
   *   Similarly, we see that a multi-get on average takes around 2 seconds.
   *   A single get on average takes around 1 second.
 We are not sure what the bottleneck is or where it lies. We thought we
 should look deeper into what is going on at the regionservers. We monitored
 the IPC calls during one of the experiments. Here is a sample of one
 regionserver log:

 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0
 contents=1 Get, 75 bytes
 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9
 contents=200 Gets
 2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace:
 Call #115499; Served: 

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Ramkrishna.S.Vasudevan
I can attach the code that I tried.  Here, as the HTD is getting modified, we
may need to call modifyTable().
My test class did try this while creating the table itself.

I will attach shortly.

Regards
Ram

 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Friday, October 19, 2012 10:29 AM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase api
 
 Hi Anoop,
 
 Sorry, i am unable to understand what you mean by have to modify the
 table
 calling Admin API??. Am i missing some other calls in my code?
 
 Thanks,
 Anil Gupta
 
 On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com
 wrote:
 
 
 
 
 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(cla
 ssName,
new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
  Coprocessor.PRIORITY_USER,map);
 
  Anil,
 
  Don't you have to modify the table calling Admin API??  !  Not
 seeing
  that code here...
 
  -Anoop-
 
  
  From: anil gupta [anilgupt...@gmail.com]
  Sent: Friday, October 19, 2012 2:46 AM
  To: user@hbase.apache.org
  Subject: Re: Unable to add co-processor to table through HBase api
 
  Hi Folks,
 
  Still, i am unable to add the co-processors through HBase client api.
 This
  time i tried loading the coprocessor by providing the jar path along
 with
  parameters. But, it failed.
  I was able to add the same coprocessor to the table through HBase
 shell.
  I also dont see any logs regarding adding coprocessors in
 regionservers
  when i try to add the co-processor through api.I strongly feel that
 HBase
  client api for adding coprocessor seems to be broken. Please let me
 know if
  the code below seems to be problematic.
 
  Here is the code i used to add the coprocessor through HBase api:
  private static void modifyTable() throws IOException
  {
  Configuration conf = HBaseConfiguration.create();
  HBaseAdmin hAdmin = new HBaseAdmin(conf);
  String tableName = txn;
  hAdmin.disableTable(tableName);
  if(!hAdmin.isTableEnabled(tableName))
  {
System.out.println(Trying to add coproc to table); // using
 err so
  that it's easy to read this on eclipse console.
HashMapString, String map = new HashMapString,String();
map.put(arg1, batchdate);
String className =
 
 com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver;
 
 
 
 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(clas
 sName,
new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
  Coprocessor.PRIORITY_USER,map);
 
if(
 
 
 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(clas
 sName)
)
{
  System.err.println(YIPIE!!!);
}
hAdmin.enableTable(tableName);
 
  }
  hAdmin.close();
 }
 
  Thanks,
  Anil Gupta
 
  On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
   Do let me know if you are stuck up.  May be I did not get your
 actual
   problem.
  
   All the best.
  
   Regards
   Ram
  
-Original Message-
From: anil gupta [mailto:anilgupt...@gmail.com]
Sent: Wednesday, October 17, 2012 11:34 PM
To: user@hbase.apache.org
Subject: Re: Unable to add co-processor to table through HBase
 api
   
Hi Ram,
   
The table exists and I don't get any error while running the
 program(i
would get an error if the table did not exist). I am running a
distributed
cluster.
   
Tried following additional ways also:
   
   1. I tried loading the AggregationImplementation coproc.
   2. I also tried adding the coprocs while the table is enabled.
   
   
Also had a look at the JUnit test cases and could not find any
difference.
   
I am going to try adding the coproc along with jar in Hdfs and
 see what
happens.
   
Thanks,
Anil Gupta
   
On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:
   
 I tried out a sample test class.  It is working properly.  I
 just
have a
 doubt whether you are doing the
 Htd.addCoprocessor() step before creating the table?  Try that
 way
hope it
 should work.

 Regards
 Ram

  -Original Message-
  From: anil gupta [mailto:anilgupt...@gmail.com]
  Sent: Wednesday, October 17, 2012 4:05 AM
  To: user@hbase.apache.org
  Subject: Unable to add co-processor to table through HBase
 api
 
  Hi All,
 
  I would like to add a RegionObserver to a HBase table through
 HBase
  api. I
  don't want to put this RegionObserver as a user or system co-
processor
  in
  hbase-site.xml since this is specific to a table. So, option
 of
using
  hbase
  properties is out. I have already copied the jar file in the
classpath
  of
  region 

Re: Unable to add co-processor to table through HBase api

2012-10-18 Thread anil gupta
Hi Guys,

Do you mean to say that I need to call the following method after the call
to the addCoprocessor method:

public void modifyTable(byte[] tableName, HTableDescriptor htd) throws IOException

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#modifyTable%28byte[],%20org.apache.hadoop.hbase.HTableDescriptor%29

Thanks,
Anil Gupta

On Thu, Oct 18, 2012 at 10:23 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 I can attach the code that I tried.  Here as the HTD is getting modified we
 may need to call modifyTable().
 My testclass did try this while doing creation of table itself.

 I will attach shortly.

 Regards
 Ram

  -Original Message-
  From: anil gupta [mailto:anilgupt...@gmail.com]
  Sent: Friday, October 19, 2012 10:29 AM
  To: user@hbase.apache.org
  Subject: Re: Unable to add co-processor to table through HBase api
 
  Hi Anoop,
 
  Sorry, i am unable to understand what you mean by have to modify the
  table
  calling Admin API??. Am i missing some other calls in my code?
 
  Thanks,
  Anil Gupta
 
  On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com
  wrote:
 
  
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(cla
  ssName,
 new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
   Coprocessor.PRIORITY_USER,map);
  
   Anil,
  
   Don't you have to modify the table calling Admin API??  !  Not
  seeing
   that code here...
  
   -Anoop-
  
   
   From: anil gupta [anilgupt...@gmail.com]
   Sent: Friday, October 19, 2012 2:46 AM
   To: user@hbase.apache.org
   Subject: Re: Unable to add co-processor to table through HBase api
  
   Hi Folks,
  
   Still, i am unable to add the co-processors through HBase client api.
  This
   time i tried loading the coprocessor by providing the jar path along
  with
   parameters. But, it failed.
   I was able to add the same coprocessor to the table through HBase
  shell.
   I also dont see any logs regarding adding coprocessors in
  regionservers
   when i try to add the co-processor through api.I strongly feel that
  HBase
   client api for adding coprocessor seems to be broken. Please let me
  know if
   the code below seems to be problematic.
  
   Here is the code i used to add the coprocessor through HBase api:
   private static void modifyTable() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin hAdmin = new HBaseAdmin(conf);
   String tableName = txn;
   hAdmin.disableTable(tableName);
   if(!hAdmin.isTableEnabled(tableName))
   {
 System.out.println(Trying to add coproc to table); // using
  err so
   that it's easy to read this on eclipse console.
 HashMapString, String map = new HashMapString,String();
 map.put(arg1, batchdate);
 String className =
  
  com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver;
  
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(clas
  sName,
 new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
   Coprocessor.PRIORITY_USER,map);
  
 if(
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(clas
  sName)
 )
 {
   System.err.println(YIPIE!!!);
 }
 hAdmin.enableTable(tableName);
  
   }
   hAdmin.close();
  }
  
   Thanks,
   Anil Gupta
  
   On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
Do let me know if you are stuck up.  May be I did not get your
  actual
problem.
   
All the best.
   
Regards
Ram
   
 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Wednesday, October 17, 2012 11:34 PM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase
  api

 Hi Ram,

 The table exists and I don't get any error while running the
  program(i
 would get an error if the table did not exist). I am running a
 distributed
 cluster.

 Tried following additional ways also:

1. I tried loading the AggregationImplementation coproc.
2. I also tried adding the coprocs while the table is enabled.


 Also had a look at the JUnit test cases and could not find any
 difference.

 I am going to try adding the coproc along with jar in Hdfs and
  see what
 happens.

 Thanks,
 Anil Gupta

 On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:

  I tried out a sample test class.  It is working properly.  I
  just
  

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Anoop Sam John

Anil
 Yes, the same. You got the HTD from the master into your client code and just
added the CP to that object. In order to reflect the change in the HBase
cluster you need to call the modifyTable API with your changed HTD; the master
will then change the table. When you enable the table back, the regions will be
opened on the RSs and will have the CP in place. :)  Hope that makes it clear
for you.

-Anoop-
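
Putting the two replies together, a minimal sketch of adding a coprocessor to an
existing table via the client API (the class name, jar path, parameter map, and
table name are the ones from this thread; treat it as illustrative rather than
tested):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class AddCoprocessorToTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String tableName = "txn";
    String className =
        "com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver";

    admin.disableTable(tableName);

    // Change a local copy of the table descriptor...
    HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes(tableName));
    Map<String, String> params = new HashMap<String, String>();
    params.put("arg1", "batchdate");
    htd.addCoprocessor(className,
        new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"),
        Coprocessor.PRIORITY_USER, params);

    // ...and push it to the master: this modifyTable() call is the missing step.
    admin.modifyTable(Bytes.toBytes(tableName), htd);

    admin.enableTable(tableName);
    admin.close();
  }
}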

From: anil gupta [anilgupt...@gmail.com]
Sent: Friday, October 19, 2012 11:01 AM
To: user@hbase.apache.org
Subject: Re: Unable to add co-processor to table through HBase api

Hi Guys,

Do you mean to say that i need to call the following method after the call
to addCoprocessor method:

public void *modifyTable*(byte[] tableName,
HTableDescriptor
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html
htd)
 throws IOException
http://download.oracle.com/javase/6/docs/api/java/io/IOException.html?is-external=true


http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#modifyTable%28byte[],%20org.apache.hadoop.hbase.HTableDescriptor%29

Thanks,
Anil Gupta

On Thu, Oct 18, 2012 at 10:23 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 I can attach the code that I tried.  Here as the HTD is getting modified we
 may need to call modifyTable().
 My testclass did try this while doing creation of table itself.

 I will attach shortly.

 Regards
 Ram

  -Original Message-
  From: anil gupta [mailto:anilgupt...@gmail.com]
  Sent: Friday, October 19, 2012 10:29 AM
  To: user@hbase.apache.org
  Subject: Re: Unable to add co-processor to table through HBase api
 
  Hi Anoop,
 
  Sorry, i am unable to understand what you mean by have to modify the
  table
  calling Admin API??. Am i missing some other calls in my code?
 
  Thanks,
  Anil Gupta
 
  On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com
  wrote:
 
  
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(cla
  ssName,
 new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
   Coprocessor.PRIORITY_USER,map);
  
   Anil,
  
   Don't you have to modify the table calling Admin API??  !  Not
  seeing
   that code here...
  
   -Anoop-
  
   
   From: anil gupta [anilgupt...@gmail.com]
   Sent: Friday, October 19, 2012 2:46 AM
   To: user@hbase.apache.org
   Subject: Re: Unable to add co-processor to table through HBase api
  
   Hi Folks,
  
   Still, i am unable to add the co-processors through HBase client api.
  This
   time i tried loading the coprocessor by providing the jar path along
  with
   parameters. But, it failed.
   I was able to add the same coprocessor to the table through HBase
  shell.
   I also dont see any logs regarding adding coprocessors in
  regionservers
   when i try to add the co-processor through api.I strongly feel that
  HBase
   client api for adding coprocessor seems to be broken. Please let me
  know if
   the code below seems to be problematic.
  
   Here is the code i used to add the coprocessor through HBase api:
   private static void modifyTable() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin hAdmin = new HBaseAdmin(conf);
   String tableName = txn;
   hAdmin.disableTable(tableName);
   if(!hAdmin.isTableEnabled(tableName))
   {
 System.out.println(Trying to add coproc to table); // using
  err so
   that it's easy to read this on eclipse console.
 HashMapString, String map = new HashMapString,String();
 map.put(arg1, batchdate);
 String className =
  
  com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver;
  
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(clas
  sName,
 new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
   Coprocessor.PRIORITY_USER,map);
  
 if(
  
  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(clas
  sName)
 )
 {
   System.err.println(YIPIE!!!);
 }
 hAdmin.enableTable(tableName);
  
   }
   hAdmin.close();
  }
  
   Thanks,
   Anil Gupta
  
   On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
Do let me know if you are stuck up.  May be I did not get your
  actual
problem.
   
All the best.
   
Regards
Ram
   
 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Wednesday, October 17, 2012 11:34 PM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase
  api

 Hi Ram,

 The table exists and I don't get any error while running the
  program(i
 would get an error if the table did not exist). I am running a
 

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Ramkrishna.S.Vasudevan
Yes you are right. modifyTable has to be called.

public class TestClass {
  private static HBaseTestingUtility UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setupBeforeClass() throws Exception {
    Configuration conf = UTIL.getConfiguration();
  }

  @Before
  public void setUp() throws Exception {
    UTIL.startMiniCluster(1);
  }

  @Test
  public void testSampe() throws Exception {
    HBaseAdmin admin = UTIL.getHBaseAdmin();
    Configuration conf = UTIL.getConfiguration();
    ZooKeeperWatcher zkw = HBaseTestingUtility.getZooKeeperWatcher(UTIL);
    String userTableName = "testSampe";
    HTableDescriptor htd = new HTableDescriptor(userTableName);
    //htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
    HColumnDescriptor hcd = new HColumnDescriptor("col");
    htd.addFamily(hcd);
    admin.createTable(htd);
    ZKAssign.blockUntilNoRIT(zkw);

    admin.disableTable(userTableName);
    htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
    admin.modifyTable(Bytes.toBytes(userTableName), htd);
    admin.enableTable(userTableName);
    HTable table = new HTable(conf, userTableName);

    HTableDescriptor tableDescriptor =
        admin.getTableDescriptor(Bytes.toBytes(userTableName));
    boolean hasCoprocessor =
        tableDescriptor.hasCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
    System.out.println(hasCoprocessor);
  }
}

If you comment out the modifyTable() call, you will not be able to see the
coprocessor added.
That's what I suggested in my previous reply: try doing this while creating the
table itself.  If you want to add it later, then it is through modifyTable that
you can do it, because it involves changing the HTD.

Regards
Ram

 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Friday, October 19, 2012 11:02 AM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase api
 
 Hi Guys,
 
 Do you mean to say that i need to call the following method after the
 call
 to addCoprocessor method:
 
 public void *modifyTable*(byte[] tableName,
 HTableDescriptor
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescript
 or.html
 htd)
  throws IOException
 http://download.oracle.com/javase/6/docs/api/java/io/IOException.html?
 is-external=true
 
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdm
 in.html#modifyTable%28byte[],%20org.apache.hadoop.hbase.HTableDescripto
 r%29
 
 Thanks,
 Anil Gupta
 
 On Thu, Oct 18, 2012 at 10:23 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:
 
  I can attach the code that I tried.  Here as the HTD is getting
 modified we
  may need to call modifyTable().
  My testclass did try this while doing creation of table itself.
 
  I will attach shortly.
 
  Regards
  Ram
 
   -Original Message-
   From: anil gupta [mailto:anilgupt...@gmail.com]
   Sent: Friday, October 19, 2012 10:29 AM
   To: user@hbase.apache.org
   Subject: Re: Unable to add co-processor to table through HBase api
  
   Hi Anoop,
  
   Sorry, i am unable to understand what you mean by have to modify
 the
   table
   calling Admin API??. Am i missing some other calls in my code?
  
   Thanks,
   Anil Gupta
  
   On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John
 anoo...@huawei.com
   wrote:
  
   
   
   
  
 hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(cla
   ssName,
  new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar),
Coprocessor.PRIORITY_USER,map);
   
Anil,
   
Don't you have to modify the table calling Admin API??  !
 Not
   seeing
that code here...
   
-Anoop-
   

From: anil gupta [anilgupt...@gmail.com]
Sent: Friday, October 19, 2012 2:46 AM
To: user@hbase.apache.org
Subject: Re: Unable to add co-processor to table through HBase
 api
   
Hi Folks,
   
 Still, I am unable to add the co-processors through the HBase client
 api. This time I tried loading the coprocessor by providing the jar path
 along with parameters, but it failed.
 I was able to add the same coprocessor to the table through the HBase
 shell.
 I also don't see any logs regarding adding coprocessors in the
 regionservers when I try to add the co-processor through the api. I
 strongly feel that the HBase client api for adding coprocessors is
 broken. Please let me know if the code below seems problematic.
   
 Here is the code I used to add the coprocessor through the HBase api:
 private static void modifyTable() throws IOException
 {
     Configuration conf = HBaseConfiguration.create();
     HBaseAdmin hAdmin = new HBaseAdmin(conf);
     String tableName = "txn";
     hAdmin.disableTable(tableName);
     if (!hAdmin.isTableEnabled(tableName))
     {
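
For what it's worth, here is a minimal, self-contained sketch of the full
flow with the modifyTable() call included. The jar path and table name are
the ones from the snippet above, the coprocessor class name and properties
map are placeholders, and it assumes the 0.92/0.94-era client API:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class AddCoprocessorFromJar {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin hAdmin = new HBaseAdmin(conf);
    String tableName = "txn";
    String className = "com.example.MyRegionObserver";   // placeholder class name
    Map<String, String> map = new HashMap<String, String>();
    map.put("someKey", "someValue");                      // placeholder parameters

    // Disable the table before changing its descriptor.
    hAdmin.disableTable(tableName);

    // Add the coprocessor, loaded from a jar on HDFS, to the table descriptor.
    HTableDescriptor htd = hAdmin.getTableDescriptor(Bytes.toBytes(tableName));
    htd.addCoprocessor(className,
        new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"),
        Coprocessor.PRIORITY_USER, map);

    // Without this call the changed descriptor never reaches the master.
    hAdmin.modifyTable(Bytes.toBytes(tableName), htd);
    hAdmin.enableTable(tableName);

    // Sanity check: the master-side descriptor should now list the coprocessor.
    System.out.println(
        hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(className));
  }
}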
   

Re: Unable to add co-processor to table through HBase api

2012-10-18 Thread anil gupta
Thanks a lot, guys. I really appreciate your help. I'll try this change in
the morning and let you know the outcome.

@Ram: Actually, I was trying to add the coprocessor to a pre-existing
table. I think yesterday you assumed that I was trying to add the
coprocessor while creating the table. That's why there was some confusion
between us.

On Thu, Oct 18, 2012 at 10:40 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Yes you are right. modifyTable has to be called.


RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Ramkrishna.S.Vasudevan
Ok Anil, not a problem. My intention was just to see whether the API was
working during createTable, so that it would help you.

Regards
Ram

 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Friday, October 19, 2012 11:22 AM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase api
 
 Thanks a lot, guys. I really appreciate your help. I'll try this change
 in the morning and let you know the outcome.
 
 @Ram: Actually, I was trying to add the coprocessor to a pre-existing
 table. I think yesterday you assumed that I was trying to add the
 coprocessor while creating the table. That's why there was some confusion
 between us.
 
 On Thu, Oct 18, 2012 at 10:40 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:
 
  Yes you are right. modifyTable has to be called.
 
  public class TestClass {
private static HBaseTestingUtility UTIL = new
 HBaseTestingUtility();
@BeforeClass
public static void setupBeforeClass() throws Exception {
  Configuration conf = UTIL.getConfiguration();
 
}
 
@Before
public void setUp() throws Exception{
  UTIL.startMiniCluster(1);
}
 
@Test
public void testSampe() throws Exception{
  HBaseAdmin admin = UTIL.getHBaseAdmin();
  Configuration conf = UTIL.getConfiguration();
  ZooKeeperWatcher zkw =
 HBaseTestingUtility.getZooKeeperWatcher(UTIL);
  String userTableName = "testSampe";
  HTableDescriptor htd = new HTableDescriptor(userTableName);
 
 
 
  //htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
  HColumnDescriptor hcd = new HColumnDescriptor("col");
  htd.addFamily(hcd);
  admin.createTable(htd);
  ZKAssign.blockUntilNoRIT(zkw);
 
  admin.disableTable(userTableName);
 
 
 
  htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
  admin.modifyTable(Bytes.toBytes(userTableName), htd);
  admin.enableTable(userTableName);
  HTable table = new HTable(conf, userTableName);
 
  HTableDescriptor tableDescriptor =
  admin.getTableDescriptor(Bytes.toBytes(userTableName));
  boolean hasCoprocessor =
      tableDescriptor.hasCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
  System.out.println(hasCoprocessor);
 
 
 
}
  }
 
  If you comment out the modifyTable() call, you will not see the
  coprocessor added.
  That's what I suggested in my previous reply: first try doing this
  while creating the table itself. If you want to add the coprocessor
  later, it has to go through modifyTable(), because the change involves
  modifying the HTD.
 
  Regards
  Ram
 
   -Original Message-
   From: anil gupta [mailto:anilgupt...@gmail.com]
   Sent: Friday, October 19, 2012 11:02 AM
   To: user@hbase.apache.org
   Subject: Re: Unable to add co-processor to table through HBase api
  
   Hi Guys,
  
   Do you mean to say that i need to call the following method after
 the
   call
   to addCoprocessor method:
  
   public void modifyTable(byte[] tableName, HTableDescriptor htd) throws IOException
  
   http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html
   http://download.oracle.com/javase/6/docs/api/java/io/IOException.html?is-external=true
   http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#modifyTable%28byte[],%20org.apache.hadoop.hbase.HTableDescriptor%29
  
   Thanks,
   Anil Gupta
  
   On Thu, Oct 18, 2012 at 10:23 PM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
 I can attach the code that I tried. Here, since the HTD is getting
 modified, we may need to call modifyTable().
 My test class did try this while creating the table itself.
   
I will attach shortly.
   
Regards
Ram
   
 -Original Message-
 From: anil gupta [mailto:anilgupt...@gmail.com]
 Sent: Friday, October 19, 2012 10:29 AM
 To: user@hbase.apache.org
 Subject: Re: Unable to add co-processor to table through HBase
 api

 Hi Anoop,

  Sorry, I am unable to understand what you mean by "have to modify the
  table calling Admin API". Am I missing some other calls in my code?

 Thanks,
 Anil Gupta

 On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John
   anoo...@huawei.com
 wrote:

 
 
 

  
  hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className,
      new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"), Coprocessor.PRIORITY_USER, map);
 
  Anil,
 
  Don't you have to modify the table calling Admin API??  Not seeing
  that code here...
 
  -Anoop-
 
  
  From: anil gupta [anilgupt...@gmail.com]
  Sent: Friday, October 19, 2012 2:46 AM
  To: user@hbase.apache.org
  Subject: Re: