Re: Slow scanning for PrefixFilter on EncodedBlocks
+1 for making PrefixFilter seek instead of using a startRow explicitly. ./zahoor

On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl lhofha...@yahoo.com wrote:
Oh yeah, I meant that one should always set the startrow as a matter of practice - if possible - and never rely on the filter alone.

From: anil gupta anilgupt...@gmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Sent: Wednesday, October 17, 2012 12:25 PM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Hi Lars,
There is a specific use case for this. Suppose I have a table with rowkey: customer_id + event_timestamp + uid. Use case: I would like to get all the events of customer_id=123.
Case 1: If I only use startRow=123 then I will also get events of other customers having customer_id > 123, since the scanner will keep fetching rows until the end of the table.
Case 2: If I use prefixFilter=123 and startRow=123 then I will get the correct result.
IMHO, adding the feature of smartly setting the startRow in PrefixFilter won't hurt any existing functionality. Use of startRow and PrefixFilter will still be different.
Thanks, Anil Gupta

On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl lhofha...@yahoo.com wrote:
That is a good point. There is no reason why the prefix filter cannot issue a seek to the first KV for that prefix. Although it could lead to a practice where people would use the prefix filter when they in fact should just set the start row.

----- Original Message -----
From: anil gupta anilgupt...@gmail.com
To: user@hbase.apache.org
Sent: Wednesday, October 17, 2012 9:41 AM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Hi Zahoor,
I heavily use the prefix filter, and every time I have to explicitly define the startRow. So, that's the current behavior. However, initially this behavior was confusing to me also. I think that when a prefix filter is defined, the startRow=prefix can be set internally, with a user-defined startRow taking precedence over the prefix-derived one. If the current PrefixFilter can be modified in that way, it will eradicate this confusion regarding the performance of the prefix filter.
Thanks, Anil Gupta

On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor jmo...@gmail.com wrote:
First I upgraded my cluster to 94.2.. even then the problem persisted.. Then I moved to using startRow instead of the prefix filter.. ./zahoor

On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor jmo...@gmail.com wrote:
Sorry for the delay. It looks like the problem is because of PrefixFilter... I assumed that it does a seek... If I use startRow instead.. it works fine.. But is it the correct approach? ./zahoor

On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl lhofha...@yahoo.com wrote:
I reopened HBASE-6577.

----- Original Message -----
From: lars hofhansl lhofha...@yahoo.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Sent: Tuesday, October 16, 2012 2:39 PM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Looks like this is exactly the scenario I was trying to optimize with HBASE-6577. Hmm...

From: lars hofhansl lhofha...@yahoo.com
To: user@hbase.apache.org
Sent: Tuesday, October 16, 2012 12:21 AM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

PrefixFilter does not do any seeking by itself, so I doubt this is related to HBASE-6757. Does this only happen with FAST_DIFF encoding? If you can create an isolated test program (that sets up the scenario and then runs a scan with the filter such that it is very slow), I'm happy to take a look.
-- Lars

----- Original Message -----
From: J Mohamed Zahoor jmo...@gmail.com
To: user@hbase.apache.org
Sent: Monday, October 15, 2012 10:27 AM
Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks

Is this related to HBASE-6757? I use a filter list with a prefix filter and a filter list of column filters. /zahoor

On Monday, October 15, 2012, J Mohamed Zahoor wrote:
Hi, My scanner performance is very slow when using a prefix filter on an **encoded column** (encoded using FAST_DIFF for both memory and disk). I am using HBase 0.94.1. jstack shows that much time is spent on seeking the row. Even if I give an exact row key match in the prefix filter, it takes about two minutes to return a single row. Running this multiple times also seems to be redirecting things to disk (loadBlock).
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027) at
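A minimal sketch of the workaround this thread converges on, against the 0.94-era client API: set the start row explicitly and keep the PrefixFilter, so the scan both starts at the prefix and stops once rows no longer match. The table name and prefix value here are illustrative, not from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events"); // hypothetical table name

        byte[] prefix = Bytes.toBytes("123"); // the customer_id prefix from the thread
        Scan scan = new Scan();
        // Seek directly to the first possible row for this prefix instead of
        // letting the filter walk the table from the beginning.
        scan.setStartRow(prefix);
        // The filter then ends the scan once rows no longer match the prefix.
        scan.setFilter(new PrefixFilter(prefix));

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toStringBinary(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}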
Re: crafting your key - scan vs. get
Neil, I've pointed you in the right direction. The rest of the exercise is left to the student. :-) While you led with the comment about having fun, your question is boring. *^1 The fun part is for you now to play and see why I may have suggested the importance of column order. Sorry, but that really is the fun part of your question... figuring out the rest of the answer on your own. From your response, you clearly understand it, but you need to spend more time wrapping your head around the solution and taking ownership of it. Have fun, -Mike

*^1 The reason I say that the question is boring is that once you fully understand the problem and the solution, you can easily apply it to other problems. The fun is in actually taking the time to experiment and work through the problem on your own. Seriously, that *is* the fun part.

On Oct 17, 2012, at 10:53 PM, Neil Yalowitz neilyalow...@gmail.com wrote:

This is a helpful response, thanks. Our use case fits the "Show me the most recent events by user A" you described. So using the first example, a table populated with events of user ID AA:

ROW    COLUMN+CELL
AA     column=data:event...,     timestamp=1350420705459, value=myeventval1
AA     column=data:event...9998, timestamp=1350420704490, value=myeventval2
AA     column=data:event...9997, timestamp=1350420704567, value=myeventval3

NOTE1: I replaced the TS stuff with ...9997 for brevity, and the example user ID AA would actually be hashed to avoid hotspotting.
NOTE2: I assume I should shorten the chosen column family and qualifier before writing to a large production table (for instance, "d" instead of "data" and "e" instead of "event").

I hope I have that right. Thanks for the response! As for including enough description for the question to be not-boring, I'm never quite sure when an email will grow so long that no one will read it. :) So to give more background: each event is about 1KB of data. The frequency is highly variable... over any given period of time, some users may only log one event and no more, some users may log a few events (10 to 100), and in some rare cases a user may log many events (1000+). The width of the row is some concern for the users with many events, but I'm thinking a few rare rows of 1KB x 1000+ width shouldn't kill us.

If I may ask a couple of follow-up questions about your comments:

"Then store each event in a separate column where the column name is something like event + (max Long - Time Stamp). This will place the most recent event first."

Although I know row keys are sorted, I'm not sure what this means for a qualifier. The scan result can depend on what cf:qual is used? ...and that determines which column value is first? Is this related to using setMaxResultsPerColumnFamily(1)? (i.e. only return one column value, so sort on qualifier and return the first val found)

"The reason I say event + the long, is that you may want to place user specific information in a column and you would want to make sure it was in front of the event data."

Same question as above, I'm not sure what would place a column in front. Am I missing something?

"In the first case, you can use get() which, while still a scan, is a very efficient fetch. In the second, you will always need to do a scan."

This is the core of my original question. My anecdotal tests in the hbase shell showed a Get executing about 3x faster than a Scan with startrow/stoprow, but I don't trust my crude testing much and hoped someone could describe the performance trade-off between Scan vs. Get. Thanks again to anyone who read this far.

Neil Yalowitz neilyalow...@gmail.com

On Wed, Oct 17, 2012 at 10:45 AM, Michael Segel michael_se...@hotmail.com wrote:

Neil, Since you asked... Actually your question is kind of a boring question. ;-) [Note: I will probably get flamed for saying it, even if it is the truth!] Having said that... boring as it is, it's an important topic that many still seem to trivialize in terms of its impact on performance.

Before answering your question, let's take a step back and ask a more important question... "What data do you want to capture and store in HBase?" and then ask yourself... "How do I plan on accessing the data?"

From what I can tell, you want to track certain events made by a user. So you're recording "at time X, user A did something". Then the question is how you want to access the data. Do you primarily say "Show me all the events in the past 15 minutes and organize them by user"? Or do you say "Show me the most recent events by user A"?

Here's the issue. If you are more interested in, and will frequently ask, "Show me the most recent events by user A", then you would want to do the following:

Key = User ID (hashed if necessary)
Column Family: Data (for lack of a better name)

Then store each event in a separate column where the column name is something like event + (max Long - Time Stamp). This will place the most recent event first.
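A short sketch of the write path Mike describes, encoding the column qualifier as event + (Long.MAX_VALUE - timestamp) so the newest event sorts first. The class, table, and family names are assumptions for illustration, not from the thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class EventWriter {
    // Qualifiers sort lexicographically, so a fixed-width decimal rendering of
    // (Long.MAX_VALUE - ts) makes the most recent event the first column.
    static String eventQualifier(long eventTs) {
        return String.format("event%019d", Long.MAX_VALUE - eventTs);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");  // hypothetical table
        Put put = new Put(Bytes.toBytes("AA"));     // user ID; hash it in practice
        put.add(Bytes.toBytes("data"),
                Bytes.toBytes(eventQualifier(System.currentTimeMillis())),
                Bytes.toBytes("myeventval1"));
        table.put(put);
        table.close();
    }
}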
RE: Checking major compaction
Hi, Yes Kiran, you can go through the logs also. You will see some logs like:
'Start major compaction for ..'
'Compacting file ...'
'Compacting file ...'
And finally 'Completed major/minor compaction.'
I just don't have the exact logs with me right now. You can see the log messages, but they all come at debug level, so ensure you enable debug mode for your logs. A simple test would be to just right some 10 rows. In between, do some 4 to 5 flushes. Then just give major_compact(tableName) from the shell. You can see the logs. :)
Regards Ram

-----Original Message-----
From: kiran [mailto:kiran.sarvabho...@gmail.com]
Sent: Thursday, October 18, 2012 12:03 PM
To: user@hbase.apache.org
Subject: Re: Checking major compaction

Thanks Ram, Is there a way I can check it through the region server logs? If it is possible, what are the statements that I need to look for?
Thanks Kiran

On Thu, Oct 18, 2012 at 11:55 AM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:
HBASE-6033 does the work that you ask for. It is currently in the trunk version of HBase.
Regards Ram

-----Original Message-----
From: kiran [mailto:kiran.sarvabho...@gmail.com]
Sent: Thursday, October 18, 2012 11:43 AM
To: user@hbase.apache.org
Subject: Checking major compaction

Hi all, Is there a way to check if a major compaction is running or not on a table?
-- Thank you Kiran Sarvabhotla
-Even a correct decision is wrong when it is taken late
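Since HBASE-6033 comes up in this thread, here is a hypothetical sketch of what that programmatic check looks like on a trunk build that includes the patch. This API is not in 0.94, and the exact signature and enum location may differ by version, so treat this as a guess at the shape of the call rather than a confirmed snippet.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Prints the table's compaction state, e.g. NONE, MINOR, MAJOR,
        // or MAJOR_AND_MINOR (per HBASE-6033; trunk-only at this time).
        System.out.println(admin.getCompactionState("mytable"));
        admin.close();
    }
}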
Re: hbase deployment using VMs for data nodes and SAN for data storage
Lars, I think we need to clarify what we think of as a SAN. It's possible to have a SAN where the disks appear as attached storage, while the traditional view is that the disks are detached. There are some design considerations, like cluster density, where one would want to use a SAN like NetApp to effectively create a storage half of a cluster and then a compute half that requires a fraction of the space and energy of a commodity-built cluster. When we start to see clusters at PB scale, we have to consider the size of the footprint and the cost of operating them in terms of both energy efficiency and physical footprint in a data center. HBase can run in such configurations with the right tuning. I for one would love to have a data center where I could drop in different configurations and be able to tune and validate cluster designs, but alas, that's a MapR, Cloudera, or Hortonworks thing, where they have the deep pockets and the necessity to actually work through this for their customers.

On Oct 15, 2012, at 11:43 PM, lars hofhansl lhofha...@yahoo.com wrote:
If you have a SAN, why would you want to use HBase?
-- Lars

From: Pamecha, Abhishek apame...@x.com
To: user@hbase.apache.org
Sent: Monday, October 15, 2012 3:00 PM
Subject: hbase deployment using VMs for data nodes and SAN for data storage

Hi, We are deciding between using local disks on bare-metal hosts vs. VMs using a SAN for data storage. I was wondering if anyone has contrasted performance, availability, and scalability between these two options? IMO, this is kinda similar to a typical AWS or other cloud deployment.
Thanks, Abhishek
RE: Checking major compaction
"A simple test would be to just right some 10 rows" -- I meant to say write some 10 rows (not "right").
Regards Ram

-----Original Message-----
From: Ramkrishna.S.Vasudevan [mailto:ramkrishna.vasude...@huawei.com]
Sent: Thursday, October 18, 2012 2:05 PM
To: user@hbase.apache.org
Subject: RE: Checking major compaction

Hi, Yes Kiran, you can go through the logs also. You will see some logs like:
'Start major compaction for ..'
'Compacting file ...'
'Compacting file ...'
And finally 'Completed major/minor compaction.'
I just don't have the exact logs with me right now. You can see the log messages, but they all come at debug level, so ensure you enable debug mode for your logs. A simple test would be to just right some 10 rows. In between, do some 4 to 5 flushes. Then just give major_compact(tableName) from the shell. You can see the logs. :)
Regards Ram

-----Original Message-----
From: kiran [mailto:kiran.sarvabho...@gmail.com]
Sent: Thursday, October 18, 2012 12:03 PM
To: user@hbase.apache.org
Subject: Re: Checking major compaction

Thanks Ram, Is there a way I can check it through the region server logs? If it is possible, what are the statements that I need to look for?
Thanks Kiran

On Thu, Oct 18, 2012 at 11:55 AM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:
HBASE-6033 does the work that you ask for. It is currently in the trunk version of HBase.
Regards Ram

-----Original Message-----
From: kiran [mailto:kiran.sarvabho...@gmail.com]
Sent: Thursday, October 18, 2012 11:43 AM
To: user@hbase.apache.org
Subject: Checking major compaction

Hi all, Is there a way to check if a major compaction is running or not on a table?
-- Thank you Kiran Sarvabhotla
-Even a correct decision is wrong when it is taken late
one RegionServer crashed and the whole cluster was blocked
Hi all,
One of the RegionServers in our company's cluster crashed. At that time, I found:
1. All the RegionServers stopped handling requests from the client side (requestsPerSecond=0 on the master-status UI page).
2. It took about 12-15 minutes to recover.
3. I have set hbase.regionserver.restart.on.zk.expire to true, but it does not work.
For 1, I know the cluster began to split logs and recover the data on the crashed RegionServer; will the recovery operation block all requests from the client side?
For 2, is there any solution to reduce the recovery time?
For 3, I checked the log and found a "session is timeout" exception; maybe there was a full GC and the session timed out. But why does hbase.regionserver.restart.on.zk.expire not work?
My HBase version is 0.94.0. Thanks for any suggestions and feedback!
Fowler Zhang
Re: Comparison of hbase/hadoop with sql server
What is the difference between "HBase" and "Hadoop+HBase"? HBase runs on top of Hadoop components. Also, first answer us this question before we answer yours: will your SQL Server scale linearly as you add more machines? Can it easily scale horizontally and vertically? It seems to me like you're comparing the wrong elements in deciding what platform to base your application on. If you could explain what you wish to do, and what data sizes you expect to work with, we can provide a better answer.

On Thu, Oct 18, 2012 at 5:06 PM, iwannaplay games funnlearnfork...@gmail.com wrote:
Hi, Can anyone give a clear idea about these comparisons on the same hardware/software configuration?

                              SQL Server   HBase   Hadoop+HBase
Data compression              ?            ?       ?    (yes/no; if all yes, where is it more effective?)
Online backups                ?            ?       ?
Security                      ?            ?       ?    (which is more secure and more controllable?)
Batch query execution time    ?            ?       ?    (where will time consumption be more for aggregates?)

Let me know if I need to consider any benefit of Hadoop/HBase over SQL Server.
Thanks Regards Prabhjot

-- Harsh J
RE: one RegionServer crashed and the whole cluster was blocked
For 1, I know the cluster began to split logs and recover the data on the crashed RegionServer; will the recovery operation block all requests from the client side?

Ideally it should not. But if your client was generating data for the regions that were dead at that time, then those client requests will not be served until the regions are online after log splitting on some other region server. Any client requests going to other region servers should ideally keep working. Did you look at the thread dumps on the other RSs at that time? That should give some clue.

For 2, is there any solution to reduce the recovery time?

The recovery time depends on the amount of data and particularly on the size of the HLog files. By default every HLog file is of size 256MB. In 0.94.0 a good number of changes have gone in to make recovery faster in terms of HLog splitting.

3. I have set hbase.regionserver.restart.on.zk.expire to true, but it does not work.

I am not very sure how the code works with this property. Will check this part.
Regards Ram

-----Original Message-----
From: 张磊 [mailto:zhang...@youku.com]
Sent: Thursday, October 18, 2012 5:01 PM
To: user@hbase.apache.org
Subject: one RegionServer crashed and the whole cluster was blocked

Hi all,
One of the RegionServers in our company's cluster crashed. At that time, I found:
1. All the RegionServers stopped handling requests from the client side (requestsPerSecond=0 on the master-status UI page).
2. It took about 12-15 minutes to recover.
3. I have set hbase.regionserver.restart.on.zk.expire to true, but it does not work.
For 1, I know the cluster began to split logs and recover the data on the crashed RegionServer; will the recovery operation block all requests from the client side?
For 2, is there any solution to reduce the recovery time?
For 3, I checked the log and found a "session is timeout" exception; maybe there was a full GC and the session timed out. But why does hbase.regionserver.restart.on.zk.expire not work?
My HBase version is 0.94.0. Thanks for any suggestions and feedback!
Fowler Zhang
Re: Coprocessor end point vs MapReduce?
To echo what Mike said about KISS: would you use triggers for a large time-sensitive batch job in an RDBMS? It's possible, but probably not. Then you might want to think twice about using co-processors for such a purpose with HBase.

On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:
Run your weekly job in a low-priority fair scheduler/capacity scheduler queue. Maybe it's just me, but I look at coprocessors as a similar structure to RDBMS triggers and stored procedures. You need to show restraint and use them sparingly, otherwise you end up creating performance issues. Just IMHO.
-Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
I don't have any concern about the time it's taking. It's more about the load it's putting on the cluster. I have other jobs that I need to run (secondary index, data processing, etc.). So the more time this new job takes, the less CPU the others will have. I tried the M/R and I really liked the way it's done. So my only concern will really be the performance of the delete part. That's why I'm wondering what the best practice is to move a row to another table.

2012/10/17, Michael Segel michael_se...@hotmail.com:
If you're going to be running this weekly, I would suggest that you stick with the M/R job. Is there any reason why you need to be worried about the time it takes to do the deletes?

On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Mike, I'm expecting to run the job weekly. I initially thought about using endpoints because I found HBASE-6942, which was a good example for my needs. I'm fine with the Put part of the Map/Reduce, but I'm not sure about the delete. That's why I looked at coprocessors. Then I figured that I can also do the Put on the coprocessor side. In an M/R job, can I delete the row I'm dealing with based on some criteria like timestamp? If I do that, I will not do bulk deletes, but I will delete the rows one by one, right? Which might be very slow. If in the future I want to run the job daily, might that be an issue? Or should I go with the initial idea of doing the Put with the M/R job and the delete with HBASE-6942?
Thanks, JM

2012/10/17, Michael Segel michael_se...@hotmail.com:
Hi, I'm a firm believer in KISS (Keep It Simple, Stupid). The Map/Reduce (map job only) is the simplest and least prone to failure. Not sure why you would want to do this using coprocessors. How often are you running this job? It sounds like it's going to be sporadic.
-Mike

On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi, Can someone please help me understand the pros and cons between these 2 options for the following use case? I need to transfer all the rows between 2 timestamps to another table. My first idea was to run a MapReduce to map the rows and store them in another table, and then delete them using an endpoint coprocessor. But the more I look into it, the more I think the MapReduce is not a good idea and I should use a coprocessor instead. BUT... The MapReduce framework guarantees me that it will run against all the regions. I tried stopping a regionserver while the job was running: the region moved, and the MapReduce restarted the task from the new location. Will the coprocessor do the same thing? Also, I found the web console for the MapReduce with the number of jobs, the status, etc. Is there the same thing for coprocessors? Are all coprocessors running at the same time on all regions, which means we can have 100 of them running on a regionserver at a time? Or are they run like MapReduce jobs, based on some configured values?
Thanks, JM
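A sketch of the map-only approach discussed in this thread, under the assumption that MultiTableOutputFormat (available in the 0.92+ mapreduce package) is acceptable: the mapper copies each row in the time range to a destination table and emits per-cell deletes against the source, so no endpoint coprocessor is needed. The table names and the time-range arguments are placeholders, not from the thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {
  static final ImmutableBytesWritable SOURCE =
      new ImmutableBytesWritable(Bytes.toBytes("source_table"));   // placeholder
  static final ImmutableBytesWritable ARCHIVE =
      new ImmutableBytesWritable(Bytes.toBytes("archive_table"));  // placeholder

  static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      Delete del = new Delete(row.get());
      for (KeyValue kv : result.raw()) {
        put.add(kv); // copy the cell to the archive table
        // delete exactly the versions we copied, not the whole row
        del.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
      }
      ctx.write(ARCHIVE, put); // MultiTableOutputFormat routes by table-name key
      ctx.write(SOURCE, del);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "move-rows-between-timestamps");
    job.setJarByClass(MoveRowsJob.class);
    Scan scan = new Scan();
    scan.setTimeRange(Long.parseLong(args[0]), Long.parseLong(args[1]));
    scan.setCaching(500);
    scan.setCacheBlocks(false); // don't pollute the block cache from an MR scan
    TableMapReduceUtil.initTableMapperJob("source_table", scan, MoveMapper.class,
        ImmutableBytesWritable.class, Writable.class, job);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

One design note on the sketch: deleting per copied version, rather than issuing a whole-row Delete, leaves any cells outside the scanned time range intact.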
Re: one RegionServer crashed and the whole cluster was blocked
Hi, Some stuff below:

On Thu, Oct 18, 2012 at 1:30 PM, 张磊 zhang...@youku.com wrote:
Hi all, One of the RegionServers in our company's cluster crashed. At that time, I found:
1. All the RegionServers stopped handling requests from the client side (requestsPerSecond=0 on the master-status UI page).
2. It took about 12-15 minutes to recover.
3. I have set hbase.regionserver.restart.on.zk.expire to true, but it does not work.
For 1, I know the cluster began to split logs and recover the data on the crashed RegionServer; will the recovery operation block all requests from the client side?

No. But it's worth checking that the region server that died was not the one handling the .META. region. If it was, that could be an explanation (clients do have a cache, but for first-time access to a region they go to the .META. region first).

For 2, is there any solution to reduce the recovery time?

12 minutes for a single region server crash (i.e. the datanode is still there, the cluster is ok) seems huge. You need to look at:
- a possible root cause: if the region server got disconnected, it may be because the network or ZooKeeper was in bad shape anyway. So the recovery is slow because the cause of the crash is still there.
- how your cluster is doing: do you have a lot of regions to recover? Did you have a lot of writes on this region server?

For 3, I checked the log and found a "session is timeout" exception; maybe there was a full GC and the session timed out. But why does hbase.regionserver.restart.on.zk.expire not work? My HBase version is 0.94.0.

I'm not sure it's still in the code base. To be checked. As well, you can have a root cause that makes the server stop. But there are two sides of a ZK disconnect anyway:
1) the region server: if it's disconnected but actually still alive, it may decide to kill itself, or not.
2) the cluster: after the timeout, the timed-out regionserver is considered dead and the recovery starts, whatever happens in 1). So whatever happens in 1) does not change much from an MTTR point of view, unless your cluster is small or you're losing multiple nodes.
There is an autorestart option in the 0.96 scripts. It changes nothing about the MTTR itself, but covers more cases of regionserver crashes. See the release notes in HBASE-5939.
Good luck, Nicolas
Re: ANN: HBase 0.94.2 is available for download
+1 on pushing to the maven repo. Thanks.

On Wed, Oct 17, 2012 at 1:49 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:
Thanks Jean for your update.
Regards Ram

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
Sent: Wednesday, October 17, 2012 5:14 PM
To: user@hbase.apache.org
Subject: Re: ANN: HBase 0.94.2 is available for download

Thanks. I tried running some MR jobs using the 0.94.2 jar on a 0.94.0 cluster and it's working fine. To install it on the cluster I did a full install on the nodes and restarted them one by one. It seems to have worked fine. The only issue was with the master, since I don't have a secondary master. I was on 0.94.0, so HBASE-6710 had no impact on me.

2012/10/17, Stack st...@duboce.net:
On Tue, Oct 16, 2012 at 7:00 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi St.Ack, Is the rolling upgrade process documented anywhere? I looked at the book but only found the upgrade from 0.90 to 0.92. Can you point me to something? If there is no documentation yet, can someone draft the steps here so I can propose an update to the online book? Thanks, Jean-Marc.

You should be able to do a rolling restart from 0.92.x to 0.94.x. It's a bug if you can't. There is no entry in the reference guide but there should be, if only to say this... You might want to also call out https://issues.apache.org/jira/browse/HBASE-6710. Folks should be conscious of its implications when upgrading.
Thanks boss, St.Ack
remote connection using HBase Java client
Hi, I have a standalone HBase 0.94.1 server running on my desktop. In the hbase-site.xml file, I just set hbase.rootdir. From my laptop, I want to connect to the HBase server on my desktop. What should I change in my client and server HBase configuration files? Also, what should I change in the /etc/hosts files on the client and server? Thank you, Erman
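A minimal sketch of the usual client-side setup for this kind of question, offered as an assumption rather than a confirmed answer for this cluster: point the client at the ZooKeeper that standalone HBase starts on the desktop. "desktop-host" is a placeholder that must resolve to the desktop's IP from the laptop (for example via an /etc/hosts entry).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Standalone HBase runs an embedded ZooKeeper on port 2181 by default;
        // "desktop-host" is a placeholder hostname, not from the thread.
        conf.set("hbase.zookeeper.quorum", "desktop-host");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        HTable table = new HTable(conf, "mytable"); // hypothetical table name
        System.out.println("Connected to " + Bytes.toString(table.getTableName()));
        table.close();
    }
}

One related gotcha worth checking on the server side: if the desktop's own /etc/hosts maps its hostname to 127.0.0.1, the region server may register an address the laptop cannot reach.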
Re: ANN: HBase 0.94.2 is available for download
I'm on it. :)

----- Original Message -----
From: Amit Sela am...@infolinks.com
To: user@hbase.apache.org
Sent: Thursday, October 18, 2012 8:25 AM
Subject: Re: ANN: HBase 0.94.2 is available for download

+1 on pushing to the maven repo. Thanks.

On Wed, Oct 17, 2012 at 1:49 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:
Thanks Jean for your update.
Regards Ram

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
Sent: Wednesday, October 17, 2012 5:14 PM
To: user@hbase.apache.org
Subject: Re: ANN: HBase 0.94.2 is available for download

Thanks. I tried running some MR jobs using the 0.94.2 jar on a 0.94.0 cluster and it's working fine. To install it on the cluster I did a full install on the nodes and restarted them one by one. It seems to have worked fine. The only issue was with the master, since I don't have a secondary master. I was on 0.94.0, so HBASE-6710 had no impact on me.

2012/10/17, Stack st...@duboce.net:
On Tue, Oct 16, 2012 at 7:00 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi St.Ack, Is the rolling upgrade process documented anywhere? I looked at the book but only found the upgrade from 0.90 to 0.92. Can you point me to something? If there is no documentation yet, can someone draft the steps here so I can propose an update to the online book? Thanks, Jean-Marc.

You should be able to do a rolling restart from 0.92.x to 0.94.x. It's a bug if you can't. There is no entry in the reference guide but there should be, if only to say this... You might want to also call out https://issues.apache.org/jira/browse/HBASE-6710. Folks should be conscious of its implications when upgrading.
Thanks boss, St.Ack
High IPC Latency
Hello, We are seeing slow times for read operations in our experiments. We are hoping that you guys can help us figure out what's going wrong. Here are some details:

- We are running a read-only benchmark on our HBase cluster.
- There are 10 regionservers, each co-located with a datanode. HDFS replication is 3x.
- All the data read by the experiment is already in the block cache and the hit ratio is 99%.
- We have 10 clients, each with around 400 threads making a mix of read-only requests involving multi-gets and scans.
- We settled on the default client pool type/size (roundrobin/1) and a regionserver handler count of 100 after testing various combinations to see what setting worked best.
- Our scans are short, fetching around 10 rows on average. Scanner caching is set to 50.
- An average row in a scan has either around 10 columns (small row) or around 200 columns (big row).
- Our multi-gets fetch around 200 rows on average.
- An average row in a multi-get has around 10 columns.
- Each column holds an integer (encoded into bytes).
- None of the machines involved reach CPU, memory, or IO saturation. In fact resource utilization stays quite low.
- Our statistics show that the average time for a scan, measured starting from the first scanner.next() call to the last one which returns a null, is around 2-3 seconds.
- Since we use scanner caching, the major portion of this time (around 2 seconds) is spent on the first call to next(), while the remaining calls take a negligible amount of time.
- Similarly, we see that a multi-get on average takes around 2 seconds.
- A single get on average takes around 1 second.

We are not sure what the bottleneck is or where it lies. We thought we should look deeper into what is going on at the regionservers. We monitored the IPC calls during one of the experiments. Here is a sample of one regionserver log:

2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9 contents=200 Gets
2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115499; Served: HRegionInterface#openScanner queueTime=0 processingTime=0 contents=1 Scan, 63 bytes
2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115503; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103230; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 Long, 1 Integer
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103234; Served: HRegionInterface#close queueTime=0 processingTime=0 contents=1 Long
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103232; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 Long, 1 Integer

I have attached a larger chunk of the logs we collected for this experiment in case that helps. From the logs, we saw that the next() operation at the regionserver takes 1 millisecond or less, and a multi-get takes 10 ms on average. Yet the corresponding times we see at the client are orders of magnitude higher. Ping times between the machines are at most 1 ms and we are not saturating the network. We would really appreciate some insights from you guys on this. Where do you suggest we focus our efforts in order to hunt down this bottleneck/contention? Thanks! Yousuf
Re: High IPC Latency
Also, what version of HBase/HDFS is this using?

----- Original Message -----
From: Pamecha, Abhishek apame...@x.com
To: user@hbase.apache.org
Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt
Sent: Thursday, October 18, 2012 11:38 AM
Subject: RE: High IPC Latency

Is it sustained for the same client hitting the same region server, OR does it get better for the same client-RS combination when run for a longer duration? Trying to eliminate ZooKeeper from this.
Thanks, Abhishek

From: Yousuf Ahmad [mailto:myahm...@gmail.com]
Sent: Thursday, October 18, 2012 11:26 AM
To: user@hbase.apache.org
Cc: Ivan Brondino; Ricardo Vilaça
Subject: High IPC Latency

Hello, We are seeing slow times for read operations in our experiments. We are hoping that you guys can help us figure out what's going wrong. Here are some details:

- We are running a read-only benchmark on our HBase cluster.
- There are 10 regionservers, each co-located with a datanode. HDFS replication is 3x.
- All the data read by the experiment is already in the block cache and the hit ratio is 99%.
- We have 10 clients, each with around 400 threads making a mix of read-only requests involving multi-gets and scans.
- We settled on the default client pool type/size (roundrobin/1) and a regionserver handler count of 100 after testing various combinations to see what setting worked best.
- Our scans are short, fetching around 10 rows on average. Scanner caching is set to 50.
- An average row in a scan has either around 10 columns (small row) or around 200 columns (big row).
- Our multi-gets fetch around 200 rows on average.
- An average row in a multi-get has around 10 columns.
- Each column holds an integer (encoded into bytes).
- None of the machines involved reach CPU, memory, or IO saturation. In fact resource utilization stays quite low.
- Our statistics show that the average time for a scan, measured starting from the first scanner.next() call to the last one which returns a null, is around 2-3 seconds.
- Since we use scanner caching, the major portion of this time (around 2 seconds) is spent on the first call to next(), while the remaining calls take a negligible amount of time.
- Similarly, we see that a multi-get on average takes around 2 seconds.
- A single get on average takes around 1 second.

We are not sure what the bottleneck is or where it lies. We thought we should look deeper into what is going on at the regionservers. We monitored the IPC calls during one of the experiments. Here is a sample of one regionserver log:

2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9 contents=200 Gets
2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115499; Served: HRegionInterface#openScanner queueTime=0 processingTime=0 contents=1 Scan, 63 bytes
2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115503; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103230; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 Long, 1 Integer
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103234; Served: HRegionInterface#close queueTime=0 processingTime=0 contents=1 Long
2012-10-18 17:00:09,994
Re: Coprocessor end point vs MapReduce?
I agree with the concern, and there isn't a ton of guidance in this area yet.

On 10/18/12 2:01 PM, Michael Segel michael_se...@hotmail.com wrote:
Doug, One thing that concerns me is that a lot of folks are gravitating to coprocessors and may be using them for the wrong thing. Has anyone done any sort of research as to some of the limitations and negative impacts of using coprocessors? While I haven't really toyed with the idea of bulk deletes, periodic deletes are probably not a good use of coprocessors; however, using them to synchronize tables would be a valid use case.
Thx -Mike

On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com wrote:
To echo what Mike said about KISS: would you use triggers for a large time-sensitive batch job in an RDBMS? It's possible, but probably not. Then you might want to think twice about using co-processors for such a purpose with HBase.

On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:
Run your weekly job in a low-priority fair scheduler/capacity scheduler queue. Maybe it's just me, but I look at coprocessors as a similar structure to RDBMS triggers and stored procedures. You need to show restraint and use them sparingly, otherwise you end up creating performance issues. Just IMHO.
-Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
I don't have any concern about the time it's taking. It's more about the load it's putting on the cluster. I have other jobs that I need to run (secondary index, data processing, etc.). So the more time this new job takes, the less CPU the others will have. I tried the M/R and I really liked the way it's done. So my only concern will really be the performance of the delete part. That's why I'm wondering what the best practice is to move a row to another table.

2012/10/17, Michael Segel michael_se...@hotmail.com:
If you're going to be running this weekly, I would suggest that you stick with the M/R job. Is there any reason why you need to be worried about the time it takes to do the deletes?

On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Mike, I'm expecting to run the job weekly. I initially thought about using endpoints because I found HBASE-6942, which was a good example for my needs. I'm fine with the Put part of the Map/Reduce, but I'm not sure about the delete. That's why I looked at coprocessors. Then I figured that I can also do the Put on the coprocessor side. In an M/R job, can I delete the row I'm dealing with based on some criteria like timestamp? If I do that, I will not do bulk deletes, but I will delete the rows one by one, right? Which might be very slow. If in the future I want to run the job daily, might that be an issue? Or should I go with the initial idea of doing the Put with the M/R job and the delete with HBASE-6942?
Thanks, JM

2012/10/17, Michael Segel michael_se...@hotmail.com:
Hi, I'm a firm believer in KISS (Keep It Simple, Stupid). The Map/Reduce (map job only) is the simplest and least prone to failure. Not sure why you would want to do this using coprocessors. How often are you running this job? It sounds like it's going to be sporadic.
-Mike

On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi, Can someone please help me understand the pros and cons between these 2 options for the following use case? I need to transfer all the rows between 2 timestamps to another table. My first idea was to run a MapReduce to map the rows and store them in another table, and then delete them using an endpoint coprocessor. But the more I look into it, the more I think the MapReduce is not a good idea and I should use a coprocessor instead. BUT... The MapReduce framework guarantees me that it will run against all the regions. I tried stopping a regionserver while the job was running: the region moved, and the MapReduce restarted the task from the new location. Will the coprocessor do the same thing? Also, I found the web console for the MapReduce with the number of jobs, the status, etc. Is there the same thing for coprocessors? Are all coprocessors running at the same time on all regions, which means we can have 100 of them running on a regionserver at a time? Or are they run like MapReduce jobs, based on some configured values?
Thanks, JM
Re: High IPC Latency
Hi, Thank you for your questions, guys.

We are using HBase 0.92 with HDFS 1.0.1. The experiment lasts 15 minutes. The measurements stabilize in the first two minutes of the run. The data is distributed almost evenly across the regionservers, so each client hits most of them over the course of the experiment. However, for the data we have, any given multi-get or scan should touch only one, or at most two, regions. The client caches the locations of the regionservers, so after a couple of minutes of the experiment running, it wouldn't need to re-visit ZooKeeper, I believe. Please correct me if I am wrong.
Regards, Yousuf

On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl lhofha...@yahoo.com wrote:
Also, what version of HBase/HDFS is this using?

----- Original Message -----
From: Pamecha, Abhishek apame...@x.com
To: user@hbase.apache.org
Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt
Sent: Thursday, October 18, 2012 11:38 AM
Subject: RE: High IPC Latency

Is it sustained for the same client hitting the same region server, OR does it get better for the same client-RS combination when run for a longer duration? Trying to eliminate ZooKeeper from this.
Thanks, Abhishek

From: Yousuf Ahmad [mailto:myahm...@gmail.com]
Sent: Thursday, October 18, 2012 11:26 AM
To: user@hbase.apache.org
Cc: Ivan Brondino; Ricardo Vilaça
Subject: High IPC Latency

Hello, We are seeing slow times for read operations in our experiments. We are hoping that you guys can help us figure out what's going wrong. Here are some details:

- We are running a read-only benchmark on our HBase cluster.
- There are 10 regionservers, each co-located with a datanode. HDFS replication is 3x.
- All the data read by the experiment is already in the block cache and the hit ratio is 99%.
- We have 10 clients, each with around 400 threads making a mix of read-only requests involving multi-gets and scans.
- We settled on the default client pool type/size (roundrobin/1) and a regionserver handler count of 100 after testing various combinations to see what setting worked best.
- Our scans are short, fetching around 10 rows on average. Scanner caching is set to 50.
- An average row in a scan has either around 10 columns (small row) or around 200 columns (big row).
- Our multi-gets fetch around 200 rows on average.
- An average row in a multi-get has around 10 columns.
- Each column holds an integer (encoded into bytes).
- None of the machines involved reach CPU, memory, or IO saturation. In fact resource utilization stays quite low.
- Our statistics show that the average time for a scan, measured starting from the first scanner.next() call to the last one which returns a null, is around 2-3 seconds.
- Since we use scanner caching, the major portion of this time (around 2 seconds) is spent on the first call to next(), while the remaining calls take a negligible amount of time.
- Similarly, we see that a multi-get on average takes around 2 seconds.
- A single get on average takes around 1 second.

We are not sure what the bottleneck is or where it lies. We thought we should look deeper into what is going on at the regionservers. We monitored the IPC calls during one of the experiments. Here is a sample of one regionserver log:

2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9 contents=200 Gets
2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115499; Served: HRegionInterface#openScanner queueTime=0 processingTime=0 contents=1 Scan, 63 bytes
2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115503; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes
2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103230; Served: HRegionInterface#next queueTime=0 processingTime=0 contents=1 Long, 1 Integer
2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #103234; Served: HRegionInterface#close queueTime=0 processingTime=0 contents=1 Long
2012-10-18 17:00:09,994
Re: Using filters in REST/stargate returns 204 (No content)
What does the HBase shell return if you try that scan programmatically?

On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh suresh.kum...@emc.com wrote:

I have an HBase Java client which has a couple of filters and works just fine; I get the expected result. Here is the code:

HTable table = new HTable(conf, "apachelogs");
Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
RegexStringComparator comp = new RegexStringComparator("ERROR x.");
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("mylog"),
    Bytes.toBytes("pcol"), CompareOp.EQUAL, comp);
filter.setFilterIfMissing(true);
list.addFilter(filter);
scan.setFilter(list);
ResultScanner scanner = table.getScanner(scan);

I start up the REST server, and use curl for the above functionality; I just base64-encoded "ERROR x.":

curl -v -H "Content-Type:text/xml" -d @args.txt http://localhost:8080/apachelogs/scanner

where args.txt is:

<Scanner>
  <filter>
    {
      "latestVersion": true, "ifMissing": true,
      "qualifier": "pcol", "family": "mylog",
      "op": "EQUAL", "type": "SingleColumnValueFilter",
      "comparator": {"value": "RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=", "type": "RegexStringComparator"}
    }
  </filter>
</Scanner>

which returns:

* About to connect() to localhost port 8080 (#0)
* Trying 127.0.0.1... connected
POST /apachelogs/scanner HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: localhost:8080
Accept: */*
Content-Type:text/xml
Content-Length: 318
* upload completely sent off: 318 out of 318 bytes
HTTP/1.1 201 Created
Location: http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6
Content-Length: 0
* Connection #0 to host localhost left intact
* Closing connection #0

but

curl -v http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

returns HTTP/1.1 204 No Content. Any clues?
Thanks, Suresh
Re: crafting your key - scan vs. get
Hi Neil,

Mike summed it up well, as usual. :) Your choices of where to describe this dimension of your data (a one-to-many between users and events) are:
- one row per event
- one row per user, with events as columns
- one row per user, with events as versions on a single cell

The first two are the best choices, since the third is sort of a perversion of the time dimension (it isn't one thing that's changing, it's many things over time), and might make things counter-intuitive when combined with deletes, compaction, etc. You can do it, but caveat emptor. :)

Since you have on the order of 100s or 1000s of events per user, it's reasonable to use the 2nd (columns). And with 1KB cell sizes, even extreme cases (thousands of events) won't kill you. That said, the main plus you get out of using columns over rows is ACID properties; you could get/set all the stuff for a single user atomically if it's columns in a single row, but not if it's separate rows. That's nice, but I'm guessing you probably don't need to do that, and instead would write out the events as they happen (i.e., you would rarely be doing PUTs for multiple events for the same user at the same time, right?).

In theory, tall tables (the row-wise model) should have a slight performance advantage over wide tables (the column-wise model), all other things being equal; the shape of the data is nearly the same, but the row-wise version doesn't have to do any work preserving consistency. Your informal tests about GET vs SCAN perf seem a little suspect, since a GET is actually implemented as a one-row SCAN; but the devil's in the details, so if you see that happening repeatably with data that's otherwise identical, raise it on the dev list and people should look at it. The key thing is to try it for yourself and see. :)

Ian

ps - Sorry Mike was rude to you in his response. Your question was well-phrased and not at all boring. Mike, you can explain all you want, but saying "Your question is boring" is straight up rude; please don't do that.

From: Neil Yalowitz neilyalow...@gmail.com
Date: Tue, Oct 16, 2012 at 2:53 PM
Subject: crafting your key - scan vs. get
To: user@hbase.apache.org

Hopefully this is a fun question. :)

Assume you could architect an HBase table from scratch and you were choosing between the following two key structures.

1) The first structure creates a unique row key for each PUT. The rows are events related to a user ID. There may be up to several hundred events for each user ID (probably not thousands, an average of perhaps ~100 events per user). Each key would be made unique with a reverse-order timestamp or perhaps just random characters (we don't particularly care about using ROT for sorting newest-first here).

key = "AA" + some-unique-chars

The table will look like this:

key       vals cf:mycf   ts
-------------------------------------
AA...     myval1         1350345600
AA...     myval2         1350259200
AA...     myval3         1350172800

Retrieving these values will use a Scan with startRow and stopRow. In the hbase shell, it would look like:

$ scan 'mytable', {STARTROW => 'AA', ENDROW => 'AA_'}

2) The second structure choice uses only the user ID as the key and relies on row versions to store all the events. For example:

key    vals cf:mycf   ts
----------------------------------
AA     myval1         1350345600
AA     myval2         1350259200
AA     myval3         1350172800

Retrieving these values will use a Get with VERSIONS = somebignumber. In the hbase shell, it would look like:

$ get 'mytable', 'AA', {COLUMN => 'cf:mycf', VERSIONS => 999}

...although this probably violates a comment in the HBase documentation: "It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you because this will greatly increase StoreFile size." ...found here: http://hbase.apache.org/book/schema.versions.html

So, are there any performance considerations between Scan vs. Get in this use case? Which choice would you go for?

Neil Yalowitz neilyalow...@gmail.com
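Since a Get is internally a single-row Scan, the two access paths compared in this thread look like this in the 0.94-era Java client. The table handle, family name, and key values follow Neil's example; the stop row is one illustrative way to bound the prefix, not a prescription.

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TallVsWide {
    // Tall table: one row per event, contiguous under the user prefix.
    static void scanUserEvents(HTable table) throws java.io.IOException {
        Scan scan = new Scan(Bytes.toBytes("AA"), Bytes.toBytes("AA_"));
        ResultScanner rs = table.getScanner(scan);
        try {
            for (Result r : rs) {
                System.out.println(Bytes.toStringBinary(r.getRow()));
            }
        } finally {
            rs.close();
        }
    }

    // Wide table: one row per user; a Get is a single-row scan internally.
    static Result getUserRow(HTable table) throws java.io.IOException {
        Get get = new Get(Bytes.toBytes("AA"));
        get.addFamily(Bytes.toBytes("cf"));
        return table.get(get);
    }
}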
RE: Using filters in REST/stargate returns 204 (No content)
When I run the Java code, it returns the valid rows which match the regex. I base64-encoded the qualifier and family fields as well; still an empty result.
Suresh

-----Original Message-----
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Thursday, October 18, 2012 1:19 PM
To: user@hbase.apache.org
Subject: Re: Using filters in REST/stargate returns 204 (No content)

What does the HBase shell return if you try that scan programmatically?

On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh suresh.kum...@emc.com wrote:

I have an HBase Java client which has a couple of filters and works just fine; I get the expected result. Here is the code:

HTable table = new HTable(conf, "apachelogs");
Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
RegexStringComparator comp = new RegexStringComparator("ERROR x.");
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("mylog"),
    Bytes.toBytes("pcol"), CompareOp.EQUAL, comp);
filter.setFilterIfMissing(true);
list.addFilter(filter);
scan.setFilter(list);
ResultScanner scanner = table.getScanner(scan);

I start up the REST server, and use curl for the above functionality; I just base64-encoded "ERROR x.":

curl -v -H "Content-Type:text/xml" -d @args.txt http://localhost:8080/apachelogs/scanner

where args.txt is:

<Scanner>
  <filter>
    {
      "latestVersion": true, "ifMissing": true,
      "qualifier": "pcol", "family": "mylog",
      "op": "EQUAL", "type": "SingleColumnValueFilter",
      "comparator": {"value": "RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=", "type": "RegexStringComparator"}
    }
  </filter>
</Scanner>

which returns HTTP/1.1 201 Created with a scanner Location, but

curl -v http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6

returns HTTP/1.1 204 No Content. Any clues?
Thanks, Suresh
Re: Unable to add co-processor to table through HBase api
Hi Folks, Still, i am unable to add the co-processors through HBase client api. This time i tried loading the coprocessor by providing the jar path along with parameters. But, it failed. I was able to add the same coprocessor to the table through HBase shell. I also dont see any logs regarding adding coprocessors in regionservers when i try to add the co-processor through api.I strongly feel that HBase client api for adding coprocessor seems to be broken. Please let me know if the code below seems to be problematic. Here is the code i used to add the coprocessor through HBase api: private static void modifyTable() throws IOException { Configuration conf = HBaseConfiguration.create(); HBaseAdmin hAdmin = new HBaseAdmin(conf); String tableName = txn; hAdmin.disableTable(tableName); if(!hAdmin.isTableEnabled(tableName)) { System.out.println(Trying to add coproc to table); // using err so that it's easy to read this on eclipse console. HashMapString, String map = new HashMapString,String(); map.put(arg1, batchdate); String className = com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver; hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className, new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar), Coprocessor.PRIORITY_USER,map); if( hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).hasCoprocessor(className) ) { System.err.println(YIPIE!!!); } hAdmin.enableTable(tableName); } hAdmin.close(); } Thanks, Anil Gupta On Wed, Oct 17, 2012 at 9:27 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: Do let me know if you are stuck up. May be I did not get your actual problem. All the best. Regards Ram -Original Message- From: anil gupta [mailto:anilgupt...@gmail.com] Sent: Wednesday, October 17, 2012 11:34 PM To: user@hbase.apache.org Subject: Re: Unable to add co-processor to table through HBase api Hi Ram, The table exists and I don't get any error while running the program(i would get an error if the table did not exist). I am running a distributed cluster. Tried following additional ways also: 1. I tried loading the AggregationImplementation coproc. 2. I also tried adding the coprocs while the table is enabled. Also had a look at the JUnit test cases and could not find any difference. I am going to try adding the coproc along with jar in Hdfs and see what happens. Thanks, Anil Gupta On Tue, Oct 16, 2012 at 11:44 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: I tried out a sample test class. It is working properly. I just have a doubt whether you are doing the Htd.addCoprocessor() step before creating the table? Try that way hope it should work. Regards Ram -Original Message- From: anil gupta [mailto:anilgupt...@gmail.com] Sent: Wednesday, October 17, 2012 4:05 AM To: user@hbase.apache.org Subject: Unable to add co-processor to table through HBase api Hi All, I would like to add a RegionObserver to a HBase table through HBase api. I don't want to put this RegionObserver as a user or system co- processor in hbase-site.xml since this is specific to a table. So, option of using hbase properties is out. I have already copied the jar file in the classpath of region server and restarted the cluster. 
Can anyone point out the problem in the following code for adding the co-processor to the table:

    private void modifyTable(String name) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin hAdmin = new HBaseAdmin(conf);
        hAdmin.disableTable("txn_subset");
        if (!hAdmin.isTableEnabled("txn_subset")) {
            // using err so that it's easy to read this on the eclipse console
            System.err.println("Trying to add coproc to table");
            hAdmin.getTableDescriptor(Bytes.toBytes("txn_subset")).addCoprocessor(
                "com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver");
            if (hAdmin.getTableDescriptor(Bytes.toBytes("txn_subset")).hasCoprocessor(
                "com.intuit.hbase.poc.coprocessor.observer.IhubTxnRegionObserver")) {
                System.err.println("YIPIE!!!");
            }
            hAdmin.enableTable("ihub_txn_subset");
        }
        hAdmin.close();
    }

-- Thanks Regards, Anil Gupta
Re: error when open hbase shell
On Wed, Oct 17, 2012 at 2:42 AM, hua xiang adam_...@yahoo.com wrote:

Hi, when opening the hbase shell as the hdfs user there is an error, but the root user can open it. Below is the error:

    [root@hadoop2 ~]# su - hdfs
    [hdfs@hadoop2 ~]$ id
    uid=494(hdfs) gid=502(hadoop) groups=502(hadoop)
    [hdfs@hadoop2 ~]$ hbase shell
    Error: Could not find or load main class org.jruby.Main
    [hdfs@hadoop2 ~]$

maybe a profile problem?

Is it in your CLASSPATH? Is HBase built? St.Ack
Thrift Python client with regex
I am using Thrift (0.8.0) to get scan column values from a table. This code returns all the values:

    columns = ['mylog']
    scanner = client.scannerOpen('apachelogs', '', columns)
    result = client.scannerGet(scanner)
    while result:
        printRow(result[0])
        result = client.scannerGet(scanner)
    print "Scanner finished"
    client.scannerClose(scanner)

The scannerOpen Python API says you can pass a regex in the column qualifier, so if I send columns = ['mylog:suresh'], it should return all the values which have the string "suresh", right? I don't get any result.

Thanks, Suresh
Re: Thrift Python client with regex
We had the same question earlier. Unfortunately the documentation is wrong on this account; scannerOpen resolves to either a call to scan.addFamily or scan.addColumn, and neither directly supports regex matching. Regex pattern matching against column qualifiers is definitely supported on the Java side, so Thrift2 (0.94.0) is a possible solution, if you can upgrade. Another approach, depending on how large your rows are, would be to grab the full list of columns, filter via regex on the client side, and then specify them explicitly in scannerOpen().

Norbert
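For reference, a minimal sketch of the Java-side regex matching Norbert mentions, using a QualifierFilter with a RegexStringComparator; the table, family, and pattern here are taken from the question and are otherwise illustrative:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.QualifierFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class QualifierRegexScan {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "apachelogs");
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("mylog"));
            // Keep only cells whose column qualifier matches the regex.
            scan.setFilter(new QualifierFilter(CompareOp.EQUAL,
                    new RegexStringComparator(".*suresh.*")));
            ResultScanner scanner = table.getScanner(scan);
            for (Result r : scanner) {
                System.out.println(r);
            }
            scanner.close();
            table.close();
        }
    }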
Re: WAL.Hlog vs. Hlog
On Thu, Oct 18, 2012 at 7:35 PM, Maoke fib...@gmail.com wrote:

hi Stack and all, I noticed that regionserver.HLog is obsoleted by regionserver.wal.HLog, from version 0.20.6 to 0.90+. What is the major difference between the two, in principle? What should we pay attention to when using the wal.HLog?

Your best bet is reviewing the release notes for 0.90 and the issue that moved the WAL, HBASE-1756 "Refactor HLog". Going by the issue, the motivation was cleanup.

St.Ack
Re: WAL.Hlog vs. Hlog
2012/10/19 Stack st...@duboce.net:

Your best bet is reviewing the release notes for 0.90 and the issue that moved the WAL, HBASE-1756 "Refactor HLog". Going by the issue, the motivation was cleanup.

thanks a lot! I will read that ASAP.

- maoke
RE: hbase.client.scanner.timeout.period not being respected
Did you restart your server cluster? Per the HRegionServer.java code:

    this.scannerLeaseTimeoutPeriod = conf.getInt(
        HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD,
        HConstants.DEFAULT_HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD);

it seems this parameter is used by the server side as well. I am not an expert on it; hope this is helpful for you :)

Best, Liang

From: Bai Shen [baishen.li...@gmail.com] Sent: October 18, 2012 23:25 To: user@hbase.apache.org Subject: hbase.client.scanner.timeout.period not being respected

I've set hbase.client.scanner.timeout.period on my client to 30, but I'm still getting errors showing that hbase is using the default value of 6. Any ideas why this is? Thanks.
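A minimal sketch of what this implies in practice: setting the key on the client Configuration alone is not enough, since the region server reads the same key for its scanner lease period; the value below and the class name are assumptions for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ScannerTimeoutConfig {
        public static Configuration make() {
            Configuration conf = HBaseConfiguration.create();
            // Must also be set in the region server's hbase-site.xml (followed by a
            // restart), since the server uses the same key for its scanner lease period.
            conf.setInt("hbase.client.scanner.timeout.period", 300000); // assumed 5-minute value
            return conf;
        }
    }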
RE: Coprocessor end point vs MapReduce?
CPs and endpoints operate at a region level; any operation within one region we can perform using them. I have seen in the use case below that along with the delete there was a need for inserting data into some other table also, and this was a kind of periodic action. I really doubt the endpoints alone can be used here. I also tend towards the MR.

The idea behind the bulk delete CP is simple. We have a use case of deleting a bulk of rows, and this needs to be an online delete. I have also seen many people on the mailing list ask questions regarding that. In all of them, people were using scans, getting the rowkeys to the client side and then doing the deletes. Yes, most of the time the complaint was the slowness. One bulk delete performance improvement was done in HBASE-6284. Still, I thought we can do the whole operation (scan+delete) on the server side, and we can make use of the endpoints here. This will be much faster and can be used for online bulk deletes.

-Anoop-

From: Michael Segel [michael_se...@hotmail.com] Sent: Thursday, October 18, 2012 11:31 PM To: user@hbase.apache.org Subject: Re: Coprocessor end point vs MapReduce?

Doug, One thing that concerns me is that a lot of folks are gravitating to coprocessors and may be using them for the wrong thing. Has anyone done any sort of research as to some of the limitations and negative impacts of using coprocessors? While I haven't really toyed with the idea of bulk deletes, periodic deletes are probably not a good use of coprocessors; however, using them to synchronize tables would be a valid use case. Thx -Mike

On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com wrote:

To echo what Mike said about KISS, would you use triggers for a large time-sensitive batch job in an RDBMS? It's possible, but probably not. Then you might want to think twice about using co-processors for such a purpose with HBase.

On 10/17/12 9:50 PM, Michael Segel michael_se...@hotmail.com wrote:

Run your weekly job in a low priority fair scheduler/capacity scheduler queue. Maybe it's just me, but I look at coprocessors as a similar structure to RDBMS triggers and stored procedures. You need to restrain yourself and use them sparingly, otherwise you end up creating performance issues. Just IMHO. -Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

I don't have any concern about the time it's taking. It's more about the load it's putting on the cluster. I have other jobs that I need to run (secondary index, data processing, etc.). So the more time this new job is taking, the less CPU the others will have. I tried the M/R and I really liked the way it's done. So my only concern will really be the performance of the delete part. That's why I'm wondering what's the best practice to move a row to another table.

2012/10/17, Michael Segel michael_se...@hotmail.com:

If you're going to be running this weekly, I would suggest that you stick with the M/R job. Is there any reason why you need to be worried about the time it takes to do the deletes?

On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Hi Mike, I'm expecting to run the job weekly. I initially thought about using end points because I found HBASE-6942, which was a good example for my needs. I'm fine with the Put part for the Map/Reduce, but I'm not sure about the delete. That's why I'm looking at coprocessors. Then I figured that I can also do the Put on the coprocessor side. On a M/R, can I delete the row I'm dealing with based on some criteria like timestamp?
If I do that, I will not do bulk deletes, but I will delete the rows one by one, right? Which might be very slow. If in the future I want to run the job daily, might that be an issue? Or should I go with the initial idea of doing the Put with the M/R job and the delete with HBASE-6942?

Thanks, JM

2012/10/17, Michael Segel michael_se...@hotmail.com:

Hi, I'm a firm believer in KISS (Keep It Simple, Stupid). The Map/Reduce (map job only) is the simplest and least prone to failure. Not sure why you would want to do this using coprocessors. How often are you running this job? It sounds like it's going to be sporadic. -Mike

On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Hi, Can someone please help me to understand the pros and cons between those 2 options for the following use case? I need to transfer all the rows between 2 timestamps to another table. My first idea was to run a MapReduce to map the rows and store them in another table, and then delete them using an end point coprocessor. But the more I look into it, the more I think the MapReduce is not a good idea and I should use a coprocessor instead. BUT... The MapReduce framework guarantees me that it will run against all the regions. I tried to stop a regionserver
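For the record, a rough sketch of the map-only job shape discussed in this thread: copy each row in the timestamp window to a second table and delete it from the source, both via MultiTableOutputFormat. The table names are placeholders, and the per-row Deletes are exactly the one-by-one cost JM is worried about; this is a sketch, not a tuned solution.

    import java.io.IOException;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Writable;

    public class MoveRowsMapper extends TableMapper<ImmutableBytesWritable, Writable> {
        private static final ImmutableBytesWritable ARCHIVE =
                new ImmutableBytesWritable(Bytes.toBytes("archive_table"));
        private static final ImmutableBytesWritable SOURCE =
                new ImmutableBytesWritable(Bytes.toBytes("source_table"));

        @Override
        protected void map(ImmutableBytesWritable row, Result result, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            for (KeyValue kv : result.raw()) {
                put.add(kv); // copy every cell as-is, timestamps included
            }
            context.write(ARCHIVE, put);
            // This is the row-at-a-time delete discussed above.
            context.write(SOURCE, new Delete(row.get()));
        }
    }

Driver-side, the scan would be restricted to the window, roughly: Scan scan = new Scan(); scan.setTimeRange(startTs, endTs); then TableMapReduceUtil.initTableMapperJob("source_table", scan, MoveRowsMapper.class, ImmutableBytesWritable.class, Writable.class, job); job.setNumReduceTasks(0); job.setOutputFormatClass(MultiTableOutputFormat.class);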
Re: Coprocessor end point vs MapReduce?
I might be a little off here. If rows are moved to another table on a weekly or daily basis, why not create a per-week or per-day table? That way you don't need to copy and delete. Of course it will not work if you are selectively filtering between timestamps, and clients have to have a notion of multiple tables.

2012/10/18 Anoop Sam John anoo...@huawei.com wrote:

CPs and endpoints operate at a region level; any operation within one region we can perform using them. I have seen in the use case below that along with the delete there was a need for inserting data into some other table also, and this was a kind of periodic action. I really doubt the endpoints alone can be used here. I also tend towards the MR.
RE: High IPC Latency
Hi Yousuf,

"The client caches the locations of the regionservers, so after a couple of minutes of the experiment running, it wouldn't need to re-visit ZooKeeper, I believe. Correct me if I am wrong please."

Yes, you are right.

Regards Ram

-Original Message- From: Yousuf Ahmad [mailto:myahm...@gmail.com] Sent: Friday, October 19, 2012 1:30 AM To: user@hbase.apache.org; lars hofhansl Cc: Ivan Brondino; Ricardo Vilaça Subject: Re: High IPC Latency

Hi, Thank you for your questions guys.

We are using HBase 0.92 with HDFS 1.0.1. The experiment lasts 15 minutes. The measurements stabilize in the first two minutes of the run. The data is distributed almost evenly across the regionservers, so each client hits most of them over the course of the experiment. However, for the data we have, any given multi-get or scan should touch only one or at most two regions. The client caches the locations of the regionservers, so after a couple of minutes of the experiment running, it wouldn't need to re-visit ZooKeeper, I believe. Correct me if I am wrong please.

Regards, Yousuf

On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl lhofha...@yahoo.com wrote:

Also, what version of HBase/HDFS is this using?

- Original Message - From: Pamecha, Abhishek apame...@x.com To: user@hbase.apache.org Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt Sent: Thursday, October 18, 2012 11:38 AM Subject: RE: High IPC Latency

Is it sustained for the same client hitting the same region server, OR does it get better for the same client-RS combination when run for a longer duration? Trying to eliminate ZooKeeper from this. Thanks, Abhishek

From: Yousuf Ahmad [mailto:myahm...@gmail.com] Sent: Thursday, October 18, 2012 11:26 AM To: user@hbase.apache.org Cc: Ivan Brondino; Ricardo Vilaça Subject: High IPC Latency

Hello,

We are seeing slow times for read operations in our experiments. We are hoping that you guys can help us figure out what's going wrong. Here are some details:

* We are running a read-only benchmark on our HBase cluster. There are 10 regionservers, each co-located with a datanode. HDFS replication is 3x. All the data read by the experiment is already in the block cache and the hit ratio is 99%.
* We have 10 clients, each with around 400 threads making a mix of read-only requests involving multi-gets and scans.
* We settled on the default client pool type/size (roundrobin/1) and a regionserver handler count of 100 after testing various combinations to see what setting worked best.
* Our scans are short, fetching around 10 rows on average. Scanner caching is set to 50. An average row in a scan has either around 10 columns (small row) or around 200 columns (big row).
* Our multi-gets fetch around 200 rows on average. An average row in a multi-get has around 10 columns. Each column holds an integer (encoded into bytes).
* None of the machines involved reach CPU, memory, or IO saturation. In fact resource utilization stays quite low.
* Our statistics show that the average time for a scan, measured starting from the first scanner.next() call to the last one which returns a null, is around 2-3 seconds. Since we use scanner caching, the major portion of this time (around 2 seconds) is spent on the first call to next(), while the remaining calls take a negligible amount of time.
* Similarly, we see that a multi-get on average takes around 2 seconds. A single get on average takes around 1 second.

We are not sure what the bottleneck is or where it lies.
We thought we should look deeper into what is going on at the regionservers. We monitored the IPC calls during one of the experiments. Here is a sample of one regionserver log: 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115483; Served: HRegionInterface#get queueTime=0 processingTime=1 contents=1 Get, 75 bytes 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115487; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115489; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #111421; Served: HRegionInterface#get queueTime=0 processingTime=0 contents=1 Get, 75 bytes 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115497; Served: HRegionInterface#multi queueTime=0 processingTime=9 contents=200 Gets 2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #115499; Served:
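Given that queueTime/processingTime in the trace above are near zero while the client sees seconds, one way to narrow the gap is to time the calls on the client and compare them against these server-side numbers. A minimal sketch (the table and scan are whatever the benchmark already uses):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanTimer {
        public static void timeScan(HTable table, Scan scan) throws IOException {
            long start = System.currentTimeMillis();
            ResultScanner scanner = table.getScanner(scan);
            // With scanner caching, this first call carries most of the cost.
            int rows = scanner.next() == null ? 0 : 1;
            long firstNextMs = System.currentTimeMillis() - start;
            while (scanner.next() != null) {
                rows++;
            }
            long totalMs = System.currentTimeMillis() - start;
            scanner.close();
            System.out.println("rows=" + rows + " firstNext=" + firstNextMs
                    + "ms total=" + totalMs + "ms");
        }
    }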
RE: Unable to add co-processor to table through HBase api
hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className, new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"), Coprocessor.PRIORITY_USER, map);

Anil, don't you have to modify the table by calling the Admin API? Not seeing that code here...

-Anoop-
Re: Unable to add co-processor to table through HBase api
Hi Anoop,

Sorry, I am unable to understand what you mean by "have to modify the table calling Admin API". Am I missing some other calls in my code?

Thanks, Anil Gupta

On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com wrote:

Anil, don't you have to modify the table by calling the Admin API? Not seeing that code here... -Anoop-
Re: High IPC Latency
Can you reproduce this against a single, local region server? Any chance that you can try with the just-released 0.94.2? I would love to debug this. It would be a tremendous help if you had a little test program that reproduces this against a single server, so that I can see what is going on.

Thanks.

-- Lars

- Original Message - From: Yousuf Ahmad myahm...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Cc: Ivan Brondino ibrond...@fi.upm.es; Ricardo Vilaça rmvil...@di.uminho.pt Sent: Thursday, October 18, 2012 12:59 PM Subject: Re: High IPC Latency
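A skeleton of the kind of standalone repro Lars is asking for might look like the following; the table name and row keys are placeholders, and the point is only to time one multi-get against a single server in isolation:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetRepro {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "benchmark_table");
            List<Get> gets = new ArrayList<Get>();
            for (int i = 0; i < 200; i++) {
                gets.add(new Get(Bytes.toBytes("row-" + i)));
            }
            long start = System.currentTimeMillis();
            Result[] results = table.get(gets); // the ~200-row multi-get under test
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(results.length + " results in " + elapsed + " ms");
            table.close();
        }
    }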
RE: Unable to add co-processor to table through HBase api
I can attach the code that I tried. Here, as the HTD is getting modified, we may need to call modifyTable(). My test class did try this while doing creation of the table itself. I will attach it shortly.

Regards Ram
Re: Unable to add co-processor to table through HBase api
Hi Guys,

Do you mean to say that I need to call the following method after the call to the addCoprocessor method:

    public void modifyTable(byte[] tableName, HTableDescriptor htd) throws IOException

Thanks, Anil Gupta

On Thu, Oct 18, 2012 at 10:23 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:

I can attach the code that I tried. Here, as the HTD is getting modified, we may need to call modifyTable(). My test class did try this while doing creation of the table itself. I will attach it shortly. Regards Ram
RE: Unable to add co-processor to table through HBase api
Anil, yes, the same. You got the HTD from the master into your client code and just added the CP into that object. In order to reflect the change in the HBase cluster you need to call the modifyTable API with your changed HTD. The master will change the table. When you enable the table back, the regions will get opened in the RSs and will then have the CP. :) Hope I have now made it clear for you.

-Anoop-
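Putting Anoop's point into Anil's method, the missing step is roughly the following sketch; the class, table, and jar names are taken from the code earlier in the thread, and the rest is an assumption about how the pieces fit together:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.Coprocessor;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AddCoprocToExistingTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin hAdmin = new HBaseAdmin(conf);
            String tableName = "txn";
            String className = "com.intuit.ihub.hbase.poc.coprocessor.observer.IhubTxnRegionObserver";
            hAdmin.disableTable(tableName);
            // Fetch the descriptor and change it locally...
            HTableDescriptor htd = hAdmin.getTableDescriptor(Bytes.toBytes(tableName));
            Map<String, String> cpArgs = new HashMap<String, String>();
            cpArgs.put("arg1", "batchdate");
            htd.addCoprocessor(className, new Path("hdfs://hbasecluster/tmp/hbase_cdh4.jar"),
                    Coprocessor.PRIORITY_USER, cpArgs);
            // ...then push the changed HTD back to the master: the missing step.
            hAdmin.modifyTable(Bytes.toBytes(tableName), htd);
            hAdmin.enableTable(tableName);
            hAdmin.close();
        }
    }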
RE: Unable to add co-processor to table through HBase api
Yes, you are right. modifyTable has to be called.

    public class TestClass {
        private static HBaseTestingUtility UTIL = new HBaseTestingUtility();

        @BeforeClass
        public static void setupBeforeClass() throws Exception {
            Configuration conf = UTIL.getConfiguration();
        }

        @Before
        public void setUp() throws Exception {
            UTIL.startMiniCluster(1);
        }

        @Test
        public void testSampe() throws Exception {
            HBaseAdmin admin = UTIL.getHBaseAdmin();
            Configuration conf = UTIL.getConfiguration();
            ZooKeeperWatcher zkw = HBaseTestingUtility.getZooKeeperWatcher(UTIL);
            String userTableName = "testSampe";
            HTableDescriptor htd = new HTableDescriptor(userTableName);
            // htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
            HColumnDescriptor hcd = new HColumnDescriptor("col");
            htd.addFamily(hcd);
            admin.createTable(htd);
            ZKAssign.blockUntilNoRIT(zkw);
            admin.disableTable(userTableName);
            htd.addCoprocessor("org.apache.hadoop.hbase.regionserver.MockRegionObserver");
            admin.modifyTable(Bytes.toBytes(userTableName), htd);
            admin.enableTable(userTableName);
            HTable table = new HTable(conf, userTableName);
            HTableDescriptor tableDescriptor = admin.getTableDescriptor(Bytes.toBytes(userTableName));
            boolean hasCoprocessor = tableDescriptor.hasCoprocessor(
                "org.apache.hadoop.hbase.regionserver.MockRegionObserver");
            System.out.println(hasCoprocessor);
        }
    }

If you comment out the modifyTable() call you will not be able to see the coprocessor added. That's what I suggested in my previous reply: try doing this while creating the table itself. If you want to add it later, then it is through modifyTable that you can do it, because it involves changing the HTD.

Regards Ram
Re: Unable to add co-processor to table through HBase api
Thanks a lot, guys. I really appreciate your help. I'll try this change in the morning and let you know the outcome.

@Ram: Actually, I was trying to add the coprocessor to a pre-existing table. I think yesterday you assumed that I am trying to add the coprocessor while creating the table. That's why there was confusion between us.

On Thu, Oct 18, 2012 at 10:40 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote:

Yes, you are right. modifyTable has to be called.
RE: Unable to add co-processor to table through HBase api
Ok Anil, not a problem. My intention was just to see if the API was working during createTable, so that it would help you.

Regards Ram