Re: RowLocks
Maybe I should illustrate with a specific use case. I want to create a unique row with a rowkey that looks like this: id+timestamp. The timestamp (in millis) is provided by the user. It is ONLY the id that dictates uniqueness, not the timestamp, so there is a race condition here. My reasoning is as follows.

1) Lock the id (without the timestamp).
2.1) If the lock was acquired, scan for the row.
2.2) If the row was not found, create it and set a counter on the id+timestamp row to 0.
2.3) If the row is found, increase the counter on the id+timestamp row.
2.4) Release the lock.
If the lock was NOT acquired:
3.1) Not really sure how to proceed here. The easiest way is probably to wait until the lock is released (like a SELECT FOR UPDATE). Retries would also work.

It is very important that we do not lose the increments or create multiple rows with the same id but different timestamps when there are race conditions.

On Thu, Aug 29, 2013 at 6:22 AM, lars hofhansl la...@apache.org wrote: Specifically the API has been removed because it had never actually worked correctly. Rowlocks are used by RegionServers for intra-region operations. As such they are ephemeral, in-memory constructs that cannot reliably outlive a single RPC request. The HTable rowlock API allowed you to create a rowlock and hold it over multiple RPCs, which would break if, f.e., a region is moved or split. -- Lars

From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 8:01 PM Subject: Re: RowLocks The API is no longer a public API. Thanks

On Wed, Aug 28, 2013 at 7:58 PM, Michael Segel michael_se...@hotmail.com wrote: Ted, Can you clarify... Do you mean the API is no longer a public API, or do you mean no more RLL for atomic writes?

On Aug 28, 2013, at 5:18 PM, Ted Yu yuzhih...@gmail.com wrote: The RowLock API has been removed in 0.96. Can you tell us your use case?

On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer Sjögren sto...@gmail.com wrote: Hi, About the internals of locking a row in hbase: do hbase row locks map one-to-one with locks in zookeeper, or are there any optimizations based on the fact that a row only exists on a single machine? Cheers, -Kristoffer

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
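For what it's worth, without the rowlock API the same guarantees can be had lock-free on 0.94: serialize creation through a checkAndPut on a marker row keyed by the id alone, and bump the counter with the server-side atomic Increment so no increment can be lost. A sketch; the table handle, family, and qualifier names are mine, not from the thread:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UniqueIdCounter {
  private static final byte[] FAM = Bytes.toBytes("f");
  private static final byte[] TS = Bytes.toBytes("ts");

  // Ensures exactly one id+timestamp row per id, then bumps its counter,
  // with no lock ever held across RPCs.
  public static void record(HTable table, String id, long ts) throws Exception {
    byte[] marker = Bytes.toBytes(id); // marker row keyed by the id only
    Put claim = new Put(marker);
    claim.add(FAM, TS, Bytes.toBytes(ts));
    // Atomic: succeeds only if no 'ts' cell exists yet, i.e. we won the race.
    boolean won = table.checkAndPut(marker, FAM, TS, null, claim);
    long winningTs = ts;
    if (!won) {
      // A racer created the marker first; adopt its timestamp, not ours.
      Result r = table.get(new Get(marker));
      winningTs = Bytes.toLong(r.getValue(FAM, TS));
    }
    Increment inc = new Increment(Bytes.toBytes(id + "+" + winningTs));
    inc.addColumn(FAM, Bytes.toBytes("counter"), 1L);
    table.increment(inc); // atomic on the region server; increments are never lost
  }
}

The one difference from the steps above is that the first occurrence leaves the counter at 1 rather than 0; if the 0 matters, subtract one at read time.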
issue debug hbase
Hi all, I compiled the HBase src (version 0.94) with maven. I can debug HBase, for example create table, as a Java Application (I set a breakpoint); that is so nice for learning how HBase creates a table. But I found I cannot debug HBase as a remote Java application: when I set a breakpoint in the src (in my local client), the program executes but the breakpoint does nothing. The problem: I don't understand why I can debug create table as a Java application (breakpoint set in my local create-table code), but when I use the HBase shell create 'demo','s' my Eclipse does nothing (breakpoint set in my local hbase/src/). PS: I did configure the remote Java application host:port etc. Thank you for your help. -- In the Hadoop world, I am just a novice, exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code. YanBit yankunhad...@gmail.com
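For what it's worth, the usual explanation for this symptom: when you run create-table code as a local Java Application, everything runs in the JVM Eclipse launched, so breakpoints fire. When you type create 'demo','s' in the hbase shell, the work happens in the shell's JVM and, server-side, in the HMaster JVM, so a breakpoint only fires if you attach a Remote Java Application session to a daemon that was started with the JDWP agent. A sketch of conf/hbase-env.sh additions (the port numbers are arbitrary choices of mine):

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS \
  -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

Restart HBase, attach Eclipse to master-host:8070, and breakpoints in the master's create-table path should then be hit when the shell issues the create.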
Hbase RowKey design schema
I am using HBase to store webtable content, like how Google is using Bigtable (see Google's Bigtable paper for reference). My question is on the RowKey and how we should be forming it. What Google does is save the URL in reverse order, as you can see in the PDF document (com.cnn.www), so that all the links associated with cnn.com will be managed in the same block of GFS, which makes them a lot easier to scan. I can use the same thing Google is using, but wouldn't it be cool if I used some algorithm to compress the url? For example:

RowKey | Google Bigtable | Algorithm output
www.cnn.com/index.php | com.cnn.www/index.php | 12as/435
www.cnn.com/news/business/index.html | com.cnn.www/news/business/index.html | 12as/2as/dcx/asd
www.cnn.com/news/sports/index.html | com.cnn.www/news/sports/index.html | 12as/2as/eds/scf

The reason for doing this is that the rowkey will be shorter, as per the HBase design schema (mentioned in topic 6.3.2.3, Rowkey Length). So what I need from you guys is to know: am I correct here? Also, if I am correct, which algorithm should I be using? I am using python over thrift as a programming language, so code will be overwhelming for me...
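For what it's worth, the reversed-domain trick needs no extra algorithm, and unlike a compression dictionary it preserves the sorted prefix locality that makes scans over one site cheap; a hash- or dictionary-compressed key is shorter but scatters a site's pages across the keyspace unless the mapping is prefix-preserving. A minimal Java sketch of the Bigtable-style key (treating everything after the first '/' as the path is my assumption about the URL format):

public class RowKeys {
  // e.g. "www.cnn.com/news/index.html" -> "com.cnn.www/news/index.html"
  public static String reversedDomainKey(String url) {
    int slash = url.indexOf('/');
    String host = slash < 0 ? url : url.substring(0, slash);
    String path = slash < 0 ? "" : url.substring(slash);
    String[] parts = host.split("\\.");
    StringBuilder sb = new StringBuilder(url.length());
    for (int i = parts.length - 1; i >= 0; i--) {
      sb.append(parts[i]);
      if (i > 0) sb.append('.');
    }
    return sb.append(path).toString();
  }
}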
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable
Hi all, I am trying to write an MR job to load an HBase table. I have a mapper that emits (null, put object) and I am using TableMapReduceUtil.initTableReducerJob() to write it into an HBase table. Following is my code snippet:

public class MYHBaseLoader extends Mapper<NullWritable, BytesWritable, NullWritable, Put> {
  protected void map(LongWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    // Some processing here: create the Put object and push the data into it.
    context.write(null, put); // Pushing the put object.
  }

  public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJarByClass(MYHBaseLoader.class);
    job.setMapperClass(MYHBaseLoader.class);
    TableMapReduceUtil.initTableReducerJob("MY_IMPORT_TABLE_NAME", IdentityTableReducer.class, job);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    //job.setNumReduceTasks(0);
    FileInputFormat.setInputPaths(job, new Path("test"));
    Path outputPath = new Path("test_output");
    FileOutputFormat.setOutputPath(job, outputPath);
    //outputPath.getFileSystem(conf).delete(outputPath, true);
    job.waitForCompletion(true);
    System.out.println("Done");
  }
}

I am getting the following error while running. Any help/guidance:

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1023)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:689)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:363)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Regards Praveenesh
Re: experiencing high latency for few reads in HBase
Hi Vlad, We do have a strict latency requirement as it is financial data requiring direct access from clients. Are you saying that it is not possible to achieve sub-second latency using hbase (because it is based on Java)?

On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Increasing the Java heap size will make latency worse, actually. You can't guarantee 1 sec max latency if you run a Java app (unless your heap size is much less than 1GB). I have never heard of a strict maximum latency limit. Usually it's a 99%, 99.9% or 99.99% query percentile. You can greatly reduce your 99.xxx% percentile latency by storing your data in 2 replicas on two different region servers. Issue two read operations to those two region servers in parallel and take the first response. Probability theory states that the probability of two independent events (slow requests) is the product of the events' probabilities. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Wednesday, August 28, 2013 4:18 PM To: user@hbase.apache.org Subject: Re: experiencing high latency for few reads in HBase Thanks Kiru, Scan is not an option for our use cases. Our reads are pretty random. Any other suggestion to bring down the latency? Thanks, Saurabh.

On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Saurabh, we are able to read 600K row-columns in 400 msec. We have put what was a 40-million-row table as 400K rows and columns. We get about 100 of the rows from this 400K, do quite a bit of calculation in the coprocessor (almost a group-order by) and return within this time. Maybe you should consider replacing the MultiGets with a Scan with a Filter. I like the FuzzyRowFilter, even though you might need to match the exact key; it works only with fixed-length keys. (I do have an issue right now: it is not scaling to multiple clients.) Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com

From: Saurabh Yahoo saurabh...@yahoo.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 3:20 PM Subject: Re: experiencing high latency for few reads in HBase Thanks Kiru. We need less than 1 sec latency. We are using both multiGet and get. We have three concurrent clients running 10 threads each (that makes 30 concurrent clients in total). Thanks, Saurabh.

On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Right, 4 sec is good. @Saurabh - so your read is getting 20 out of 25 million rows? Is this a Get or a Scan? BTW, in this stress test how many concurrent clients do you have? Regards, - kiru

From: Vladimir Rodionov vrodio...@carrieriq.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 12:15 PM Subject: RE: experiencing high latency for few reads in HBase 1. 4 sec max latency is not that bad taking into account the 12GB heap. It can be much larger. What is your SLA? 2. Block evictions are the result of a poor cache hit rate and the root cause of the periodic stop-the-world GC pauses (the max latencies you have been observing in the test). 3. The block cache consists of 3 parts (25% young generation, 50% tenured, 25% permanent). The permanent part is for CFs with IN_MEMORY = true (you can specify this when you create a CF). A block is first stored in the 'young gen' space, then gets promoted to the 'tenured gen' space (or gets evicted).
Maybe your 'perm gen' space is underutilized? That is exactly 25% of 4GB (1GB). Although HBase LruBlockCache should use all the space allocated for the block cache, there is no guarantee (as usual). If you don't have in_memory column families you may decrease Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Wednesday, August 28, 2013 5:10 AM To: user@hbase.apache.org Subject: experiencing high latency for few reads in HBase Hi, We are running a stress test on our 5 node cluster and we are getting the expected mean latency of 10ms. But we are seeing around 20 reads out of 25 million reads having latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read? We observe the following things: 1. Reads are evenly distributed among the 5 nodes. CPUs remain under 5% utilized. 2. We have a 4gb block cache (30% block cache out of 12gb) setup. 3gb of block cache got filled up but around 1gb remained free. There are a large number of
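Vladimir's two-replica idea above is straightforward to sketch client-side: write each row under two keys placed in different regions (e.g. two salt prefixes), then race two Gets and take whichever answers first. The salting scheme is an assumption of mine; note that HTable is not thread-safe in 0.94, so each worker gets its own handle:

import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class RacingReader {
  // Races one Get per replica key and returns whichever answers first.
  public static Result firstOf(final HTable tableA, final HTable tableB,
      final byte[] keyA, final byte[] keyB, ExecutorService pool)
      throws Exception {
    CompletionService<Result> cs = new ExecutorCompletionService<Result>(pool);
    cs.submit(new Callable<Result>() {
      public Result call() throws Exception { return tableA.get(new Get(keyA)); }
    });
    cs.submit(new Callable<Result>() {
      public Result call() throws Exception { return tableB.get(new Get(keyB)); }
    });
    return cs.take().get(); // P(slow) becomes P(slow)^2 for independent replicas
  }
}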
Re: how to export data from hbase to mysql?
My 2 cents: 1- Map your table to a Hive table and do the export using Sqoop. 2- Export the table to a file first (see http://hbase.apache.org/book/ops_mgt.html#export), and then export that using Sqoop. Warm Regards, Tariq cloudfront.blogspot.com

On Wed, Aug 28, 2013 at 7:12 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Taking what Ravi Kiran mentioned a level higher, you can also use Pig. It has DBStorage. Very easy to read from HBase and dump to MySQL if your data porting does not require complex transformation (even that can be handled in Pig too). http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/DBStorage.html Regards, Shahab

On Wed, Aug 28, 2013 at 1:26 AM, Ravi Kiran maghamraviki...@gmail.com wrote: If you would like to have greater control over what data / which columns from HBase go into the MySQL tables, you can write a simple MR job and use DBOutputFormat. It is a simple one and works great for us. Regards Ravi

On Wed, Aug 28, 2013 at 10:42 AM, ch huang justlo...@gmail.com wrote: i use hive, maybe it's a way, let me try it

On Wed, Aug 28, 2013 at 11:21 AM, James Taylor jtay...@salesforce.com wrote: Or if you'd like to be able to use SQL directly on it, take a look at Phoenix (https://github.com/forcedotcom/phoenix). James

On Aug 27, 2013, at 8:14 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Take a look at sqoop?

On 2013-08-27 23:08, ch huang justlo...@gmail.com wrote: hi, all: any good idea? thanks
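A sketch of option 1, for what it's worth (every connection detail and table name below is a placeholder): once the HBase table is mapped into Hive and materialized as a plain Hive table (Sqoop cannot export directly from an HBase-backed Hive table, since there are no warehouse files behind it), the files can be pushed to MySQL:

sqoop export \
  --connect jdbc:mysql://mysqlhost/mydb \
  --username myuser --password mypass \
  --table my_mysql_table \
  --export-dir /user/hive/warehouse/my_hive_table \
  --input-fields-terminated-by '\001'

The '\001' is Hive's default field delimiter; adjust it if your table was created with something else.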
Re: RowLocks
Thanks for the update. Actually they worked ok for what they were. IMHO they should never have been made public, because they aren't the RLL that people think of as part of transactions and isolation levels found in RDBMSs. Had me worried there for a sec... Thx

On Aug 28, 2013, at 11:22 PM, lars hofhansl la...@apache.org wrote: Specifically the API has been removed because it had never actually worked correctly.

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
Re: experiencing high latency for few reads in HBase
The 0.94.11 release includes an optimization for MultiGets: https://issues.apache.org/jira/browse/HBASE-9087 What version have you deployed?

On 08/29/2013 01:29 AM, lars hofhansl wrote: A 1s SLA is tough in HBase (or any large-memory JVM application). Maybe, if you presplit your table and play with JDK7 and the G1 collector; but nobody here will vouch for such an SLA in the 99th percentile. I heard some folks have experimented with 30GB heaps and G1 and have reported max GC times of 200ms, but I have not verified that. -- Lars

- Original Message - From: Saurabh Yahoo saurabh...@yahoo.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 3:17 PM Subject: Re: experiencing high latency for few reads in HBase Hi Vlad, Thanks for your response. 1. Our SLA is less than one sec; we cannot afford latency of more than 1 sec. We can increase the heap size if that helps; we have enough memory on the servers. What would be the optimal heap size? 2. The cache hit ratio is 95%. One thing I don't understand: we have allocated only 4gb for the block cache out of 12gb. That leaves 8gb for the rest of the JVM. There are no writes and the memstore is empty. Is 8gb not enough for hbase to process the requests? What are the most memory-consuming objects in a region server? 3. We will change the CF to IN_MEMORY and report back the performance difference. Thanks, Saurabh.

On Aug 28, 2013, at 3:15 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: 1. 4 sec max latency is not that bad taking into account the 12GB heap. It can be much larger. What is your SLA? 2. Block evictions are the result of a poor cache hit rate and the root cause of the periodic stop-the-world GC pauses (the max latencies you have been observing in the test). 3. The block cache consists of 3 parts (25% young generation, 50% tenured, 25% permanent). The permanent part is for CFs with IN_MEMORY = true (you can specify this when you create a CF). A block is first stored in the 'young gen' space, then gets promoted to the 'tenured gen' space (or gets evicted). Maybe your 'perm gen' space is underutilized? That is exactly 25% of 4GB (1GB). Although HBase LruBlockCache should use all the space allocated for the block cache, there is no guarantee (as usual). If you don't have in_memory column families you may decrease Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Wednesday, August 28, 2013 5:10 AM To: user@hbase.apache.org Subject: experiencing high latency for few reads in HBase Hi, We are running a stress test on our 5 node cluster and we are getting the expected mean latency of 10ms. But we are seeing around 20 reads out of 25 million reads having latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read? We observe the following things: 1. Reads are evenly distributed among the 5 nodes. CPUs remain under 5% utilized. 2. We have a 4gb block cache (30% block cache out of 12gb) setup. 3gb of block cache got filled up but around 1gb remained free. There are a large number of cache evictions. Questions to the experts: 1. If there is still 1gb of free block cache available, why is hbase evicting blocks from the cache? 4. We are seeing memory go up to 10gb three times before dropping sharply to 5gb. Any help is highly appreciated. Thanks, Saurabh.
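For anyone who wants to try lars's G1 suggestion, the experiment would look roughly like this in conf/hbase-env.sh on JDK 7 (a sketch, not a recommendation; the pause target is a hint to the collector, not a guarantee):

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

Keeping the GC logging flags on is what lets you verify (or refute) the reported 200ms pauses on your own workload.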
HBase client with security
Hi all, I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with security. HBase works if I launch the shell from the node running the master, but I'd like to use it from an external machine. I prepared one, copying the Hadoop and HBase installation folders and adapting the paths (indeed I can use the same client to run MR jobs and interact with HDFS). Regarding the HBase client configuration:

- hbase-site.xml specifies

<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master.hadoop.local,host49.hadoop.local</value>
</property>

where the zookeeper hosts are reachable and can be resolved via DNS. I had to specify them, otherwise the shell complains about org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

- I have a keytab for the principal I want to use (user running hbase/my client hostname@MYREALM), correctly addressed by the file hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to zk-jaas.conf.

Nonetheless, when I issue a command from an HBase shell on the client machine, I get an error in the HBase master log:

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 6: readAndProcess threw exception org.apache.hadoop.security.AccessControlException: Authentication is required. Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
at org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding the authentication mechanism. Note that from the same client machine I can launch and use a Zookeeper shell. What am I missing in the client configuration? Does /etc/krb5.conf play any role in this? Thanks, Matteo

Matteo Lanati Distributed Resources Group Leibniz-Rechenzentrum (LRZ) Boltzmannstrasse 1 85748 Garching b. München (Germany) Phone: +49 89 35831 8724
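Two things are commonly missing in a setup like this, for what it's worth. First, the client has to know the server principals and run with Kerberos-enabled Hadoop security; otherwise the RPC layer falls back to simple auth and the master logs exactly this "Authentication is required" error. Second, yes, /etc/krb5.conf matters: it is how the client finds the KDC for MYREALM. A sketch of client-side additions (the principal names are placeholders for whatever your daemons actually run as):

<!-- hbase-site.xml on the client -->
<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_HOST@MYREALM</value>
</property>
<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/_HOST@MYREALM</value>
</property>

<!-- core-site.xml on the client -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>

Also make sure a ticket exists before starting the shell (kinit, or kinit -kt your.keytab principal). zk-jaas.conf only covers the ZooKeeper connection, not the HBase RPC itself, which would be consistent with the ZooKeeper shell working while HBase refuses.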
Re: Hbase RowKey design schema
What advantage will you gain by compressing? Less space? But then it will add compression/decompression performance overhead. A trade-off, but not an especially significant one, as space is cheap and redundancy is OK with such data stores. Having said that, and more importantly: what are your read use-cases and access patterns? Those should drive your decision about row key design. Regards, Shahab

On Thu, Aug 29, 2013 at 5:21 AM, Wasim Karani wa...@userworkstech.com wrote: I am using HBase to store webtable content, like how Google is using Bigtable.
Re: Writing map outputs to HBase
See http://hbase.apache.org/book.html#mapreduce.example.readwrite

On Thu, Aug 29, 2013 at 7:26 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, What is the easiest and most efficient way to write a sequence file into HBase? I want to parse the sequence file. My sequence file has records in the form of (null, bytes). I want to parse each value, generate keys and values in the map() function, and write the output into HBase. I am trying to use the HBaseTableUtil class, but I see that the TableMapReduceUtil.initTableReducerJob() method does something like what I need. I guess, though, that it requires a reducer to exist. Am I right here? The other way is to create an HBase Configuration object in the mapper.setup() method, do the insertions in the map() function, and close the connections in the mapper.close() method. I was wondering what is the industry standard or most efficient way to do map-side insertion into HBase. Any suggestions would be really helpful. Thanks Praveenesh
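The example behind that link does the map-side write being asked about: pass a null reducer to initTableReducerJob() and set zero reduces, and the Puts emitted from map() go straight to the table through TableOutputFormat, so there is no need to open an HTable in setup(). A rough sketch of the whole job (the table name, input path, family/qualifier, and the use of the record bytes as row key are all placeholders):

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class SeqFileToHBase {

  static class SeqFileMapper
      extends Mapper<NullWritable, BytesWritable, NullWritable, Put> {
    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException {
      byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
      Put put = new Put(bytes); // placeholder: derive your real row key here
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), bytes);
      context.write(NullWritable.get(), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml
    Job job = new Job(conf, "seqfile-to-hbase");
    job.setJarByClass(SeqFileToHBase.class);
    job.setMapperClass(SeqFileMapper.class);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputPaths(job, new Path("in"));
    // null reducer + zero reduces: Puts go straight from map() to the table
    TableMapReduceUtil.initTableReducerJob("target_table", null, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}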
Never ending Doing distributed log split task.,
I have restarted my cluster and I'm now waiting for this task to end: Doing distributed log split in [hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting] It's been running for 30 minutes now. There was nothing running on the cluster. No reads, no writes, nothing, for days... I got this in the logs:

2013-08-29 11:36:10,862 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting/node1%2C60020%2C1377789460683.1377789462024 interrupted, resigning
java.io.InterruptedIOException
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:136)
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:54)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:780)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:414)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:118)
... 9 more
2013-08-29 11:36:10,950 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: Interrupted while trying to assert ownership of /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1377789460683-splitting%2Fnode1%252C60020%252C1377789460683.1377789462024
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:361)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.attemptToOwnTask(SplitLogWorker.java:346)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
at java.lang.Thread.run(Thread.java:722)

I'm not 100% sure what is causing that. I have restarted it and am still getting the same result. Any hint? Thanks, JM
Re: experiencing high latency for few reads in HBase
Saurabh, I have a suspicion that the few high-latency responses are happening because of hot region(s). I vaguely remember you mentioning that the data is evenly distributed across all regions; I hope your test also goes across them evenly. You may want to check the read requests to the regions. Regards, - kiru

From: Saurabh Yahoo saurabh...@yahoo.com To: user@hbase.apache.org Sent: Thursday, August 29, 2013 2:49 AM Subject: Re: experiencing high latency for few reads in HBase Hi Vlad, We do have a strict latency requirement as it is financial data requiring direct access from clients. Are you saying that it is not possible to achieve sub-second latency using hbase (because it is based on Java)?
Re: Hbase thrift client's privilege control
Sorry, I clicked the send button too early. Since the Thrift gateway will authenticate with HBase using the supplied credential, no authentication will be performed by the Thrift gateway itself; all client access via the Thrift gateway will use the Thrift gateway's credential and have its privilege (http://hbase.apache.org/book/security.html). So is there any method or patch to control the privileges of different clients of the hbase thrift gateway? Thanks! atupal

2013/8/29 Kangle Yu kangl...@hustunique.com Hi all,
Re: counter Increment gives DonotRetryException
The exception came from HRegion#increment():

if (kv.getValueLength() == Bytes.SIZEOF_LONG) {
  amount += Bytes.toLong(kv.getBuffer(), kv.getValueOffset(), Bytes.SIZEOF_LONG);
} else {
  // throw DoNotRetryIOException instead of IllegalArgumentException
  throw new org.apache.hadoop.hbase.DoNotRetryIOException(
      "Attempted to increment field that isn't 64 bits wide");
}

Can you check the values in 'columnar:column1'?

On Thu, Aug 29, 2013 at 4:42 AM, yeshwanth kumar yeshwant...@gmail.com wrote: i am newbie to Hbase, going through the Counters topic. whenever i perform an increment like incr 't1','9row27','columnar:column1',1 it gives an ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field that isn't 64 bits wide looking for some help
RE: experiencing high latency for few reads in HBase
Yes. HBase won't guarantee strict sub-second latency. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Thursday, August 29, 2013 2:49 AM To: user@hbase.apache.org Subject: Re: experiencing high latency for few reads in HBase Hi Vlad, We do have a strict latency requirement as it is financial data requiring direct access from clients. Are you saying that it is not possible to achieve sub-second latency using hbase (because it is based on Java)?
Re: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable
You are also using the @Override annotation to make sure that your overridden method is being called? Regards, Shahab

On Thu, Aug 29, 2013 at 12:03 PM, praveenesh kumar praveen...@gmail.com wrote: Thanks Shahab for replying. Sorry, that was a typo while writing the code snippet. Even keeping the keys as NullWritable or LongWritable, i.e. by keeping the same key types, I am getting the same error. I don't think the error is on the map input side; it says "value from map". Can't understand where I am going wrong. Regards Praveenesh

On Thu, Aug 29, 2013 at 4:58 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

public class MYHBaseLoader extends Mapper<NullWritable, BytesWritable, NullWritable, Put> {
  protected void map(LongWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException { ...

Why is there a difference in the types of the keys? Regards, Shahab

On Thu, Aug 29, 2013 at 5:46 AM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am trying to write an MR job to load an HBase table.
Re: counter Increment gives DonotRetryException
You probably put a string in there that was a number, and increment expects an 8-byte long. For example, if you did: put 't1', '9row27', 'columnar:column1', '1' and then did an increment on that, it would fail. J-D

On Thu, Aug 29, 2013 at 4:42 AM, yeshwanth kumar yeshwant...@gmail.com wrote: i am newbie to Hbase, going through the Counters topic.
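So a quick shell-level check and repair along J-D's lines might look like this (assuming the cell really does hold a string; note that the delete throws away the old value):

get 't1', '9row27', 'columnar:column1'          # inspect what is actually stored
delete 't1', '9row27', 'columnar:column1'       # remove the non-counter cell
incr 't1', '9row27', 'columnar:column1', 1      # creates a proper 8-byte counter
get_counter 't1', '9row27', 'columnar:column1'  # reads it back as a number

The rule of thumb: a cell used with incr should only ever be written by incr (or by a Put of an 8-byte Bytes.toBytes(long)), never by a shell put of a string.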
Re: Never ending Doing distributed log split task.,
So you have HBASE-8670 in your deployment. I suggest upgrading hadoop to a newer release, e.g. 1.2.1, so that the new HDFS improvements can be utilized. Cheers

On Thu, Aug 29, 2013 at 9:50 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hadoop 1.0.4 with HBase 0.94.12-SNAPSHOT. The file name changed since I have restarted HBase, but here is what I have:

hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls hdfs://node3:9000/hbase/.logs/node1,60020,1377793020654/
Found 1 items
-rw-r--r-- 3 hbase supergroup 0 2013-08-29 12:17 /hbase/.logs/node1,60020,1377793020654/node1%2C60020%2C1377793020654.1377793021892

And I'm able to access it:

hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -get /hbase/.logs/node1,60020,1377793020654/node1%2C60020%2C1377793020654.1377793021892 .
hadoop@node3:~/hadoop-1.0.3$

Oh. I just checked the UI again, and it's done. Wow! Took almost 1h. HBCK reports 0 inconsistencies detected. Status: OK. So it seems that I'm all fine. I don't know why it was so long. I will try to take a look at my Ganglia metrics to see if I can figure anything out... JM

2013/8/29 Ted Yu yuzhih...@gmail.com What is your HBase / Hadoop version? Can you check the namenode log looking for lines related to hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting/node1%2C60020%2C1377789460683.1377789462024 ? Thanks

On Thu, Aug 29, 2013 at 9:03 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I have restarted my cluster and I'm now waiting for this task to end: Doing distributed log split in [hdfs://node3:9000/hbase/.logs/node1,60020,1377789460683-splitting]
Re: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.client.Put, recieved org.apache.hadoop.io.BytesWritable
Exactly, I had the same thought as Ashwanth; that is why I asked whether the @Override annotation is being used or not. Regards, Shahab

On Thu, Aug 29, 2013 at 1:09 PM, Ashwanth Kumar ashwanthku...@googlemail.com wrote: Hey Praveenesh, I am not sure if this would help, but can you try moving your mapper to an inner class / separate class and try the code? I somehow get a feeling that the default Mapper (IdentityMapper) is being used (maybe you can check the mapreduce.map.class value?). That would be the only reason why your value (BytesWritable) gets emitted out in context.write().

On Thu, Aug 29, 2013 at 3:16 PM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am trying to write an MR job to load an HBase table.

-- Ashwanth Kumar / ashwanthkumar.in
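For the record, that diagnosis fits the stack trace: because the declared input key type (NullWritable) and the map() parameter type (LongWritable) disagree, the custom map() never overrides Mapper.map(), so the framework runs the inherited identity implementation and passes the BytesWritable straight through to the collector. A corrected sketch; with @Override, the mismatch becomes a compile error instead of a runtime surprise (the parse() helper and its row key / column choices are placeholders for the original processing):

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class MyHBaseLoaderMapper
    extends Mapper<NullWritable, BytesWritable, NullWritable, Put> {

  @Override
  protected void map(NullWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    context.write(NullWritable.get(), parse(value));
  }

  // Placeholder for the original "some processing here".
  private Put parse(BytesWritable value) {
    byte[] bytes = Arrays.copyOf(value.getBytes(), value.getLength());
    Put put = new Put(bytes); // placeholder row key
    put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), bytes);
    return put;
  }
}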
Error running hbase
Hi, I am trying to write directly to hbase from mapreduce code, but I am getting an issue similar to what is reported here: http://stackoverflow.com/questions/12607349/cant-connect-to-zookeeper-and-then-hbase-master-shuts-down How do I solve this? There is already an hbase instance set up and running on my cluster, and the hbase shell works just fine, so I am not sure what I am missing. Any suggestions? Thanks
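One common cause of that symptom (shell works, MR client cannot connect): building the job's Configuration with new Configuration() instead of HBaseConfiguration.create(), so hbase-site.xml is never loaded and the client looks for ZooKeeper on localhost. A sketch (the quorum hosts are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class HBaseJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // loads hbase-site.xml
    // If hbase-site.xml is not on the client classpath, set ZK explicitly:
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    Job job = new Job(conf, "hbase-writer");
    // Ship the HBase/ZooKeeper jars with the job so the tasks can connect too:
    TableMapReduceUtil.addDependencyJars(job);
    // ... the rest of the job setup goes here ...
  }
}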
Re: experiencing high latency for few reads in HBase
Thanks Vlad. Quick question: I notice hdfsBlocksLocalityIndex is around 50 on all region servers. Could that be a problem? If it is, how do I solve it? We already ran a major compaction after ingesting the data. Thanks, Saurabh.

On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Yes. HBase won't guarantee strict sub-second latency.
Re: experiencing high latency for few reads in HBase
Thanks Kiru. We have 10TB of data on disk. It would not fit in memory. Also for the first time, hbase need to read from the disk. And it has to go through the network to read the blocks which are stored at other data node. So in my opinion, locality matters. Thanks, Saurabh. On Aug 29, 2013, at 2:33 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: But locality index should not matter right if you are in IN_MEMORY most and you are running the test after a few runs to make sure they are already in IN_MEMORY (ie blockCacheHit is high or blockCacheMiss is low) (?) Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From: Vladimir Rodionov vrodio...@carrieriq.com To: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, August 29, 2013 11:11 AM Subject: RE: experiencing high latency for few reads in HBase Usually, either cluster restart or major compaction helps improving locality index. There is an issue in region assignment after table disable/enable in 0.94.x (x 11) which breaks HDFS locality. Fixed in 0.94.11 You can write your own routine to manually localize particular table using public HBase Client API. But this won't help you to stay withing 1 sec anyway. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Thursday, August 29, 2013 10:52 AM To: user@hbase.apache.org Cc: user@hbase.apache.org Subject: Re: experiencing high latency for few reads in HBase Thanks Vlad. Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all region servers. Does that could be a problem? If it is, how to solve that? We already ran the major compaction after ingesting the data. Thanks, Saurabh. On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Yes. HBase won't guarantee strict sub-second latency. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Thursday, August 29, 2013 2:49 AM To: user@hbase.apache.org Cc: user@hbase.apache.org Subject: Re: experiencing high latency for few reads in HBase Hi Vlad, We do have strict latency requirement as it is financial data requiring direct access from clients. Are you saying that it is not possible to achieve sub second latency using hbase (because it is based on java.) ? On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Increasing Java heap size will make latency worse, actually. You can't guarantee 1 sec max latency if run Java app (unless your heap size is much less than 1GB). I have never heard about strict maximum latency limit. Usually , its 99% , 99.9 or 99.99% query percentiles. You can greatly reduce your 99.xxx% percentile latency by storing you data in 2 replicas to two different region servers. Issue two read operations to those two region servers in parallel and get the first response. Probability theory states that probability of two independent events (slow requests) is the product of event's probabilities themselves. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Wednesday, August 28, 2013 4:18 PM To: user@hbase.apache.org Subject: Re: experiencing high latency for few reads in HBase Thanks Kiru, Scan is not an option for our use cases. Our read is pretty random. Any other suggestion to bring down the latency. 
Thanks, Saurabh.

On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Saurabh, we are able to read 600K row-columns in 400 msec. We have put what was a 40-million-row table as 400K rows and columns. We Get about 100 of the rows from this 400K, do quite a bit of calculation in the coprocessor (almost a group-by/order-by), and return within this time. Maybe you should consider replacing the MultiGets with a Scan with a Filter. I like the FuzzyRowFilter, even though you might need to match with an exact key; it works only with fixed-length keys. (I do have an issue right now: it is not scaling to multiple clients.) Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com

From: Saurabh Yahoo saurabh...@yahoo.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 3:20 PM Subject: Re: experiencing high latency for few reads in HBase

Thanks Kiru. We need less than 1 sec latency. We are using both multiGet and get.
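For illustration, here is a minimal client-side sketch of the two-parallel-reads idea Vladimir describes above: fire the same Get at two copies of the data and keep whichever answer arrives first. Everything concrete here is an assumption for the sketch — the table names "t1" and "t1_copy" are hypothetical stand-ins for the two copies (plain HBase serves a given row from a single region server, so the second copy would have to be maintained by the application, e.g. a duplicate table), and the 0.94-era client API is assumed.

    import java.util.Arrays;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FirstOfTwoReads {
        // Read the same row from two tables (two copies of the data, ideally
        // hosted on different region servers) and return the first result.
        static Result hedgedGet(final Configuration conf, final byte[] row,
                                String tableA, String tableB) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            try {
                // invokeAny() blocks until the first task completes successfully
                return pool.invokeAny(Arrays.asList(
                        reader(conf, tableA, row), reader(conf, tableB, row)));
            } finally {
                pool.shutdownNow(); // abandon the slower request
            }
        }

        static Callable<Result> reader(final Configuration conf,
                                       final String table, final byte[] row) {
            return new Callable<Result>() {
                public Result call() throws Exception {
                    HTable t = new HTable(conf, table);
                    try {
                        return t.get(new Get(row));
                    } finally {
                        t.close();
                    }
                }
            };
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            System.out.println(
                    hedgedGet(conf, Bytes.toBytes("row-1"), "t1", "t1_copy"));
        }
    }

As Vladimir notes, if each copy answers slowly with independent probability p, the combined read is slow only with probability p^2 — e.g. a 1% tail becomes 0.01%.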
Re: experiencing high latency for few reads in HBase
Yes, in that case, it matters. I was talking about a case where you are mostly serving from cache. Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com
Default balancer status
Hi, Is there a way to have the balancer off by default? We can turn it off using balancer_switch but when we restart the cluster, it's back to on. Any way to turn it off by default? Thanks, JM
Re: Error running hbase
There was an answer at the end of the Stack Overflow URL you posted. If your problem isn't solved, please let us know some more details of your deployment: HBase version, config parameters, etc. Thanks

On Thu, Aug 29, 2013 at 10:49 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am trying to write directly to HBase from MapReduce code, but I am getting an issue similar to what is reported here: http://stackoverflow.com/questions/12607349/cant-connect-to-zookeeper-and-then-hbase-master-shuts-down How do I solve this? I have an HBase instance already set up on my cluster, and the hbase shell works just fine, so I am not sure what I am missing. Any suggestions? Thanks
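If the problem is the client-side ZooKeeper connection (as in the linked question), a common culprit is that the MapReduce job's configuration never picked up hbase-site.xml, so the client falls back to localhost for the quorum. A minimal sketch of setting it explicitly — the host name zk1.example.com is a hypothetical placeholder and the 0.94-era client API is assumed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteFromMR {
        public static void main(String[] args) throws Exception {
            // create() reads hbase-site.xml from the classpath; if the job's
            // classpath lacks it, point the client at the quorum by hand.
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "zk1.example.com"); // hypothetical
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            HTable table = new HTable(conf, "demo");
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("s"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(put);
            table.close();
        }
    }

Inside an actual mapper you would typically create the HTable in setup() and close it in cleanup() rather than per record.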
Re: Default balancer status
This was fixed in 0.95.2. https://issues.apache.org/jira/browse/HBASE-6260 In the meantime you can set the hbase.balancer.period to a very large number. On Thu, Aug 29, 2013 at 3:32 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, Is there a way to have the balancer off by default? We can turn it off using balancer_switch but when we restart the cluster, it's back to on. Any way to turn it off by default? Thanks, JM
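For example, a sketch of that workaround in hbase-site.xml on the master — the value is arbitrary, anything much larger than the 300000 ms default effectively disables the periodic balancer run:

    <property>
      <name>hbase.balancer.period</name>
      <!-- balancing period in milliseconds; a huge value (here ~1 year)
           means the balancer chore effectively never fires -->
      <value>31536000000</value>
    </property>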
Re: Default balancer status
Thanks Bryan. That's what I was looking for. If I have time I will see if I can backport that into 0.94. For now I will go with the period option... JM
Re: Region server exception
This exception means some other thread was holding the lock for an extended period of time. Can you tell us more about your coprocessor? Thanks

On Thu, Aug 29, 2013 at 12:55 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: This exception stack happens from within my coprocessor code on concurrent reads. Any ideas?

java.io.InterruptedIOException
        at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5894)
        at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5875)
        at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5803)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3852)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3896)

Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com
Re: Region server exception
Ted, When there are more than 32 concurrent clients (on a 4-node x 8-core cluster), I keep getting responseTooSlow for my coprocessors. Our app is built mainly on coprocessors and a few multi-gets.

(responseTooSlow): {"processingtimems":10682,"call":"execCoprocessor([B@511c627c, getFoo({T_5208=0.004815409309791332, ...(multiple values of T_id=double value)... }, 20), rpc version=1, client version=0, methodsFingerPrint=0), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"10.149.5.56:38292","starttimems":1377808493508,"queuetimems":7,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}

We do an order-by on the T_number and do calculations on the doubles. This finishes in 400 msec (the total number of T_ values processed is around 600K) when there is only one client, but takes 8000 or 1 when the number of concurrent connections is increased to 32 or above. Regards, - kiru
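For context, here is a stripped-down sketch of what such an endpoint looks like in the 0.94 coprocessor framework. The names getFoo and FooProtocol, and the summing aggregation, are hypothetical stand-ins for Kiru's actual code (which is not shown in the thread), and the cell values are assumed to be 8-byte doubles. The point it illustrates: every RegionScanner.next() call goes through HRegion.startRegionOperation(), which takes the region lock that appears in the stack trace above, so many concurrent endpoint calls all contend there.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical protocol: one aggregating call per region.
    interface FooProtocol extends CoprocessorProtocol {
        double getFoo(Scan scan) throws IOException;
    }

    public class FooEndpoint extends BaseEndpointCoprocessor implements FooProtocol {
        public double getFoo(Scan scan) throws IOException {
            RegionCoprocessorEnvironment env =
                    (RegionCoprocessorEnvironment) getEnvironment();
            // getScanner()/next() acquire the region operation lock internally;
            // the InterruptedIOException above surfaces when another thread
            // holds that lock for too long.
            InternalScanner scanner = env.getRegion().getScanner(scan);
            double sum = 0;
            try {
                List<KeyValue> kvs = new ArrayList<KeyValue>();
                boolean more;
                do {
                    kvs.clear();
                    more = scanner.next(kvs);
                    for (KeyValue kv : kvs) {
                        sum += Bytes.toDouble(kv.getValue()); // assumes double values
                    }
                } while (more);
            } finally {
                scanner.close(); // always release the scanner
            }
            return sum;
        }
    }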
Re: experiencing high latency for few reads in HBase
Thanks Adrien. Based on the HBase book, it is listed as an experimental item (http://hbase.apache.org/book/upgrade0.92.html), even though it was implemented back in 2011. Is anyone running this in production? Any feedback? Thanks, Saurabh.

On Aug 29, 2013, at 4:07 PM, Adrien Mogenet adrien.moge...@gmail.com wrote: Another point that could help to stay under the `1s SLA': enable direct byte buffers for the LruBlockCache. Have a look at HBASE-4027.

On Thu, Aug 29, 2013 at 9:27 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Yes, in that case, it matters. I was talking about a case where you are mostly serving from cache. Regards, - kiru
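Since HBASE-4027 keeps coming up: in the 0.92/0.94 line the off-heap (slab) cache is enabled through configuration, not code. A sketch, with the sizes purely illustrative and the exact key names worth double-checking against your release.

In hbase-env.sh:

    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=5g"

In hbase-site.xml:

    <property>
      <name>hbase.offheapcache.percentage</name>
      <!-- fraction of -XX:MaxDirectMemorySize given to the slab cache;
           0 (the default) leaves the off-heap cache disabled -->
      <value>0.8</value>
    </property>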
Re: observation while running hbase under load
Does HBase give higher preference to writes than reads, if one tries to do both operations on the same rowkey at the same time?

My scenario: I am new to HBase and I am testing HBase for our data warehouse solution. There are 10 rows; each rowkey has 5000 column qualifiers spread across 3 column families. I generate the following 2 kinds of load.

1.1 Generate the 10 rows with sequential INSERTs. By sequential INSERT I mean that each time I insert a rowkey I also insert all of its 5000 column qualifiers. Each insert also does some READs, as some of the column families act like an index.
1.2 After generating the table as above, I perform READ, SCAN and INSERT on random column qualifiers in random fashion.

2. Do both the load generation (1.1) and the random read/scan/insert (1.2) in parallel.

Observed behavior: while 2 is happening, I can see that read and scan take more than what they used to take in 1. This is fine: when an insert is happening, the read is blocked because the whole row is locked. But I do not see any significant difference in the performance of insert or update. I thought the insert should likewise have been blocked while a read or scan is happening on the same rowkey, since a lock is held for a given rowkey. Please remember that READ, SCAN and INSERT happen on the same rowkeys.

Question: does HBase give preference to writes over reads, or am I missing something? regards, rks
RE: HBase client with security
Hi Harsh, thanks for the suggestion. I added HADOOP_PREFIX so that the conf folder is in the path. It still doesn't work, so I suppose Hadoop's core-site.xml is faulty (though I need a Kerberos ticket to use Hadoop, so security is working). In fact, when I try to list from the HBase shell I get

13/08/29 23:47:43 ERROR security.UserGroupInformation: PriviledgedActionException as:lu95...@hadoop.lrz.de cause:java.io.IOException: Failed to specify server's Kerberos principal name
13/08/29 23:47:43 INFO security.UserGroupInformation: Initiating logout for lu95...@hadoop.lrz.de
13/08/29 23:47:43 INFO security.UserGroupInformation: Initiating re-login for lu95...@hadoop.lrz.de

The file core-site.xml contains the following:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://10.156.120.41:9000</value>
    </property>
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>
    <property>
      <name>hadoop.kerberos.kinit.command</name>
      <value>/usr/bin/kinit</value>
    </property>

What else do I need? Maybe a reference to the keytab contained in hbase/conf/zk-jaas.conf? Bye, Matteo

Matteo Lanati Distributed Resources Group Leibniz-Rechenzentrum (LRZ) Boltzmannstrasse 1 85748 Garching b. München (Germany) Phone: +49 89 35831 8724

From: Harsh J [ha...@cloudera.com] Sent: 29 August 2013 15:53 To: user@hbase.apache.org Subject: Re: HBase client with security

Two things come to mind: 1. Is HADOOP_CONF_DIR also on HBase's classpath? If it or HADOOP_PREFIX/HADOOP_HOME is defined, it usually is. But re-check via "hbase classpath". 2. Assuming (1) is good, does your core-site.xml have Kerberos authentication settings for Hadoop as well?

On Thu, Aug 29, 2013 at 6:58 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with security. HBase works if I launch the shell from the node running the master, but I'd like to use it from an external machine. I prepared one, copying the Hadoop and HBase installation folders and adapting the paths (indeed I can use the same client to run MR jobs and interact with HDFS). Regarding the HBase client configuration, hbase-site.xml specifies:

    <property>
      <name>hbase.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hbase.rpc.engine</name>
      <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master.hadoop.local,host49.hadoop.local</value>
    </property>

where the zookeeper hosts are reachable and can be resolved via DNS. I had to specify them, otherwise the shell complains about org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid. I have a keytab for the principal I want to use (user running hbase/my client hostname@MYREALM), correctly addressed by the file hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to zk-jaas.conf. Nonetheless, when I issue a command from an HBase shell on the client machine, I get an error in the HBase master log:

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 6: readAndProcess threw exception org.apache.hadoop.security.AccessControlException: Authentication is required.
Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
        at org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding the authentication mechanism. Note that from the same client machine I can launch and use a Zookeeper shell. What am I missing in the client configuration? Does /etc/krb5.conf play any role in this? Thanks, Matteo

Matteo Lanati Distributed Resources Group Leibniz-Rechenzentrum (LRZ) Boltzmannstrasse 1 85748 Garching b. München (Germany) Phone: +49 89 35831 8724

-- Harsh J
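One possible gap, consistent with the "Failed to specify server's Kerberos principal name" error earlier in this thread (this is an assumption about the setup, not a confirmed diagnosis): a secure 0.94 client also needs the server principal names in its hbase-site.xml. A sketch, assuming the daemons run as the hbase principal, with MYREALM standing in for the real realm:

    <property>
      <name>hbase.master.kerberos.principal</name>
      <value>hbase/_HOST@MYREALM</value>
    </property>
    <property>
      <name>hbase.regionserver.kerberos.principal</name>
      <value>hbase/_HOST@MYREALM</value>
    </property>

The analogous Hadoop-side properties (e.g. dfs.namenode.kerberos.principal in hdfs-site.xml) are needed for the HDFS client piece.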
Re: observation while running hbase under load
This JIRA is related: HBASE-8836
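To make the scenario above concrete, here is a minimal two-thread harness of the kind rks describes — one thread writing column qualifiers of a single rowkey while another repeatedly reads it. The table and family names are hypothetical and the 0.94 client API is assumed. Note that HBase reads go through MVCC, so a Get sees a consistent row snapshot without waiting out the whole write path, which matches the observation that writes do not slow down nearly as much as reads.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SameRowLoad {
        static final byte[] ROW = Bytes.toBytes("row-1"); // hypothetical rowkey
        static final byte[] CF  = Bytes.toBytes("f1");    // hypothetical family

        public static void main(String[] args) throws Exception {
            final Configuration conf = HBaseConfiguration.create();
            ExecutorService pool = Executors.newFixedThreadPool(2);
            pool.submit(new Runnable() {      // writer: inserts 5000 qualifiers
                public void run() {
                    try {
                        HTable t = new HTable(conf, "demo");
                        for (int i = 0; i < 5000; i++) {
                            Put p = new Put(ROW);
                            p.add(CF, Bytes.toBytes("q" + i), Bytes.toBytes(i));
                            t.put(p);
                        }
                        t.close();
                    } catch (Exception e) { e.printStackTrace(); }
                }
            });
            pool.submit(new Runnable() {      // reader: times Gets on the same row
                public void run() {
                    try {
                        HTable t = new HTable(conf, "demo");
                        for (int i = 0; i < 5000; i++) {
                            long start = System.nanoTime();
                            t.get(new Get(ROW));
                            System.out.println("get: "
                                    + (System.nanoTime() - start) / 1000000L + " ms");
                        }
                        t.close();
                    } catch (Exception e) { e.printStackTrace(); }
                }
            });
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }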
Re: experiencing high latency for few reads in HBase
I just moved from 0.94.10 to 0.94.11. Tremendous improvement in our app's query response: it went down to 1.3 sec from 1.7 sec. Concurrent tests are also good, but it still degrades exponentially, to 10 secs for 8 concurrent clients. There might be a bug lurking in there somewhere that is probably affecting us. Regards, - kiru

From: Federico Gaule fga...@despegar.com To: user@hbase.apache.org Sent: Thursday, August 29, 2013 5:37 AM Subject: Re: experiencing high latency for few reads in HBase

The 0.94.11 release includes an optimization for MultiGets: https://issues.apache.org/jira/browse/HBASE-9087 What version have you deployed?

On 08/29/2013 01:29 AM, lars hofhansl wrote: A 1s SLA is tough in HBase (or any large-memory JVM application). Maybe, if you presplit your table and play with JDK7 and the G1 collector, but nobody here will vouch for such an SLA in the 99th percentile. I heard some folks have experimented with 30GB heaps and G1 and have reported max GC times of 200ms, but I have not verified that. -- Lars

- Original Message - From: Saurabh Yahoo saurabh...@yahoo.com To: user@hbase.apache.org Sent: Wednesday, August 28, 2013 3:17 PM Subject: Re: experiencing high latency for few reads in HBase

Hi Vlad, Thanks for your response. 1. Our SLA is less than one sec; we cannot afford latency of more than 1 sec. We can increase the heap size if that helps, we have enough memory on the servers. What would be the optimal heap size? 2. The cache hit ratio is 95%. One thing I don't understand: we have allocated only 4gb for the block cache out of 12gb. That leaves 8gb for the rest of the JVM. There are no writes; the memstore is empty. Is 8gb not enough for HBase to process the requests? What are the most memory-consuming objects in a region server? 3. We will change the CF to IN_MEMORY and report back the performance difference. Thanks, Saurabh.

On Aug 28, 2013, at 3:15 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: 1. 4 sec max latency is not that bad taking into account the 12GB heap; it can be much larger. What is your SLA? 2. Block evictions are the result of a poor cache hit rate and the root cause of periodic stop-the-world GC pauses (the max latencies you have been observing in the test). 3. The block cache consists of 3 parts (25% young generation, 50% tenured, 25% permanent). The permanent part is for CFs with IN_MEMORY = true (you can specify this when you create a CF). A block is first stored in the 'young gen' space, then gets promoted to the 'tenured gen' space (or gets evicted). Maybe your 'perm gen' space is underutilized? This is exactly 25% of 4GB (1GB). Although HBase's LruBlockCache should use all the space allocated for the block cache, there is no guarantee (as usual). If you don't have in_memory column families you may decrease

Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Saurabh Yahoo [saurabh...@yahoo.com] Sent: Wednesday, August 28, 2013 5:10 AM To: user@hbase.apache.org Subject: experiencing high latency for few reads in HBase

Hi, We are running a stress test on our 5-node cluster and we are getting the expected mean latency of 10ms. But we are seeing around 20 reads out of 25 million with latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every read? We observe the following things: 1. Reads are evenly distributed among the 5 nodes; CPUs remain under 5% utilized. 2.
We have a 4gb block cache set up (30% of the 12gb heap). 3gb of the block cache got filled up, but around 1gb remained free. There are a large number of cache evictions. Questions for the experts: 1. If there is still 1gb of free block cache available, why is HBase evicting blocks from the cache? 4. We are seeing memory go up to 10gb three times before dropping sharply to 5gb. Any help is highly appreciated. Thanks, Saurabh.
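For step 3 of Saurabh's plan (marking the CF IN_MEMORY so its blocks land in the 25% in-memory segment of the LruBlockCache that Vladimir describes), a sketch using the 0.94 admin API — "demo" and "f1" are hypothetical table and family names, and the table has to be disabled for the alter:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MarkInMemory {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Fetch the existing descriptor so other CF settings are preserved.
            HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("demo"));
            HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("f1"));
            hcd.setInMemory(true); // cache this family's blocks in the in-memory LRU segment
            admin.disableTable("demo");
            admin.modifyColumn("demo", hcd);
            admin.enableTable("demo");
            admin.close();
        }
    }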