Developers sought for web crawling project
Hi all,

Disclosure: I have been an active member of the Hadoop / HBase / Hive mailing lists for some time. I am not a recruiter, but I am looking to grow a development team that I lead. I sincerely apologize if this message is against mailing list etiquette; I have not seen any guidelines forbidding this.

We are looking to expand the development team based in Copenhagen, Denmark. We run and operate an indexing system that provides access to biodiversity information, specifically hundreds of millions of point-based observations of species occurrence documented over the past century. Our live system is outgrowing a MySQL database, and we are already tentatively using Hive and Hadoop and have run some tests using HBase.

Over the next 16 months we will rework the whole system, including custom reporting, custom maps and other visualizations, real-time search indexes, latency reduction, quality control integration, custom vocabulary work, etc. Our final architecture will likely include Hive, Hadoop, HBase, Lucene (SOLR / ElasticSearch?), Sqoop and PostGIS.

If you are familiar with these technologies, and the thought of some time working in Europe appeals to you, then we'd love to hear from you. Guidelines for applications are on the advert: http://tinyurl.com/gbif-dev-job

Thanks,
Tim
Re: Developers sought for web crawling project
Tim,

You can try j...@apache.org; it's meant for this purpose. Sorry for the noise. I belong to the Apache MINA team and am just getting started here :)

thanks
ashish

--
Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
Best way to write data
Hi everyone!

I'm new to HBase and have a question about the best write strategy. I have a process that updates a series of rows several times. As HBase is modelled after BigTable, I suppose the best course of action is to group the modifications to a row into a single batch, to avoid row fragmentation (where HBase has to read several row versions to provide a response). Is this correct, or can I update rows as the changes come in without worries?

Thanks in advance for your help!
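One way to picture the "group modifications per row" idea is a small client-side buffer that merges successive updates to the same row and hands each row to the store once. The sketch below is plain Java with no HBase dependency; `RowBatcher`, `put`, `pendingRows` and `flush` are hypothetical names made up for illustration, and whether this kind of batching is actually needed with HBase is exactly the open question in the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical client-side buffer: merges updates to the same row so that
// each row is written once per flush. Not part of any HBase API.
class RowBatcher {
    // row key -> (column -> latest value); insertion order is preserved
    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

    // Record an update; a later value for the same row and column
    // overwrites the earlier one.
    void put(String row, String column, String value) {
        pending.computeIfAbsent(row, r -> new LinkedHashMap<>()).put(column, value);
    }

    // Number of distinct rows currently buffered.
    int pendingRows() {
        return pending.size();
    }

    // Hand every buffered row to the sink exactly once, then clear the buffer.
    // Returns the number of rows written.
    int flush(BiConsumer<String, Map<String, String>> sink) {
        int written = 0;
        for (Map.Entry<String, Map<String, String>> e : pending.entrySet()) {
            sink.accept(e.getKey(), e.getValue());
            written++;
        }
        pending.clear();
        return written;
    }

    public static void main(String[] args) {
        RowBatcher batcher = new RowBatcher();
        batcher.put("row1", "cf:a", "1");
        batcher.put("row2", "cf:a", "2");
        batcher.put("row1", "cf:b", "3"); // merged into the row1 batch
        batcher.put("row1", "cf:a", "4"); // overwrites the earlier cf:a value
        int rows = batcher.flush((row, cols) -> System.out.println(row + " -> " + cols));
        System.out.println(rows + " row writes for 4 updates");
    }
}
```

In a real client the sink would be whatever call writes one row's changes to the table; here it only prints, so the merging behaviour can be seen in isolation.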
Re: High ingest rate and FIN_WAIT1 problems
Yes, hadoop 0.20.2 and hbase 0.20.5. I will get the branch you suggest, and give it a whirl. I am leaving on vacation Thursday, so I may not have any results to report till I get back. When I do get back, I will catch up with versions/fixes and try some more.

Meanwhile, thanks to all who have responded to my posts.

thomas downing

On 7/20/2010 1:06 PM, Stack wrote:
Hey Thomas: You are using hadoop 0.20.2 or something? And hbase 0.20.5 or so? You might try http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/. In particular, it has HDFS-1118 Fix socketleak on DFSClient.
St.Ack

On Tue, Jul 20, 2010 at 1:58 AM, Thomas Downing tdown...@proteus-technologies.com wrote:
Yes, I did try the timeout of 0. As expected, I did not see sockets in FIN_WAIT2 or TIME_WAIT for very long. I still leak sockets at the ingest rates I need - the FIN_WAIT1 problem.

Also, with the more careful observations this time around, I noted that even before the FIN_WAIT1 problem starts to crop up (at around 1600M inserts) there is already a slower socket leakage with timeout=0 and no FIN_WAIT1 problem. At 100M inserts, sockets were hovering around 50-60; by 800M they were around 200; and at 1600M they were at 400. This is slower than without the timeout set to 0 (about half the rate), but it is still ultimately fatal.

This socket increase is all between hbase and hadoop, none between test client and hbase. While the FIN_WAIT1 problem is triggered by an hbase side issue, I have no indication of which side causes this other leak.

thanks
thomas downing

On 7/19/2010 4:31 PM, Ryan Rawson wrote:
Did you try the setting I suggested? There is/was a known bug in HDFS which can cause issues which may include abandoned sockets such as you are describing.
-ryan

On Mon, Jul 19, 2010 at 2:13 AM, Thomas Downing tdown...@proteus-technologies.com wrote:
Thanks for the response, but my problem is not with FIN_WAIT2, it is with FIN_WAIT1. If it was FIN_WAIT2, the only concern would be socket leakage, and if setting the timeout solved the issue, that would be great.

The problem with FIN_WAIT1 is twofold. First, it is incumbent on the application to notice and handle this problem; from the TCP stack point of view, there is nothing wrong. It is just a special case of slow consumer. The other problem is that it implies that something will be lost if the socket is abandoned: there is data in the send queue of the socket in FIN_WAIT1 that has not yet been delivered to the peer.

On 7/16/2010 3:56 PM, Ryan Rawson wrote:
I've been running with this setting on both the HDFS side and the HBase side for over a year now. It's a bit of voodoo, but you might be running into well known suckage of HDFS. Try this one and restart your hbase and hdfs. The FIN_WAIT2/TIME_WAIT happens more on large concurrent gets, not so much for inserts.

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

-ryan

On Fri, Jul 16, 2010 at 9:33 AM, Thomas Downing tdown...@proteus-technologies.com wrote:
Thanks for the response. My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2; my problem is with FIN_WAIT1. While I do see some sockets in TIME_WAIT, they are only a few, and the number is not growing.

On 7/16/2010 12:07 PM, Hegner, Travis wrote:
Hi Thomas, I ran into a very similar issue when running slony-I on postgresql to replicate 15-20 databases. Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow (or hopefully stop) the leaking sockets. I found some notes about adjusting TCP parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html
[snip]
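The socket counts reported in this thread come from watching how many connections sit in each TCP state as the ingest proceeds. A minimal sketch of that kind of tally follows, assuming the input is Linux `netstat -tan`-style text in which connection lines start with `tcp` and end with the state name. The class name `TcpStateTally` and the sample lines are made up for illustration and are not taken from the poster's cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TcpStateTally {
    // Count connections per TCP state. Assumes each connection line starts
    // with "tcp" and has the state as its last whitespace-separated field.
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("tcp")) {
                continue; // skip headers and blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 6) {
                continue; // not a full connection line
            }
            counts.merge(fields[fields.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample; the non-zero Send-Q on the FIN_WAIT1 lines mirrors
        // the "undelivered data in the send queue" concern raised in the thread.
        List<String> sample = Arrays.asList(
            "Proto Recv-Q Send-Q Local Address    Foreign Address  State",
            "tcp        0      0 10.0.0.1:50010   10.0.0.2:41000   ESTABLISHED",
            "tcp        0   4096 10.0.0.1:50010   10.0.0.2:41001   FIN_WAIT1",
            "tcp        0   8192 10.0.0.1:50010   10.0.0.2:41002   FIN_WAIT1",
            "tcp        0      0 10.0.0.1:50010   10.0.0.2:41003   TIME_WAIT");
        System.out.println(tally(sample));
    }
}
```

Running such a tally at fixed insert counts (for example every 100M inserts) would give the kind of growth curve described above, and would separate a FIN_WAIT1 pile-up from a slower leak of established sockets.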
Re: High ingest rate and FIN_WAIT1 problems
On Tue, Jul 20, 2010 at 10:15 AM, Thomas Downing tdown...@proteus-technologies.com wrote:
Meanwhile, thanks to all who have responded to my posts.

Thanks for persisting with this, Thomas.

You might also take a look at Cloudera CDH3b2. It'll have the above fixes and then some. I've not looked too closely at what the 'then some' consists of recently -- and mighty Todd, our CDH-er, is holidaying himself these times, else he'd tell you himself -- but it might be worth checking it out.

Yours,
St.Ack
Flaky tableExists()
Hi all,

I have been noticing slightly flaky behavior with respect to HBaseAdmin.tableExists(). After I have created a table, it does not return the same answer every time it is called: sometimes true, sometimes false. In hbase shell, the list command prints out all the tables sometimes, and shows no tables on other occasions. Any ideas as to why this happens?

The master on my cluster runs the HDFS namenode and secondary namenode, and the HBase HMaster and HQuorumPeer. The 12 slaves run the HDFS datanode and HBase HRegionServer.

Below is the transcript of operations on hbase shell. Every time I exit and enter hbase shell, the output changes.

Thanks
Karthik

hbase(main):002:0> count 'SUBSCRIPTIONS'
0 row(s) in 6.0560 seconds
hbase(main):003:0> create 'Test'
0 row(s) in 1.0840 seconds
hbase(main):004:0> list
SUBSCRIPTIONS
1 row(s) in 0.0090 seconds
hbase(main):010:0> create 'Test'
NativeException: org.apache.hadoop.hbase.TableExistsException: org.apache.hadoop.hbase.TableExistsException: Test
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:798)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:762)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
hbase(main):011:0> list
SUBSCRIPTIONS
1 row(s) in 0.0070 seconds
hbase(main):012:0> exit

[kamba...@mercado-2 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.5, rUnknown, Thu Jul 15 22:27:05 PDT 2010
hbase(main):001:0> list
TOPICS
Test
TestTable
USERS
4 row(s) in 0.0930 seconds
hbase(main):005:0> exit

[kamba...@mercado-2 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.5, rUnknown, Thu Jul 15 22:27:05 PDT 2010
hbase(main):001:0> list
EVENTS
SUBSCRIPTIONS
TOPICS
TestTable
USERS
5 row(s) in 0.1000 seconds
Re: Flaky tableExists()
This sucks, and there are substantial reworks being made to master functions in 0.90 :-) Hopefully that will permanently address these kinds of bugs. Hopefully within a few weeks there'll be a developer preview (0.89 series) that has that code.

-ryan
Re: Flaky tableExists()
Thanks Ryan for understanding the pain :)

Is there a work-around for the time being? I haven't experienced any such issues with earlier versions of HBase. Was I just lucky, or would it make sense to revert to an earlier version?

Thanks
Karthik
Re: Flaky tableExists()
Looks like your .META. is confused about things, or your master is editing it in a weird way. AFAIK, it's not a known issue with 0.20.5.

I would advise first scanning the .META. table and checking whether your rows change between shell invocations (just look at the first row of each table). If they do change, look at what the master is doing by tailing its log file; it could be hung disabling tables or something weirder.

If you can't figure it out, feel free to pastebin the outputs so we can look at them.

J-D
RE: Flaky tableExists()
Also, I'm not sure this will be fixed with the master changes.

tableExists() actually makes a call to listTables() and then checks if the table you are looking for is in the list. listTables() uses a MetaScanner/MetaVisitor. Though the master changes de-emphasize the general use of meta scanning, they would not necessarily change the implementation of these client (to RS) calls.

Hard to say what would make this flaky besides some of the older bugs around lots of META StoreFiles. What version of HBase are you running?

JG

-----Original Message-----
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Tuesday, July 20, 2010 4:06 PM
To: user@hbase.apache.org
Subject: RE: Flaky tableExists()

I've personally never seen a flaky tableExists() but it's not something I've used heavily. Could you do the same operations you did below in the shell but also run: scan '.META.'
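The behaviour described in this message, where an existence check is a membership test over a full table listing, can be sketched as follows. This is not HBase source code: `TableExistsSketch` and `TableLister` are stand-ins invented for illustration. The only point is that any inconsistency in the listing shows up directly in the existence check.

```java
import java.util.Arrays;
import java.util.List;

class TableExistsSketch {
    // Stand-in for whatever produces the table listing
    // (in the thread, a scan over .META.).
    interface TableLister {
        List<String> listTables();
    }

    // Existence is derived from the listing, so it is only as
    // reliable as the listing itself.
    static boolean tableExists(TableLister lister, String name) {
        for (String table : lister.listTables()) {
            if (table.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two of the listings seen in the original shell transcript.
        TableLister first = () -> Arrays.asList("SUBSCRIPTIONS");
        TableLister second = () -> Arrays.asList("TOPICS", "Test", "TestTable", "USERS");
        System.out.println(tableExists(first, "Test"));
        System.out.println(tableExists(second, "Test"));
    }
}
```

Under this reading, a tableExists() that flips between true and false is a symptom rather than a separate bug, which is why the replies ask for a `scan '.META.'` alongside each `list`.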
Re: HBase 0.89 and JDK version
u18 frequently sigsegvs on users (and then they wonder why region servers are missing), and this is also true for Hadoop. u20 seems stable, but a lot of people still prefer u16.

J-D

On Tue, Jul 20, 2010 at 4:23 PM, Syed Wasti mdwa...@hotmail.com wrote:
Hi,

We recently upgraded our QA cluster to Cloudera Version 3 (CDH3), which has HBase 0.89. Our cluster is running on JDK 1.6.0_18. On trying to start up HBase, it basically gives an error “you're running jdk 1.6.0_18 which has known bugs”, even though Pig and Hive seem to work fine with this version of the JDK.

Any thoughts on why I am seeing this error? If there is a bug in this JDK version, then what is recommended: upgrading the JDK to 19, 20 or 21 (21 was released this month), or downgrading the JDK version?

Thanks for the support.

Regards
-SW

#java -version
java version 1.6.0_18
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
RE: HBase 0.89 and JDK version
You are seeing this error because JDK 1.6u18 has known issues, as the message describes :)

Pig and Hive apparently do not do this check. You should upgrade or downgrade your JVM.
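A startup check of the kind described here can be sketched as a small parse of the `java.version` string. The class `JdkCheck`, its method names, and the choice to flag only update 18 are assumptions for illustration; this is not HBase's actual check.

```java
class JdkCheck {
    // Extract the update number from a legacy version string such as
    // "1.6.0_18". Returns -1 when there is no "_NN" suffix.
    static int updateNumber(String version) {
        int idx = version.indexOf('_');
        if (idx < 0) {
            return -1;
        }
        StringBuilder digits = new StringBuilder();
        for (int i = idx + 1; i < version.length()
                && Character.isDigit(version.charAt(i)); i++) {
            digits.append(version.charAt(i));
        }
        return digits.length() == 0 ? -1 : Integer.parseInt(digits.toString());
    }

    // Flags only 1.6.0 update 18, the release called out in this thread.
    static boolean isKnownBad(String version) {
        return version.startsWith("1.6.0") && updateNumber(version) == 18;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (isKnownBad(version)) {
            System.err.println("Warning: running JDK " + version
                + ", which has known bugs; consider another update release");
        } else {
            System.out.println("JDK " + version + " is not on the flagged list");
        }
    }
}
```

A tool that never runs a check like this will start without complaint on the same JVM, which matches the observation that Pig and Hive appear to work on u18 while HBase warns.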
Re: Flaky tableExists()
Thanks a lot for the prompt help.

I am running HBase 0.20.5. I have cleaned the whole cluster and set everything up again (HDFS, MapReduce, and HBase). Here is the transcript, starting from when the HBase cluster was started.

[kamba...@mercado-2 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.5, rUnknown, Thu Jul 15 22:27:05 PDT 2010
hbase(main):001:0> list
0 row(s) in 0.0910 seconds
hbase(main):002:0> create 'Test'
0 row(s) in 1.0990 seconds
hbase(main):003:0> list
Test
1 row(s) in 0.0070 seconds
hbase(main):004:0> exit

[kamba...@mercado-2 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.5, rUnknown, Thu Jul 15 22:27:05 PDT 2010
hbase(main):001:0> list
0 row(s) in 0.0860 seconds
hbase(main):004:0> scan '.META.'
ROW                   COLUMN+CELL
 Test,,1279665697803  column=info:server, timestamp=1279668781099, value=15.25.119.96:60020
 Test,,1279665697803  column=info:serverstartcode, timestamp=1279668781099, value=1279665651801
1 row(s) in 0.0260 seconds
hbase(main):005:0> create 'Test1'
0 row(s) in 1.1050 seconds
hbase(main):006:0> list
0 row(s) in 0.0060 seconds
hbase(main):007:0> scan '.META.'
ROW                   COLUMN+CELL
 Test,,1279665697803  column=info:server, timestamp=1279668781099, value=15.25.119.96:60020
 Test,,1279665697803  column=info:serverstartcode, timestamp=1279668781099, value=1279665651801
1 row(s) in 0.0100 seconds
hbase(main):008:0> exit

[kamba...@mercado-2 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.5, rUnknown, Thu Jul 15 22:27:05 PDT 2010
hbase(main):001:0> list
0 row(s) in 0.0900 seconds
hbase(main):002:0> scan '.META.'
ROW                   COLUMN+CELL
0 row(s) in 0.0130 seconds
hbase(main):003:0> scan '.META.'
ROW                   COLUMN+CELL
0 row(s) in 0.0070 seconds

Thanks
Karthik
Re: HBase 0.89 and JDK version
On Tue, Jul 20, 2010 at 7:27 PM, Jonathan Gray jg...@facebook.com wrote:

You are seeing this error because JDK 1.6u18 has known issues, as the message describes :) Pig and Hive apparently do not do this check. You should upgrade or downgrade your JVM.

-----Original Message-----
From: Syed Wasti [mailto:mdwa...@hotmail.com]
Sent: Tuesday, July 20, 2010 4:23 PM
To: user@hbase.apache.org; hbase-u...@hadoop.apache.org
Subject: HBase 0.89 and JDK version

Hi,

We recently upgraded our QA cluster to Cloudera Version 3 (CDH3), which has HBase 0.89. Our cluster is running on JDK 1.6.0_18. On trying to start up HBase, it basically gives an error saying you're running jdk 1.6.0_18, which has known bugs, even though Pig and Hive seem to work fine with this version of the JDK. Any thoughts on why I am seeing this error? If there is a bug in this JDK version, then what is recommended: upgrading the JDK to 19, 20 or 21 (21 released this month), or downgrading the JDK version?

Thanks for the support.

Regards
-SW

# java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

I get lots of sigsegs on u21. Had better luck with u21.
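For readers wondering what a startup check of this kind might look like, here is a rough sketch that compares the running JVM's java.version property against a list of releases with known problems. It is not HBase's actual check. The class name, the method name, and the contents of the list are assumptions; only 1.6.0_18 is listed because it is the only release this thread confirms is flagged.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of a "known bad JDK" startup check; not HBase code.
final class JdkVersionCheck {

    // Assumption: only 1.6.0_18 is listed, since it is the one release
    // the thread confirms triggers the warning.
    static final Set<String> KNOWN_BAD = Collections.singleton("1.6.0_18");

    static boolean isKnownBad(String javaVersion) {
        if (javaVersion == null) {
            return false;
        }
        // Drop a build suffix such as "-b07" before comparing.
        int dash = javaVersion.indexOf('-');
        String base = dash >= 0 ? javaVersion.substring(0, dash) : javaVersion;
        return KNOWN_BAD.contains(base);
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        if (isKnownBad(running)) {
            System.err.println("JDK " + running + " has known bugs; upgrade or downgrade the JVM.");
        } else {
            System.out.println("JDK " + running + " is not on the known-bad list.");
        }
    }
}
```

A tool that never runs such a check, which is apparently the case for Pig and Hive, will start without complaint on the same JVM. That does not mean the JVM is safe for it.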
Re: HBase 0.89 and JDK version
On Tue, Jul 20, 2010 at 10:31 PM, Jonathan Gray jg...@facebook.com wrote:

I get lots of sigsegs on u21. Had better luck with u21.

Ed, which is the typo? :)

Doh, sorry (this was Cassandra). I was running

-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:+UseCompressedOops \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:+HeapDumpOnOutOfMemoryError \

with 1.6.0_u20. I had some data that was causing the sigsegv, as only a few nodes were constantly suffering. I never found what the data was, but after moving to u21 it never happened again.
RE: HBase 0.89 and JDK version
Do you think that's enough to also include u20 with u18 as not recommended? I haven't heard anything else about u20, but maybe there are more issues out there.

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Tuesday, July 20, 2010 7:49 PM
To: user@hbase.apache.org
Subject: Re: HBase 0.89 and JDK version

On Tue, Jul 20, 2010 at 10:31 PM, Jonathan Gray jg...@facebook.com wrote:

I get lots of sigsegs on u21. Had better luck with u21.

Ed, which is the typo? :)

Doh, sorry (this was Cassandra). I was running

-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:+UseCompressedOops \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:+HeapDumpOnOutOfMemoryError \

with 1.6.0_u20. I had some data that was causing the sigsegv, as only a few nodes were constantly suffering. I never found what the data was, but after moving to u21 it never happened again.
Re: HBase 0.89 and JDK version
On Tue, Jul 20, 2010 at 10:54 PM, Jonathan Gray jg...@facebook.com wrote:

Do you think that's enough to also include u20 with u18 as not recommended? I haven't heard anything else about u20, but maybe there are more issues out there.

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Tuesday, July 20, 2010 7:49 PM
To: user@hbase.apache.org
Subject: Re: HBase 0.89 and JDK version

On Tue, Jul 20, 2010 at 10:31 PM, Jonathan Gray jg...@facebook.com wrote:

I get lots of sigsegs on u21. Had better luck with u21.

Ed, which is the typo? :)

Doh, sorry (this was Cassandra). I was running

-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:+UseCompressedOops \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:+HeapDumpOnOutOfMemoryError \

with 1.6.0_u20. I had some data that was causing the sigsegv, as only a few nodes were constantly suffering. I never found what the data was, but after moving to u21 it never happened again.

It happened to me, but that does not mean it is a widespread issue. I feel that NoSQL pushes JVMs hard. These types of JVM failures typically do not end up in your log4j logs. I had completely overlooked the JVM and was trying to troubleshoot at a much higher level. Switching JVMs can be as simple as moving some symlinks. While I do not believe upgrading or downgrading the JVM should be your first recourse for troubleshooting, you should consider it at some point. For reference, sigsegv appears multiple times in the u21 bug fix info: http://java.sun.com/javase/6/webnotes/BugFixes6u21.html
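Since a JVM swap done by moving symlinks is easy to get wrong, it can help to confirm from inside the process which JVM and which flags are actually in effect. The sketch below uses only the standard java.lang.management API and system properties. The class and method names are made up for illustration, and nothing here is specific to HBase or Cassandra.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper (hypothetical name): reports the running JVM and its flags.
final class JvmReport {

    static String describeJvm() {
        // Flags passed on the command line, e.g. -XX:+UseConcMarkSweepGC.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();

        StringBuilder sb = new StringBuilder();
        sb.append("java.version = ").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.name = ").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("java.home = ").append(System.getProperty("java.home")).append('\n');
        for (String flag : flags) {
            sb.append("flag: ").append(flag).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeJvm());
    }
}
```

Running this with the same java binary and options as the server process shows whether the symlink change took effect and whether the GC flags listed above were really applied.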