Re: HBase manager GUI
Hello Alok, You are always welcome :). Everybody starts out new at some point. Go ahead, there are lots of good people here to help you out. Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 10:16 AM, Alok Singh Mahor alokma...@gmail.com wrote: Thanks a lot Mohammad for this very complete and mature reply :) I am very new and just started playing with HBase for my college project work. I will try to play with the APIs, thanks :)

On Tue, Nov 27, 2012 at 2:31 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Alok, I have seen this project. Good work. But let me tell you one thing: the way HBase is used is slightly different from the way you use traditional relational databases. People working on real clusters rarely face a situation where they need to query HBase directly, though it can be done for a few minor tasks like small gets, scans, puts, etc. For that, the HBase shell is more than sufficient. People either use HBase API features like filters or coprocessors, write MapReduce jobs to query their HBase tables, or map their tables to Hive warehouse tables. Having said that, I would suggest you get familiar with the HBase API rather than relying on anything else if you are planning to adopt HBase as your primary datastore. The web interface provided by HBase is just for visualization and monitoring, not for performing table operations. But that doesn't mean it is completely useless; the HBase folks have done really great work, and you can even perform some operations from the web UI. HTH. Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:55 AM, Alok Singh Mahor alokma...@gmail.com wrote: I need a frontend for the HBase shell, like we have phpMyAdmin for MySQL. I tried 127.0.0.1:60010 and 127.0.0.1:60030; these just give information about the master node and the region server respectively. So I tried to use hbasemanagergui, but I am unable to connect. Does the HBase web UI have a feature for using it as an HBase shell GUI alternative?
If yes, how do I run that?

On Tue, Nov 27, 2012 at 12:16 AM, Harsh J ha...@cloudera.com wrote: What are your exact 'manager GUI' needs, though? I mean, what are you envisioning it will help you perform (over the functionality already offered by the HBase Web UI)?

On Mon, Nov 26, 2012 at 9:59 PM, Alok Singh Mahor alokma...@gmail.com wrote: Hi all, I have set up standalone HBase on my laptop. The HBase shell is working fine, and I am not using Hadoop or ZooKeeper. I found one frontend for HBase, https://sourceforge.net/projects/hbasemanagergui/, but I am not able to use it to set up a connection. I have to provide these settings: hbase.zookeeper.quorum, hbase.zookeeper.property.clientport, hbase.master. What values should I set in these fields, given that I am not using ZooKeeper? Did anyone try this GUI? Thanks in advance :)

-- Alok Singh Mahor http://alokmahor.co.cc Join the next generation of computing, Open Source and Linux/GNU!!

-- Harsh J
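For reference, the "small gets, scans, puts" Mohammad mentions look like this in the stock HBase shell (no GUI needed); the table and column family names below are made up for illustration:

```
hbase(main):001:0> create 'mytable', 'cf'
hbase(main):002:0> put 'mytable', 'row1', 'cf:a', 'value1'
hbase(main):003:0> get 'mytable', 'row1'
hbase(main):004:0> scan 'mytable'
hbase(main):005:0> disable 'mytable'
hbase(main):006:0> drop 'mytable'
```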
Re: Unable to Create Table in Hbase
Hi Shyam, Are you sure your table is created? If you do a list in the shell, do you see it? Can you see it in the HTML GUI? JM

2012/11/27, shyam kumar lakshyam.sh...@gmail.com: There are no exceptions or warnings in the log, and the console prints the following:

12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:host.name=localhost
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_09
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.home=/home/shyam/jdk1.7.0_09/jre
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.class.path=lib/setooz-ir-core.jar:lib/guava-12.0.jar:lib/carrot2-core-3.7.0-SNAPSHOT.jar:lib/commons-codec-1.4.jar:lib/commons-configuration-1.7.jar:lib/hadoop-core-1.0.2.jar:lib/tika-app-1.0.jar:lib/httpclient-4.0.3.jar:lib/ezmorph.jar:lib/geoip.jar:lib/xercesImpl.jar:lib/attributes-binder-1.0.1.jar:lib/jackson-core-asl-1.7.4.jar:lib/veooz-analysis.jar:lib/log4j-1.2.17.jar:lib/maxent-3.0.0.jar:lib/liblinear-1.7.jar:lib/semantifire-1.0.jar:lib/ritaWN.jar:lib/slf4j-log4j12-1.6.1.jar:lib/commons-logging-1.1.1.jar:lib/slf4j-api-1.6.1.jar:lib/bzip2.jar:lib/langdetect.jar:lib/mahout-math-0.6.jar:lib/zookeeper-3.4.3.jar:lib/commons-lang-2.5.jar:lib/wikixmlj-r43.jar:lib/commons-collections-3.1.jar:lib/hppc-0.4.1.jar:lib/mahout-collections-1.0.jar:lib/jackson-mapper-asl-1.7.4.jar:lib/supportWN.jar:lib/simple-xml-2.6.4.jar:lib/commons-beanutils-1.7.jar:lib/opennlp-tools-1.5.0.jar:lib/setooz-core-3.5-SNAPSHOT.jar:lib/json-lib-2.4-jdk15.jar:lib/gson-2.2.2.jar:lib/jsoup-1.6.0.jar:lib/jsonic-1.2.4.jar:lib/lucene-analyzers-3.6.0.jar:lib/hbase-0.92.1.jar:lib/xml-apis.jar:conf/:dist/Veooz-Core.jar:.
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:java.compiler=NA
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-33-generic-pae
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:user.name=shyam
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/shyam
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/shyam/workspace/Veooz/Veooz-Core
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=18 watcher=hconnection
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
12/11/27 11:03:42 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
12/11/27 11:03:42 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6296@setu-M68MT-S2
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13b405aac590004, negotiated timeout = 4
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=18 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16f77b6
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
12/11/27 11:03:42 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
12/11/27 11:03:42 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6296@setu-M68MT-S2
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13b405aac590005, negotiated timeout = 4
12/11/27 11:03:42 INFO zookeeper.ClientCnxn: EventThread shut down
12/11/27 11:03:42 INFO zookeeper.ZooKeeper: Session: 0x13b405aac590005 closed
Creating HBase Table: Posts

and finally the process is not terminating ... it is
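One likely reason for the hang at the end: the 0.92-era client (hbase-0.92.1.jar is on the classpath above) keeps non-daemon ZooKeeper and connection threads alive, so the JVM will not exit until they are explicitly closed. A sketch, not the poster's actual code; the 'content' column family name is invented here:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class CreatePosts {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("Posts");
        desc.addFamily(new HColumnDescriptor("content")); // family name is hypothetical
        admin.createTable(desc);

        // Release the client's ZooKeeper session and connection threads.
        // These are non-daemon threads; if they are left open, the JVM
        // never exits even though main() has finished.
        admin.close();
        HConnectionManager.deleteConnection(conf, true);
    }
}
```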
Re: Connecting to standalone HBase from a remote client
Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which server should be contacted for what lives on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me explain: setting /etc/hosts on my client side seems, in that view, not relevant. And the other suggestion, the hbase-site.xml configuration, I already have covered, as my client code successfully connects to ZooKeeper (the configuration properties mentioned on this thread are ZooKeeper-specific; I don't directly see how they should solve the problem).

On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] ml-node+s679495n4034419...@n3.nabble.com wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm. Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue. However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region servers or HDFS, as they would all be on the same localhost...
Cheers, Nicolas

On Mon, Nov 26, 2012 at 7:16 PM, Mohammad Tariq [hidden email] wrote: what

-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Connecting-to-standalone-HBase-from-a-remote-client-tp4034362p4034438.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Connecting to standalone HBase from a remote client
Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which-server-should-be-contacted-for-what lives on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me delineate: setting /etc/hosts on my client side seems, in that view, not relevant. And the other suggestion, the hbase-site.xml configuration, I already have covered, as my client code successfully connects to ZooKeeper (the configuration properties mentioned on this thread are ZooKeeper-specific according to my reading of the documentation; I don't directly see how they should solve the problem). Perhaps, Mohammad, you can explain how those ZooKeeper properties relate to how the master references itself towards ZooKeeper? Should I take it from St.Ack that there is currently no way to specify the master's remotely accessible server/IP in the HBase configuration? Anyway, my HBase server's /etc/hosts has just one line now, in case it got lost in the thread: 127.0.0.1 localhost 'server-name'. Everything works fine on the HBase server itself; the same client code runs perfectly there. Thanks again, Matan

On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] ml-node+s679495n4034419...@n3.nabble.com wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm. Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue.
However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region servers or HDFS, as they would all be on the same localhost... Cheers, Nicolas

On Mon, Nov 26, 2012 at 7:16 PM, Mohammad Tariq [hidden email] wrote: what
Re: standalone HBase instance fails to start
Thanks again, this seems helpful for (Ubuntu) quick starting.

On Mon, Nov 26, 2012 at 7:44 PM, stack-3 [via Apache HBase] ml-node+s679495n4034405...@n3.nabble.com wrote: On Sun, Nov 25, 2012 at 8:28 AM, matan [hidden email] wrote: Nothing. Maybe just link to it from http://hbase.apache.org/book/quickstart.html so that people for whom the quick start doesn't work will have a direct route to this and other prerequisites. I just added a note on loopback to the getting started guide: http://hbase.apache.org/book.html#quickstart I don't want to clutter the getting started guide with a long list of prereqs that are not actually needed for putting up HBase in standalone mode; e.g. you don't need to make sure ssh to localhost is working when doing standalone. Thanks. Any other suggestions on how to improve the doc are most welcome. St.Ack
Re: Connecting to standalone HBase from a remote client
Hi there- re: "From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK." Yep. That's how it works. Then the client looks up ROOT/META, and then the client talks directly to the RegionServers. http://hbase.apache.org/book.html#client

On 11/27/12 8:52 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Matan, From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK. The HMaster registers its address with ZK, and from there the client comes to know where to look for the HMaster. And if the HMaster registers its address as 'localhost', the client will take it as 'localhost', which is the client's 'localhost' and not the 'localhost' where the HMaster is running. So, if you have the IP and hostname of the HMaster in your /etc/hosts file, the client can reach that machine without any problem, as there is proper DNS resolution available. But this is just what I think; I need approval from the heavyweights. Stack sir?? Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 5:57 PM, matan ma...@cloudaloe.org wrote: Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which-server-should-be-contacted-for-what lives on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me delineate: setting /etc/hosts on my client side seems, in that view, not relevant. And the other suggestion, the hbase-site.xml configuration, I already have covered, as my client code successfully connects to ZooKeeper (the configuration properties mentioned on this thread are ZooKeeper-specific according to my reading of the documentation; I don't directly see how they should solve the problem).
Perhaps, Mohammad, you can explain how those ZooKeeper properties relate to how the master references itself towards ZooKeeper? Should I take it from St.Ack that there is currently no way to specify the master's remotely accessible server/IP in the HBase configuration? Anyway, my HBase server's /etc/hosts has just one line now, in case it got lost in the thread: 127.0.0.1 localhost 'server-name'. Everything works fine on the HBase server itself; the same client code runs perfectly there. Thanks again, Matan

On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] ml-node+s679495n4034419...@n3.nabble.com wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm. Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue. However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region servers or HDFS, as they would all be on the same localhost...
Cheers, Nicolas

On Mon, Nov 26, 2012 at 7:16 PM, Mohammad Tariq [hidden email] wrote: what
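Putting Nicolas's and Doug's points together: the only client-side HBase configuration needed is where ZooKeeper lives, and that hostname must resolve from the client. A sketch of a client-side hbase-site.xml, using the 'server-name' placeholder from this thread (2181 is ZooKeeper's default client port):

```xml
<configuration>
  <property>
    <!-- Must be a name the CLIENT can resolve (via DNS or its own
         /etc/hosts), i.e. not "localhost" when connecting remotely. -->
    <name>hbase.zookeeper.quorum</name>
    <value>server-name</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```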
Re: Connecting to standalone HBase from a remote client
Thank you both for the comments :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 8:56 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: You are right, Mohammad. Regards, Ram

On Tue, Nov 27, 2012 at 8:53 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there- re: "From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK." Yep. That's how it works. Then the client looks up ROOT/META, and then the client talks directly to the RegionServers. http://hbase.apache.org/book.html#client

On 11/27/12 8:52 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Matan, From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK. The HMaster registers its address with ZK, and from there the client comes to know where to look for the HMaster. And if the HMaster registers its address as 'localhost', the client will take it as 'localhost', which is the client's 'localhost' and not the 'localhost' where the HMaster is running. So, if you have the IP and hostname of the HMaster in your /etc/hosts file, the client can reach that machine without any problem, as there is proper DNS resolution available. But this is just what I think; I need approval from the heavyweights. Stack sir?? Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 5:57 PM, matan ma...@cloudaloe.org wrote: Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which-server-should-be-contacted-for-what lives on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me delineate: setting /etc/hosts on my client side seems, in that view, not relevant.
And the other suggestion, the hbase-site.xml configuration, I already have covered, as my client code successfully connects to ZooKeeper (the configuration properties mentioned on this thread are ZooKeeper-specific according to my reading of the documentation; I don't directly see how they should solve the problem). Perhaps, Mohammad, you can explain how those ZooKeeper properties relate to how the master references itself towards ZooKeeper? Should I take it from St.Ack that there is currently no way to specify the master's remotely accessible server/IP in the HBase configuration? Anyway, my HBase server's /etc/hosts has just one line now, in case it got lost in the thread: 127.0.0.1 localhost 'server-name'. Everything works fine on the HBase server itself; the same client code runs perfectly there. Thanks again, Matan

On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] ml-node+s679495n4034419...@n3.nabble.com wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm. Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue. However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region servers or HDFS, as they would all be on the same localhost...
Cheers, Nicolas

On Mon, Nov 26, 2012 at 7:16 PM, Mohammad Tariq [hidden email] wrote: what
Re: Unable to Create Table in Hbase
Hi, Yes, I am able to see the table and the table description in the hbase shell (list 'table_name' and describe 'table_name'), but I am unable to perform scan 'table_name', as I said earlier.
Question About Starting and Stopping HADOOP HBASE Cluster in Secure Mode
Hi, I currently use the following steps to start and stop a HADOOP/HBASE cluster:

1) Without Kerberos security:

(start zookeepers)
Start the cluster from the Master:
{$HADOOP_HOME}/bin/start-dfs.sh // one command starts all servers
{$HADOOP_HOME}/bin/start-mapred.sh
{$HBASE_HOME}/bin/start-hbase.sh

Stop the cluster from the Master:
{$HBASE_HOME}/bin/stop-hbase.sh
{$HADOOP_HOME}/bin/stop-mapred.sh
{$HADOOP_HOME}/bin/stop-dfs.sh
(stop zookeepers)

2) With Kerberos, in secure mode:

Start the HADOOP namenode:
{$HADOOP_HOME}/bin/hadoop-daemon.sh start namenode
Then, for each datanode:
{$HADOOP_HOME}/bin/hadoop-daemon.sh start datanode
Then the HBASE Master:
{$HBASE_HOME}/bin/hbase-daemon.sh start master (as root)
Then, for each HBASE regionserver:
{$HBASE_HOME}/bin/hbase-daemon.sh start regionserver

QUESTION: As you can see from 2), there are more steps to start the entire cluster in secure mode. Are there any existing commands that simplify starting/stopping HADOOP/HBASE in secure mode?

Thanks, ac
Re: Connecting to standalone HBase from a remote client
Hi Mohammad, I'm losing track... I came to understand that ZK tells the client where ROOT/META is, and from there the client gets the region server it should contact. And yet I take it that you are saying that the configuration for the location of ROOT/META or the region server should be done on the client side. These two ideas seem to present a contradiction, and I probably don't have a good grasp of what is going on, or of what should be done. Can you or anyone clarify? Thanks, matan

On Tue, Nov 27, 2012 at 5:33 PM, Tariq [via Apache HBase] ml-node+s679495n4034446...@n3.nabble.com wrote: Thank you both for the comments :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 8:56 PM, ramkrishna vasudevan [hidden email] wrote: You are right, Mohammad. Regards, Ram

On Tue, Nov 27, 2012 at 8:53 PM, Doug Meil [hidden email] wrote: Hi there- re: "From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK." Yep. That's how it works. Then the client looks up ROOT/META, and then the client talks directly to the RegionServers. http://hbase.apache.org/book.html#client

On 11/27/12 8:52 AM, Mohammad Tariq [hidden email] wrote: Hello Matan, From what I have understood, these properties are not for HBase but for the HBase client which we write. They tell the client where to look for ZK. The HMaster registers its address with ZK, and from there the client comes to know where to look for the HMaster. And if the HMaster registers its address as 'localhost', the client will take it as 'localhost', which is the client's 'localhost' and not the 'localhost' where the HMaster is running. So, if you have the IP and hostname of the HMaster in your /etc/hosts file, the client can reach that machine without any problem, as there is proper DNS resolution available.
But this is just what I think; I need approval from the heavyweights. Stack sir?? Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 5:57 PM, matan [hidden email] wrote: Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which-server-should-be-contacted-for-what lives on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me delineate: setting /etc/hosts on my client side seems, in that view, not relevant. And the other suggestion, the hbase-site.xml configuration, I already have covered, as my client code successfully connects to ZooKeeper (the configuration properties mentioned on this thread are ZooKeeper-specific according to my reading of the documentation; I don't directly see how they should solve the problem). Perhaps, Mohammad, you can explain how those ZooKeeper properties relate to how the master references itself towards ZooKeeper? Should I take it from St.Ack that there is currently no way to specify the master's remotely accessible server/IP in the HBase configuration? Anyway, my HBase server's /etc/hosts has just one line now, in case it got lost in the thread: 127.0.0.1 localhost 'server-name'. Everything works fine on the HBase server itself; the same client code runs perfectly there. Thanks again, Matan

On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] [hidden email] wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base :) Regards, Mohammad Tariq

On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm.
Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue. However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region servers or HDFS, as they would all be on the same localhost...
Re: Unable to Create Table in Hbase
Can you paste the master logs and RS logs? I am sure there should be some errors in them; that is why it is not able to locate META. Regards, Ram

On Tue, Nov 27, 2012 at 7:51 PM, shyam kumar lakshyam.sh...@gmail.com wrote: Hi, Yes, I am able to see the table and the table description in the hbase shell (list 'table_name' and describe 'table_name'), but I am unable to perform scan 'table_name', as I said earlier.
Re: Question About Starting and Stopping HADOOP HBASE Cluster in Secure Mode
AC, the start-dfs.sh and start-mapred.sh scripts are just wrappers around hadoop-daemon.sh commands. All the security settings are in the configuration files, so the same start procedure should work in both secure and unsecured modes. Just make sure you have the correct configuration files. Thank you! Sincerely, Leonid Fedotov

On Nov 27, 2012, at 3:28 AM, a...@hsk.hk wrote: Hi, I currently use the following steps to start and stop a HADOOP/HBASE cluster:

1) Without Kerberos security: (start zookeepers) Start the cluster from the Master: {$HADOOP_HOME}/bin/start-dfs.sh // one command starts all servers {$HADOOP_HOME}/bin/start-mapred.sh {$HBASE_HOME}/bin/start-hbase.sh Stop the cluster from the Master: {$HBASE_HOME}/bin/stop-hbase.sh {$HADOOP_HOME}/bin/stop-mapred.sh {$HADOOP_HOME}/bin/stop-dfs.sh (stop zookeepers)

2) With Kerberos, in secure mode: Start the HADOOP namenode: {$HADOOP_HOME}/bin/hadoop-daemon.sh start namenode Then, for each datanode: {$HADOOP_HOME}/bin/hadoop-daemon.sh start datanode Then the HBASE Master: {$HBASE_HOME}/bin/hbase-daemon.sh start master (as root) Then, for each HBASE regionserver: {$HBASE_HOME}/bin/hbase-daemon.sh start regionserver

QUESTION: As you can see from 2), there are more steps to start the entire cluster in secure mode. Are there any existing commands that simplify starting/stopping HADOOP/HBASE in secure mode? Thanks, ac
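Leonid's point is that the start-*.sh scripts are themselves just loops over the per-daemon scripts. If you still want a single entry point for the secure-mode sequence in 2), a small wrapper is easy to put together. This is only a sketch: it assumes passwordless ssh to each node and that conf/slaves and conf/regionservers hold the host lists, which may not match your layout.

```
#!/bin/sh
# Hypothetical one-shot secure-mode start wrapper.
# Assumes: passwordless ssh to every node, same install paths everywhere,
# datanode hosts in conf/slaves, regionserver hosts in conf/regionservers.

"$HADOOP_HOME"/bin/hadoop-daemon.sh start namenode

while read dn; do
  ssh "$dn" "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode"
done < "$HADOOP_HOME/conf/slaves"

"$HBASE_HOME"/bin/hbase-daemon.sh start master

while read rs; do
  ssh "$rs" "$HBASE_HOME/bin/hbase-daemon.sh start regionserver"
done < "$HBASE_HOME/conf/regionservers"
```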
Re: Connecting to standalone HBase from a remote client
Matan, in short, your client should be able to resolve the names of the HMaster, all HRegionServers and all ZK nodes. DNS or a local /etc/hosts file, it does not matter, but the names should be resolvable correctly on the client machine. Then it will be able to connect to ZK and get the HMaster and ROOT/META locations. Thank you! Sincerely, Leonid Fedotov On Nov 27, 2012, at 8:10 AM, matan wrote: Hi Mohammad, I'm losing track... I came to understand that ZK tells the client where the ROOT/META is, and from there the client gets the region server it should contact. And yet I take it that you are saying that the configuration for the location of the ROOT/META or region server should be done on the client side. These two ideas seem to present a contradiction, and I probably don't have a good grasp of what is going on, or what should be done. Can you or anyone try to clarify? Thanks, matan On Tue, Nov 27, 2012 at 5:33 PM, Tariq [via Apache HBase] ml-node+s679495n4034446...@n3.nabble.com wrote: Thank you both for the comments :) Regards, Mohammad Tariq On Tue, Nov 27, 2012 at 8:56 PM, ramkrishna vasudevan [hidden email] wrote: You are right Mohammad, Regards Ram On Tue, Nov 27, 2012 at 8:53 PM, Doug Meil [hidden email] wrote: Hi there- re: From what I have understood, these properties are not for Hbase but for the Hbase client which we write. They tell the client where to look for ZK. Yep. That's how it works. Then the client looks up ROOT/META and then the client talks directly to the RegionServers. http://hbase.apache.org/book.html#client On 11/27/12 8:52 AM, Mohammad Tariq [hidden email] wrote: Hello Matan, From what I have understood, these properties are not for Hbase but for the Hbase client which we write. They tell the client where to look for ZK. Hmaster registers its address with ZK.
And from there the client will come to know where to look for the HMaster. And if the HMaster registers its address as 'localhost', the client will take it as 'localhost', which is the client's 'localhost' and not the 'localhost' where the HMaster is running. So, if you have the IP and hostname of the HMaster in your /etc/hosts file, the client can reach that machine without any problem, as there is proper DNS resolution available. But this is just what I think. I need approval from the heavyweights. Stack sir?? Regards, Mohammad Tariq On Tue, Nov 27, 2012 at 5:57 PM, matan [hidden email] wrote: Thanks guys, Excuse my ignorance, but having sort of agreed that the configuration that determines which-server-should-be-contacted-for-what is on the HBase server, I am not sure how any of the practical suggestions made should solve the issue and enable connecting from a remote client. Let me delineate - setting /etc/hosts on my client side seems not relevant in that view. And the other suggestion, the hbase-site.xml configuration, I have already got covered, as my client code successfully connects to zookeeper (the configuration properties mentioned on this thread are zookeeper specific according to my interpretation of the documentation; I don't directly see how they should solve the problem). Perhaps Mohammad you can explain how those zookeeper properties relate to how the master references itself towards zookeeper? Should I take it from St.Ack that there is currently no way to specify the master's remotely accessible server/ip in the HBase configuration? Anyway, my HBase server's /etc/hosts has just one line now, in case it got lost on the thread - 127.0.0.1 localhost 'server-name'. Everything works fine on the HBase server itself; the same client code runs perfectly there.
Thanks again, Matan On Mon, Nov 26, 2012 at 10:15 PM, Tariq [via Apache HBase] [hidden email] wrote: Hello Nicolas, You are right. It has been deprecated. Thank you for updating my knowledge base..:) Regards, Mohammad Tariq On Tue, Nov 27, 2012 at 12:17 AM, Nicolas Liochon [hidden email] wrote: Hi Mohammad, Your answer was right, just that specifying the master address is not necessary (anymore, I think). But it does no harm. Changing the /etc/hosts (as you did) is right too. Lastly, if the cluster is standalone and accessed locally, having localhost in ZK will not be an issue. However, it's perfectly possible to have a standalone cluster accessed remotely, so you don't want the master to write "I'm on the server named localhost" in this case. I expect it won't be an issue for communications between the region
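The ZooKeeper properties discussed in this thread live in the client-side hbase-site.xml. A minimal sketch, assuming a placeholder quorum host name ("zk-host") that must be resolvable from the client machine via DNS or /etc/hosts, as Leonid notes:

```xml
<!-- Client-side hbase-site.xml sketch. "zk-host" is a hypothetical name; the
     client only needs to reach ZK, which then hands out the master and
     ROOT/META locations. -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk-host</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```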
Re: Expert suggestion needed to create table in Hbase - Banking
Ian Varley's excellent HBaseCon presentation is another great resource. http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf On Mon, Nov 26, 2012 at 5:43 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, somebody already wisely mentioned the link to the # of CF's entry, but here are a few other entries that can save you some heartburn if you read them ahead of time. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema http://hbase.apache.org/book.html#architecture On 11/26/12 5:28 AM, Mohammad Tariq donta...@gmail.com wrote: Hello sir, You might become a victim of RS hotspotting, since the customerIDs will be sequential (I assume). To keep things simple, Hbase puts all the rows with similar keys on the same RS. But it becomes a bottleneck in the long run as all the data keeps going to the same region. HTH Regards, Mohammad Tariq On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks! Can we have the customer number as the RowKey for the customer (client) master table? Please help in educating me on the advantages and disadvantages of having the customer number as the row key... Also, we may need to implement SCD2 in that table.. will it work like that? Or is SCD2 not needed, and can we achieve the same by increasing the version number that it will hold? Please suggest... regards, Rams On Mon, Nov 26, 2012 at 1:10 PM, Li, Min m...@microstrategy.com wrote: When 1 cf needs to split, the other 599 cfs will split at the same time. Many fragments will be produced when you use so many column families. Actually, many cfs can be merged into only one cf with specific tags in the rowkey. For example, the rowkey of a customer address can be uid+'AD', and the customer profile can be uid+'PR'.
Min -Original Message- From: Ramasubramanian Narayanan [mailto: ramasubramanian.naraya...@gmail.com] Sent: Monday, November 26, 2012 3:05 PM To: user@hbase.apache.org Subject: Expert suggestion needed to create table in Hbase - Banking Hi, I have a requirement of physicalising the logical model... I have a client model which has 600+ entities... I need suggestions on how to go about physicalising it... I have a few other doubts: 1) Is it good to create a single table for all the 600+ columns? 2) Should we have different column families for different groups, or can everything be under a single column family? For example, can customer address be a different column family? Please help on this.. regards, Rams
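The RS hotspotting Mohammad describes (sequential customerIDs all landing on one region) is commonly mitigated by salting the rowkey. A minimal illustrative sketch, not from the thread itself; the bucket count of 16 and the key layout are arbitrary examples:

```python
# Sketch of rowkey salting: prefix a deterministic salt so that consecutive
# customer IDs spread across regions instead of hotspotting one RS.
BUCKETS = 16  # example bucket count

def salted_key(customer_id: int) -> bytes:
    salt = customer_id % BUCKETS  # deterministic, so reads can recompute it
    return b"%02d-%010d" % (salt, customer_id)

# Consecutive IDs now land in different key ranges:
assert salted_key(1000) == b"08-0000001000"
assert salted_key(1001) == b"09-0000001001"
```

The trade-off is that a plain range scan over customer IDs now requires one scan per salt bucket.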
RE: Backup strategy
Lars, thanks for the great post. However I am using HBase 0.90.6 :( What is the best approach in my case? My data is not very big: 100GB divided into 4 tables. I don't need daily backups, weekly maybe. But I need to be able to fully restore the state (all data in a consistent state) if my migration goes wrong. Thanks, Pablo -Original Message- From: lars hofhansl [mailto:lhofha...@yahoo.com] Sent: Thursday, November 15, 2012 15:46 To: user@hbase.apache.org Subject: Re: Backup strategy Here's one way: http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html From: David Charle dbchar2...@gmail.com To: user@hbase.apache.org Sent: Thursday, November 15, 2012 7:41 AM Subject: Backup strategy Hi, is anyone using a backup strategy (other than replication) for point-in-time restore? Any recommendations on best practices? -- David
Re: recommended nodes
Hi Michael, so are you recommending 32GB per node? What about the disks? Are SATA drives too slow? JM 2012/11/26, Michael Segel michael_se...@hotmail.com: Uhm, those specs are actually now out of date. If you're running HBase, or want to also run R on top of Hadoop, you will need to add more memory. Also forget 1GbE, get 10GbE, and with 2 SATA drives you will be disk i/o bound way too quickly. On Nov 26, 2012, at 8:05 AM, Marcos Ortiz mlor...@uci.cu wrote: Are you asking about hardware recommendations? Eric Sammer, in his Hadoop Operations book, did a great job on this. For mid-size clusters (up to 300 nodes): Processor: dual quad-core 2.6 GHz RAM: 24 GB DDR3 Dual 1 Gb Ethernet NICs A SAS drive controller At least two SATA II drives in a JBOD configuration The replication factor depends heavily on the primary use of your cluster. On 11/26/2012 08:53 AM, David Charle wrote: hi, what are the recommended nodes for NN, hmaster and zk nodes for a larger cluster, let's say 50-100+? also, what would be the ideal replication factor for larger clusters when you have 3-4 racks? -- David 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci -- Marcos Luis Ortíz Valmaseda about.me/marcosortiz http://about.me/marcosortiz @marcosluis2186 http://twitter.com/marcosluis2186
Re: recommended nodes
OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside joke ...] So here's the problem... By default, your child processes in a map/reduce job get a default 512MB. The majority of the time, this gets raised to 1GB. 8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note: This is why when people talk about the number of cores, you have to specify physical cores or logical cores.) So if you were to oversubscribe and have, let's say, 12 mappers and 12 reducers, that's 24 slots, which means that you would need 24GB of memory reserved just for the child processes. This would leave 8GB for the DN, TT and the rest of the Linux OS processes. Can you live with that? Sure. Now add in R, HBase, Impala, or some other set of tools on top of the cluster. Ooops! Now you are in trouble because you will swap. Also, adding in R, you may want to bump up those child procs from 1GB to 2GB. That means the 24 slots would now require 48GB. Now you will swap, and if that happens you will see HBase in a cascading failure. So while you can do a rolling restart with the changed configuration (reducing the number of mappers and reducers), you end up with fewer slots, which will mean longer run times for your jobs. (Fewer slots == less parallelism.) Looking at the price of memory... you can get 48GB or even 64GB for around the same price point. (8GB chips) And I didn't even talk about adding SOLR, again a memory hog... ;-) Note that I matched the number of mappers with reducers. You could go with fewer reducers if you want. I tend to recommend a ratio of 2:1 mappers to reducers, depending on the work flow. As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III interface is pretty much available in the new kit being shipped. It's just that you don't have enough drives. 8 cores should be 8 spindles if available. Otherwise you end up seeing your CPU load climb on wait states as the processes wait for the disk i/o to catch up.
I mean you could build out a cluster with 4 x 3.5" 2TB drives in a 1U chassis based on price. You're making a trade off and you should be aware of the performance hit you will take. HTH -Mike On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Michael, so are you recommending 32GB per node? What about the disks? Are SATA drives too slow? JM 2012/11/26, Michael Segel michael_se...@hotmail.com: Uhm, those specs are actually now out of date. If you're running HBase, or want to also run R on top of Hadoop, you will need to add more memory. Also forget 1GbE, get 10GbE, and with 2 SATA drives you will be disk i/o bound way too quickly. On Nov 26, 2012, at 8:05 AM, Marcos Ortiz mlor...@uci.cu wrote: Are you asking about hardware recommendations? Eric Sammer, in his Hadoop Operations book, did a great job on this. For mid-size clusters (up to 300 nodes): Processor: dual quad-core 2.6 GHz RAM: 24 GB DDR3 Dual 1 Gb Ethernet NICs A SAS drive controller At least two SATA II drives in a JBOD configuration The replication factor depends heavily on the primary use of your cluster. On 11/26/2012 08:53 AM, David Charle wrote: hi, what are the recommended nodes for NN, hmaster and zk nodes for a larger cluster, let's say 50-100+? also, what would be the ideal replication factor for larger clusters when you have 3-4 racks? -- David -- Marcos Luis Ortíz Valmaseda about.me/marcosortiz http://about.me/marcosortiz @marcosluis2186 http://twitter.com/marcosluis2186
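Mike's slot-memory arithmetic above can be sketched in a few lines. A minimal sketch; the slot counts and heap sizes are the examples from his message, not fixed recommendations:

```python
# Sketch of the slot-memory budgeting from the message above: total child-heap
# demand for one node, given slot counts and per-task heap.
def child_heap_gb(mappers: int, reducers: int, heap_per_task_gb: float) -> float:
    return (mappers + reducers) * heap_per_task_gb

# 12 mappers + 12 reducers at 1 GB each -> 24 GB, leaving 8 GB of a 32 GB node
# for the DataNode, TaskTracker and the OS:
assert child_heap_gb(12, 12, 1.0) == 24.0
# Bumping child heaps to 2 GB (e.g. for R) doubles that to 48 GB, which swaps
# on a 32 GB node:
assert child_heap_gb(12, 12, 2.0) == 48.0
```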
Re: Do you know what's the release time of hbase 0.96.0 and 0.94.3?
Please don't send the same question to three different mailing lists. See below for answers. On Tue, Nov 27, 2012 at 6:59 PM, 张莉苹 zlpmiche...@gmail.com wrote: *Do you know what's the release time of apache hbase 0.96.0 and hbase 0.94.3?* 0.94.3 should be out in a week or two. 0.96.0 start of next year, hopefully. I just saw there was a piece of news, *December 4th, 2012 0.96 Bug Squashing and Testing Hackathon at Cloudera, SF* (http://www.meetup.com/hackathon/events/90536432/). Does that mean apache hbase 0.96.0 will be finally released *before Dec. 4th*? No. Devs are going to hang around for a day working on 0.96 issues and talking about how to get 0.96 out the door. BTW, I think hbase 0.94.2 is the latest stable and released version in the community, right? That's right! Yours, St.Ack
Regarding rework in changing column family
Hi, I have created a table in hbase with one column family and planned to release it for development (in pentaho). Suppose later, after doing data profiling in production, I feel that out of 600 columns 200 are not going to get used frequently and I plan to group those into another column family. If I change the column family at a later point of time, I expect there will be a lot of rework that has to be done (whether we use java or pentaho). Is my understanding correct? Is there any other alternative available to overcome this? Regards, Rams
Re: Expert suggestion needed to create table in Hbase - Banking
Hi, Thanks!! Can someone help in suggesting the best rowkey that we can use in this scenario. Regards, Rams On 27-Nov-2012, at 10:37 PM, Suraj Varma svarma...@gmail.com wrote: Ian Varley's excellent HBaseCon presentation is another great resource. http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf On Mon, Nov 26, 2012 at 5:43 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, somebody already wisely mentioned the link to the # of CF's entry, but here are a few other entries that can save you some heartburn if you read them ahead of time. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema http://hbase.apache.org/book.html#architecture On 11/26/12 5:28 AM, Mohammad Tariq donta...@gmail.com wrote: Hello sir, You might become a victim of RS hotspotting, since the customerIDs will be sequential (I assume). To keep things simple, Hbase puts all the rows with similar keys on the same RS. But it becomes a bottleneck in the long run as all the data keeps going to the same region. HTH Regards, Mohammad Tariq On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks! Can we have the customer number as the RowKey for the customer (client) master table? Please help in educating me on the advantages and disadvantages of having the customer number as the row key... Also, we may need to implement SCD2 in that table.. will it work like that? Or is SCD2 not needed, and can we achieve the same by increasing the version number that it will hold? Please suggest... regards, Rams On Mon, Nov 26, 2012 at 1:10 PM, Li, Min m...@microstrategy.com wrote: When 1 cf needs to split, the other 599 cfs will split at the same time. Many fragments will be produced when you use so many column families. Actually, many cfs can be merged into only one cf with specific tags in the rowkey. For example, the rowkey of a customer address can be uid+'AD', and the customer profile can be uid+'PR'.
Min -Original Message- From: Ramasubramanian Narayanan [mailto: ramasubramanian.naraya...@gmail.com] Sent: Monday, November 26, 2012 3:05 PM To: user@hbase.apache.org Subject: Expert suggestion needed to create table in Hbase - Banking Hi, I have a requirement of physicalising the logical model... I have a client model which has 600+ entities... I need suggestions on how to go about physicalising it... I have a few other doubts: 1) Is it good to create a single table for all the 600+ columns? 2) Should we have different column families for different groups, or can everything be under a single column family? For example, can customer address be a different column family? Please help on this.. regards, Rams
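Li Min's suggestion above (fold would-be column families into a single family by tagging the rowkey) can be sketched as follows. The uid and the 'AD'/'PR' tags are the examples from the mail; the key layout is otherwise an assumption:

```python
# Sketch of Li Min's rowkey-tag idea: instead of separate column families for
# address and profile, keep one family and append a tag to the uid in the key.
def tagged_key(uid: str, tag: str) -> bytes:
    return (uid + tag).encode("utf-8")

# All "sub-rows" for one customer stay adjacent, so a prefix scan on the uid
# retrieves both the address and the profile:
assert tagged_key("cust0042", "AD") == b"cust0042AD"
assert tagged_key("cust0042", "PR") == b"cust0042PR"
```

This avoids the split amplification Min describes, at the cost of one logical entity spanning several physical rows.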
Re: Regarding rework in changing column family
As far as I see, altering the table with the new column family should be easy: - disable the table - Issue the modify table command with the new column family. - run a compaction. Now after this, when you start doing your puts, they should be in alignment with the new schema defined for the table. One thing you may have to watch is how much your rate of puts gets affected, because now both of your CFs will start flushing whenever a memstore flush happens. Hope this helps. Regards Ram On Wed, Nov 28, 2012 at 10:10 AM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, I have created a table in hbase with one column family and planned to release it for development (in pentaho). Suppose later, after doing data profiling in production, I feel that out of 600 columns 200 are not going to get used frequently and I plan to group those into another column family. If I change the column family at a later point of time, I expect there will be a lot of rework that has to be done (whether we use java or pentaho). Is my understanding correct? Is there any other alternative available to overcome this? Regards, Rams
Re: Expert suggestion needed to create table in Hbase - Banking
Hi Rams, IMHO, you need to go through http://hbase.apache.org/book.html and the book HBase: The Definitive Guide to get a deeper understanding of HBase. It will help you in designing your system. There is no magical trick to design the most efficient/best RowKey without knowing the detailed requirements and constraints and carrying out a couple of experiments. HTH, Anil On Tue, Nov 27, 2012 at 8:44 PM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks!! Can someone help in suggesting the best rowkey that we can use in this scenario. Regards, Rams On 27-Nov-2012, at 10:37 PM, Suraj Varma svarma...@gmail.com wrote: Ian Varley's excellent HBaseCon presentation is another great resource. http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf On Mon, Nov 26, 2012 at 5:43 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, somebody already wisely mentioned the link to the # of CF's entry, but here are a few other entries that can save you some heartburn if you read them ahead of time. http://hbase.apache.org/book.html#datamodel http://hbase.apache.org/book.html#schema http://hbase.apache.org/book.html#architecture On 11/26/12 5:28 AM, Mohammad Tariq donta...@gmail.com wrote: Hello sir, You might become a victim of RS hotspotting, since the customerIDs will be sequential (I assume). To keep things simple, Hbase puts all the rows with similar keys on the same RS. But it becomes a bottleneck in the long run as all the data keeps going to the same region. HTH Regards, Mohammad Tariq On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks! Can we have the customer number as the RowKey for the customer (client) master table? Please help in educating me on the advantages and disadvantages of having the customer number as the row key... Also, we may need to implement SCD2 in that table.. will it work like that?
Or is SCD2 not needed, and can we achieve the same by increasing the version number that it will hold? Please suggest... regards, Rams On Mon, Nov 26, 2012 at 1:10 PM, Li, Min m...@microstrategy.com wrote: When 1 cf needs to split, the other 599 cfs will split at the same time. Many fragments will be produced when you use so many column families. Actually, many cfs can be merged into only one cf with specific tags in the rowkey. For example, the rowkey of a customer address can be uid+'AD', and the customer profile can be uid+'PR'. Min -Original Message- From: Ramasubramanian Narayanan [mailto: ramasubramanian.naraya...@gmail.com] Sent: Monday, November 26, 2012 3:05 PM To: user@hbase.apache.org Subject: Expert suggestion needed to create table in Hbase - Banking Hi, I have a requirement of physicalising the logical model... I have a client model which has 600+ entities... I need suggestions on how to go about physicalising it... I have a few other doubts: 1) Is it good to create a single table for all the 600+ columns? 2) Should we have different column families for different groups, or can everything be under a single column family? For example, can customer address be a different column family? Please help on this.. regards, Rams -- Thanks Regards, Anil Gupta
Re: Regarding rework in changing column family
Thanks Ram!!! My question is like this... suppose I have created a table with 100 columns under a single column family 'cf1'; now in production there are billions of records in that table and there are multiple programs feeding into it (let us say some 50 programs)... In this scenario, if I change the column families so that the first 40 columns stay in 'cf1' and the last 60 columns move to a new column family 'cf2', *do we need to change all 50 programs which are inserting into that table with 'cf1' for all columns?* regards, Rams On Wed, Nov 28, 2012 at 10:24 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: As far as I see, altering the table with the new column family should be easy: - disable the table - Issue the modify table command with the new column family. - run a compaction. Now after this, when you start doing your puts, they should be in alignment with the new schema defined for the table. One thing you may have to watch is how much your rate of puts gets affected, because now both of your CFs will start flushing whenever a memstore flush happens. Hope this helps. Regards Ram On Wed, Nov 28, 2012 at 10:10 AM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, I have created a table in hbase with one column family and planned to release it for development (in pentaho). Suppose later, after doing data profiling in production, I feel that out of 600 columns 200 are not going to get used frequently and I plan to group those into another column family. If I change the column family at a later point of time, I expect there will be a lot of rework that has to be done (whether we use java or pentaho). Is my understanding correct? Is there any other alternative available to overcome this? Regards, Rams
RE: Best practice for the naming convention of column family
According to http://hbase.apache.org/book.html#number.of.cfs (6.3.2.1. Column Families): Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. d for data/default). From: Ramasubramanian Narayanan [ramasubramanian.naraya...@gmail.com] Sent: November 28, 2012 13:51 To: user@hbase.apache.org Subject: Best practice for the naming convention of column family Hi, Can anyone suggest the best practice for the naming convention for column families, please? regards, Rams
Re: Regarding rework in changing column family
I am afraid it has to be changed... because for your puts to go to the specified column family, the family name has to appear in the Puts created by the client. Regards Ram On Wed, Nov 28, 2012 at 11:18 AM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Thanks Ram!!! My question is like this... suppose I have created a table with 100 columns under a single column family 'cf1'; now in production there are billions of records in that table and there are multiple programs feeding into it (let us say some 50 programs)... In this scenario, if I change the column families so that the first 40 columns stay in 'cf1' and the last 60 columns move to a new column family 'cf2', *do we need to change all 50 programs which are inserting into that table with 'cf1' for all columns?* regards, Rams On Wed, Nov 28, 2012 at 10:24 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: As far as I see, altering the table with the new column family should be easy: - disable the table - Issue the modify table command with the new column family. - run a compaction. Now after this, when you start doing your puts, they should be in alignment with the new schema defined for the table. One thing you may have to watch is how much your rate of puts gets affected, because now both of your CFs will start flushing whenever a memstore flush happens. Hope this helps. Regards Ram On Wed, Nov 28, 2012 at 10:10 AM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, I have created a table in hbase with one column family and planned to release it for development (in pentaho). Suppose later, after doing data profiling in production, I feel that out of 600 columns 200 are not going to get used frequently and I plan to group those into another column family. If I change the column family at a later point of time, I expect there will be a lot of rework that has to be done (whether we use java or pentaho).
Is my understanding correct? Is there any other alternative available to overcome this? Regards, Rams
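Since, as Ram says, every writer must name the target family in its Puts, the 50 feed programs can at least share one routing helper so the cf1/cf2 split lives in a single place. A hypothetical sketch; the column names and the CF2_COLUMNS set are made-up examples, not from the thread:

```python
# Hypothetical sketch: centralize the column -> column-family mapping so the
# many feed programs don't each hard-code 'cf1' per column.
CF2_COLUMNS = {"col41", "col42", "col100"}  # example: the columns being moved

def qualified(column: str) -> bytes:
    """Return the family:qualifier byte string for a column."""
    family = "cf2" if column in CF2_COLUMNS else "cf1"
    return f"{family}:{column}".encode("utf-8")

assert qualified("col1") == b"cf1:col1"
assert qualified("col42") == b"cf2:col42"
```

Each program then builds its Puts from `qualified(...)` output, so a later family reshuffle is a one-line change to the set rather than 50 program edits.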
Parallel reading advice
I have a table whose keys are prefixed with a byte to help distribute the keys so scans don't hotspot. I also have a bunch of slave processes that work to scan the prefix partitions in parallel. Currently each slave sets up its own hbase connection, scanner, etc. Most of the slave processes finish their scan and return within 2-3 seconds. It tends to take the same amount of time regardless of whether there's lots of data or very little. So I think that 2 sec overhead is there because each slave will set up a new connection on each request (I am unable to reuse connections in the slaves). I'm wondering if I could remove some of that overhead by using the master (which can reuse its hbase connection) to determine the splits, and then delegating that information out to each slave. I think I could possibly use TableInputFormat/TableRecordReader to accomplish this? Would this route make sense?
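The split computation described above (master precomputes each slave's scan range from the one-byte prefix) can be sketched as follows. An illustrative sketch only; it assumes the salt is a single leading byte and says nothing about the connection-reuse question:

```python
# Sketch: with a one-byte salt prefix, the master can precompute each slave's
# (start_row, stop_row) scan range once and hand the ranges out, instead of
# every slave rediscovering its partition.
def prefix_ranges(num_prefixes: int):
    ranges = []
    for p in range(num_prefixes):
        start = bytes([p])
        # stop row is exclusive: the next prefix byte; the last possible
        # range is left open-ended
        stop = bytes([p + 1]) if p + 1 < 256 else b""
        ranges.append((start, stop))
    return ranges

assert prefix_ranges(4) == [(b"\x00", b"\x01"), (b"\x01", b"\x02"),
                            (b"\x02", b"\x03"), (b"\x03", b"\x04")]
```

Each (start, stop) pair maps directly onto a scanner's start/stop row, which is essentially what TableInputFormat does per region when it builds its splits.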
RE: Regarding rework in changing column family
Also, what about the current data in the table? Now all of it is under the single CF. Modifying the table with the addition of a new CF will not move data to the new family! Remember, HBase only deals with CFs at the table schema level. There are no qualifiers in the schema as such. When data is inserted/retrieved we can specify a qualifier. -Anoop- From: ramkrishna vasudevan [ramkrishna.s.vasude...@gmail.com] Sent: Wednesday, November 28, 2012 11:41 AM To: user@hbase.apache.org Subject: Re: Regarding rework in changing column family I am afraid it has to be changed... because for your puts to go to the specified column family, the family name has to appear in the Puts created by the client. Regards Ram On Wed, Nov 28, 2012 at 11:18 AM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Thanks Ram!!! My question is like this... suppose I have created a table with 100 columns under a single column family 'cf1'; now in production there are billions of records in that table and there are multiple programs feeding into it (let us say some 50 programs)... In this scenario, if I change the column families so that the first 40 columns stay in 'cf1' and the last 60 columns move to a new column family 'cf2', *do we need to change all 50 programs which are inserting into that table with 'cf1' for all columns?* regards, Rams On Wed, Nov 28, 2012 at 10:24 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: As far as I see, altering the table with the new column family should be easy: - disable the table - Issue the modify table command with the new column family. - run a compaction. Now after this, when you start doing your puts, they should be in alignment with the new schema defined for the table. One thing you may have to watch is how much your rate of puts gets affected, because now both of your CFs will start flushing whenever a memstore flush happens. Hope this helps.
Regards Ram On Wed, Nov 28, 2012 at 10:10 AM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, I have created a table in hbase with one column family and planned to release it for development (in pentaho). Suppose later, after doing data profiling in production, I feel that out of 600 columns 200 are not going to get used frequently and I plan to group those into another column family. If I change the column family at a later point of time, I expect there will be a lot of rework that has to be done (whether we use java or pentaho). Is my understanding correct? Is there any other alternative available to overcome this? Regards, Rams