Fixing a badly distributed table manually.
Hello,

A couple of questions regarding balancing of a table's data in HBase.

a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster? I guess I could search .META. but I haven't figured out how to use filters from the shell.

b) What constitutes a badly distributed table and how can I re-balance it manually?

c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

As for a), I tried running this script: https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb like so:

hbase org.jruby.Main ./list_regions.rb _my_table

but I get:

ArgumentError: wrong number of arguments (1 for 2) (root) at ./list_regions.rb:60

If someone more proficient notices an obvious fix, I'd be glad to hear about it.

Why do I ask? I have the impression that one of the tables on our HBase cluster is not well distributed. When running a MapReduce job on this table, the load average on a single node is very high, whereas all the other nodes are almost idling. It is the only table where this behavior is observed. Other MapReduce jobs result in slightly elevated load averages on several machines.

Thank you,

/David
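For reference, the overview asked for in a) can also be pulled from the Java client instead of scanning .META. by hand; a rough, untested sketch against the 0.92-era API (getRegionsInfo and HServerAddress were deprecated in later releases):

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class RegionsPerServer {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]); // table name passed on the command line
    // Region -> server mapping for every region of the table.
    Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
    Map<String, Integer> perServer = new HashMap<String, Integer>();
    for (HServerAddress server : regions.values()) {
      String host = server.getHostname();
      Integer n = perServer.get(host);
      perServer.put(host, n == null ? 1 : n + 1);
    }
    for (Map.Entry<String, Integer> e : perServer.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue() + " region(s)");
    }
    table.close();
  }
}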
Re: Fixing a badly distributed table manually.
Can you tell us the version of HBase you're using? The following feature (per-table region balancing) isn't in 0.92.x: https://issues.apache.org/jira/browse/HBASE-3373

On the table.jsp page, you should see the region count per region server.

Cheers

On Tue, Sep 4, 2012 at 7:56 AM, David Koch ogd...@googlemail.com wrote: [...]
Reading in parallel from table's regions in MapReduce
Hello,

I would be grateful if someone could shed some light on the following: each M/R map task is reading data from a separate region of a table. From the jobtracker's GUI, at the map completion graph, I notice that although the data read by the mappers is different, they read it sequentially - as if the table had a lock that permits only one mapper at a time to read data from its region. Does this lock hypothesis make sense? Is there any way I could avoid this useless delay?

Thanks in advance and regards, Ioakim
RE: Fixing a badly distributed table manually.
> a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster?

I usually check via the web interface (host:60010). Click on a table and scroll down; there will be a region count for this table across the cluster.

> b) What constitutes a badly distributed table and how can I re-balance manually?

I think the answer to this question is manual splits. There is a chapter in the book talking about it. I am looking forward to an answer from the experienced guys ;)

> c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

From my experience, yes. HBase does not balance as much as I need. In the worst case I have a difference of 16 regions (32 against 48) on a 10-machine cluster. Hoping for a great answer so I don't have to do manual splits ;)

Regards, Pablo

-----Original Message----- From: David Koch [mailto:ogd...@googlemail.com] Sent: Tuesday, September 4, 2012 11:56 To: user@hbase.apache.org Subject: Fixing a badly distributed table manually. [...]
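For what it's worth, the manual split (and a follow-up balancer run) can also be triggered from the Java client rather than the shell; a rough, untested sketch in which the table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Ask the master to split every region of the table at its midpoint.
    // A specific region name (from the web UI) can be passed instead.
    admin.split("my_table");
    // Optionally trigger a balancer run so the daughter regions get
    // spread across region servers sooner.
    admin.balancer();
  }
}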
Re: batch update question
Hi Christian,

I read through the link you referred to. It seems HBaseHUT is exactly the solution I am looking for. Before making a technology choice, I want to learn a bit more about its internal design and the general idea of how HBaseHUT improves write throughput. In the discussion, CP is mentioned, but I cannot find more details; I would appreciate it if you could point me to some more detailed documents. Thanks.

regards, Lin

On Tue, Sep 4, 2012 at 5:28 AM, Christian Schäfer syrious3...@yahoo.de wrote: hi, maybe you could be interested in HBaseHUT (high update throughput), see https://github.com/sematext/HBaseHUT -- Lin Ma wrote on Sun, Sep 2, 2012 at 11:13 CEST: Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3, it is mentioned that in order to reduce the performance impact of clients updating the same row (lock contention issues for atomic writes), batch updates are preferred. My question is: for an MR job, what batch update methods could we leverage to resolve the issue? And for an API client, what batch update methods could we leverage? thanks in advance, Lin
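Independently of HBaseHUT's approach, the plain client-API batching usually meant here is list-based puts with client-side write buffering; a minimal sketch with a hypothetical table and family, against the 0.92-era API:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedPuts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table"); // hypothetical table name
    table.setAutoFlush(false);                   // buffer puts client-side
    table.setWriteBufferSize(4 * 1024 * 1024);   // flush roughly every 4 MB
    List<Put> batch = new ArrayList<Put>();
    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
      batch.add(p);
    }
    table.put(batch);      // far fewer round trips than 10000 single puts
    table.flushCommits();  // push whatever is still buffered
    table.close();
  }
}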
Help with troubleshooting the HBase replication setup
Hi there,

I'm trying to set up replication in master-slave mode between two clusters, and once this works, to set up master-master replication. I am following the replication FAQ step by step, but I can't make it work and have no idea how to troubleshoot. There seems to be only one way given to find out when it works, which is to look for these lines in the region server logs:

Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 170.22.64.15:62020

Well, whatever I do, I cannot see them. When I run add_peer, nothing happens. I've been stuck on this for a week - stopping/starting/reinstalling my clusters, to no avail.

Both clusters are CDH4.0.1. I have hbase.replication=true on both clusters. My table exists on both clusters. The family is marked with REPLICATION_SCOPE=1 on both clusters. Machines in both clusters can reach each other.

Can anyone help please? Where do I look to understand what is wrong? Setting logging to DEBUG in HBase doesn't give me anything apart from a lot more noise.

Thanks, Stas
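For comparison, the shell steps from the FAQ map onto the client API roughly as follows - a sketch only, with hypothetical quorum hosts, table and family names, against the 0.92-era (CDH4) API, and no substitute for checking the region server logs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SetupReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // master cluster's config
    // Register the slave cluster: its zookeeper quorum, client port and
    // znode parent (hypothetical hosts below).
    ReplicationAdmin repAdmin = new ReplicationAdmin(conf);
    repAdmin.addPeer("1", "slave-zk1,slave-zk2,slave-zk3:2181:/hbase");
    // Make sure the family actually ships edits (REPLICATION_SCOPE => 1).
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("my_table"));
    HColumnDescriptor family = desc.getFamily(Bytes.toBytes("f"));
    family.setScope(1);
    admin.disableTable("my_table");
    admin.modifyColumn("my_table", family);
    admin.enableTable("my_table");
  }
}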
Re: Reading in parallel from table's regions in MapReduce
Hi there-

Yes, there is an input split for each region of the source table of an MR job. There is a blurb on that in the RefGuide: http://hbase.apache.org/book.html#splitter

On 9/4/12 11:17 AM, Ioakim Perros imper...@gmail.com wrote: [...]
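As a concrete illustration of the one-split-per-region behaviour, a typical table-scanning job is wired up like this (a sketch; table and class names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class PerRegionScanJob {
  // map() is invoked once per row of the region backing this task's split.
  static class MyMapper extends TableMapper<Text, Text> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "per-region-scan");
    job.setJarByClass(PerRegionScanJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per RPC
    scan.setCacheBlocks(false);  // don't pollute the block cache from MR
    // TableInputFormat creates one split (hence one map task) per region.
    TableMapReduceUtil.initTableMapperJob("my_table", scan, MyMapper.class,
        Text.class, Text.class, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}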
Re: Fixing a badly distributed table manually.
Hello,

> a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster? [...] b) What constitutes a badly distributed table and how can I re-balance manually? c) Is b) needed at all?

I have found that http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ is a good source of information/tools for looking at region balancing in the cluster and investigating it.

> As for a) I tried running this script: https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb [...] If someone more proficient notices an obvious fix, I'd be glad to hear about it.

Concerning https://github.com/Mendeley/hbase-scripts, I am afraid this repository is no longer maintained; it was written for old releases of HBase (CDH2, I believe) and there are no plans to upgrade it to newer releases.

Cheers
---
Guillaume
Re: Reading in parallel from table's regions in MapReduce
Thank you very much for responding, but this was not exactly what I was looking for. I have understood the splitting process when M/R jobs read from HBase tables (that each M/R task reads from exactly one region).

What I would like to clarify, if possible, is whether there is indeed some locking between map tasks reading from the table's different regions (because I noticed sequential reading behaviour across the map tasks), and if so, how I could avoid it in order to speed up the procedure and make the map tasks read data in parallel (each from its respective region).

Thank you again very much, hoping there is an answer to that, Ioakim

On 09/04/2012 06:32 PM, Doug Meil wrote: [...]
Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim:

Sorry, your hypothesis doesn't make sense. I would suggest reading Learning HBase Internals by Lars Hofhansl at http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final to understand how HBase locking works.

Regarding the issue you are facing, are you sure you configured the job properly (i.e. asked the jobtracker to execute more than one mapper at a time)? If you are testing on a single machine, you probably need to configure the number of map slots per tasktracker as well to see more than one mapper execute on a single machine.

my $0.02

Jerry

On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros imper...@gmail.com wrote: [...]
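For reference, the per-tasktracker map-slot count mentioned here is a daemon-side setting in mapred-site.xml (MRv1); the value below is arbitrary, and the tasktracker must be restarted to pick it up:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
  <description>Maximum number of map tasks run simultaneously
    by a single tasktracker.</description>
</property>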
Re: Reading in parallel from table's regions in MapReduce
Thank you very much for your response and for the excellent reference.

The thing is that I am running jobs in a distributed environment, and beyond the TableMapReduceUtil settings, I have just set the scan's caching to the number of rows I expect to retrieve in each map task, and the scan's cache-blocks feature to false (just as indicated in the MapReduce examples on HBase's homepage). I am not aware of such a job configuration (requesting the jobtracker to execute more than one map task concurrently).

Any other ideas?

Thank you again and regards, ioakim

On 09/04/2012 06:59 PM, Jerry Lam wrote: [...]
Re: Reading in parallel from table's regions in MapReduce
I think the issue is that you are misinterpreting what you are seeing and what Doug was trying to tell you...

The short simple answer is that you're getting one split per region. Each split is assigned to a specific mapper task, and that task will sequentially walk through its region finding the rows that match your scan request. There is no lock or blocking.

I think you really should read Lars George's book on HBase to get a better understanding.

HTH

-Mike

On Sep 4, 2012, at 11:29 AM, Ioakim Perros imper...@gmail.com wrote: [...]
Re: Reading in parallel from table's regions in MapReduce
I understood that locking is at row level (and that my initial hypothesis is thankfully false), but I was trying to clarify whether there is some job configuration I am missing. Perhaps you're right and I am misinterpreting the jobtracker's map completion graph.

Thanks for answering.

On 09/04/2012 07:41 PM, Michael Segel wrote: [...]
Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim:

Here is a list of links I would suggest you read (I know it is a lot to read):

HBase related:
- http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html
- http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
- make sure to read the examples: http://hbase.apache.org/book/mapreduce.example.html

Hadoop related:
- http://wiki.apache.org/hadoop/JobTracker
- http://wiki.apache.org/hadoop/TaskTracker
- http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html
- some configurations: http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html

HTH,

Jerry

On Tue, Sep 4, 2012 at 12:41 PM, Michael Segel michael_se...@hotmail.com wrote: [...]
Re: Reading in parallel from table's regions in MapReduce
Jerry thank you very much for the links. Regards, Ioakim

On 09/04/2012 08:05 PM, Jerry Lam wrote: [...]
Re: connection error to remote hbase node
Thanks, Harsh J, but I have checked the /etc/ dir and hbase's root directory, and there is no zoo.cfg file present in either place...

I am aware that the hbase client will first check zookeeper before contacting hbase itself (for the -ROOT- table and .META. table ...). Is there any way
- to test whether zookeeper can be connected to successfully?
- to test the correctness of the result of the zookeeper lookup (but before querying the -ROOT- table)?

This looks like a general problem when setting up hbase... and thanks.

regards, Richard

On Mon, Sep 3, 2012 at 1:10 AM, Harsh J ha...@cloudera.com wrote: Richard, do you perhaps have a stray /etc/zookeeper/zoo.cfg file lying around on your client node? I think the issue may be that it's picking up a default of localhost from reading a non-configured zoo.cfg on the classpath.

On Sun, Sep 2, 2012 at 7:08 PM, Richard Tang tristartom.t...@gmail.com wrote: Hi, I have a connection problem when setting up hbase on a remote node. The hbase instance is on a machine nodeA. When I try to use hbase on nodeA from another machine (say nodeB), it complains: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused ... Also, I used the hbase shell from nodeB to test hbase against nodeA, but still without success. In the shell, when issuing list, it gets stuck... To diagnose, I can however run the following tests with success: 1. the zkcli utility can be run against zookeeper on nodeA, as ./bin/hbase zkcli -server nodeA:2181; 2. I can also successfully ssh to the node, like ssh -X root@nodeA. What could possibly be the reason for that error? regards, Tang

--
Harsh J
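On the "test if zookeeper can be connected" question, a bare ZooKeeper client (no HBase involved) makes a handy probe; a small sketch, assuming nodeA:2181 is the quorum address:

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnectTest {
  public static void main(String[] args) throws Exception {
    final CountDownLatch connected = new CountDownLatch(1);
    // Point this at nodeA's quorum, not localhost.
    ZooKeeper zk = new ZooKeeper("nodeA:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
          connected.countDown();
        }
      }
    });
    connected.await(); // blocks until the session is actually established
    // If HBase registered itself, /hbase should contain znodes such as
    // root-region-server, master and rs.
    List<String> children = zk.getChildren("/hbase", false);
    System.out.println("/hbase children: " + children);
    zk.close();
  }
}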
Re: Key formats and very low cardinality leading fields
Thanks again, both of you. I'll look at pre-splitting the regions so that there isn't so much initial contention. The issue I'll have, though, is that I won't know all the prefix values at first and will have to be able to add them later. Is it possible to split regions on an existing table? Or is that inadvisable in favor of doing the splits when the table is created?

On Mon, Sep 3, 2012 at 5:19 PM, Mohit Anchlia mohitanch...@gmail.com wrote: You can also look at pre-splitting the regions for timeseries-type data.

On Mon, Sep 3, 2012 at 1:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Initially your table will contain only one region. When you reach its maximum size, it will split into 2 regions which are going to be distributed over the cluster. The 2 regions are going to be ordered by keys, so all entries starting with 1 will be in the first region, and the middle key (let's say 25...) will start the 2nd region. So region 1 will contain 1 to 24999..., and the 2nd region will contain keys from 25... and so on. Since keys are ordered, all keys starting with a 1 are going to be close by in the same region, except if the region is big enough to be split and served by more region servers. So when you load all your entries starting with 1, or 3, they will go to one unique region. Only entries starting with 2 are going to be sometimes in region 1, sometimes in region 2. Of course, the more data you load, the more regions you will have, and the less hotspotting you will have. But at the beginning, it might be difficult for some of your servers.

2012/9/3, Eric Czech e...@nextbigsound.com: With regards to: "If you have 3 region servers and your data is evenly distributed, that means all the data starting with a 1 will be on server 1, and so on." Assuming there are multiple regions in existence for each prefix, why would they not be distributed across all the machines? In other words, if there are many regions with keys that generally start with 1, why would they ALL be on server 1 like you said? It's my understanding that the regions aren't placed around the cluster according to the range of information they contain, so I'm not quite following that explanation. Putting the higher-cardinality values in front of the key isn't entirely out of the question, but I'd like to use the low-cardinality key out front for the sake of selecting rows for MapReduce jobs. Otherwise, I always have to scan the full table for each job.

On Mon, Sep 3, 2012 at 3:20 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Yes, you're right, but again, it will depend on the number of regionservers and the distribution of your data. If you have 3 region servers and your data is evenly distributed, that means all the data starting with a 1 will be on server 1, and so on. So if you write a million lines starting with a 1, they will all land on the same server. Of course, you can pre-split your table, like 1a to 1z, and assign each region to one of your 3 servers. That way you will avoid hotspotting even if you write millions of lines starting with a 1. If you have one hundred regions, you will face the same issue at the beginning, but the more data you add, the more your table will be split across all the servers and the less hotspotting you will have. Can't you just reverse your fields and put the 1 to 30 at the end of the key?

2012/9/3, Eric Czech e...@nextbigsound.com: Thanks for the response Jean-Marc! I understand what you're saying, but in a more extreme case, let's say I'm choosing the leading number in the range 1 - 3 instead of 1 - 30. In that case, it seems like all of the data for any one prefix would already be split well across the cluster, and as long as the second value isn't written sequentially, there wouldn't be an issue. Is my reasoning there flawed at all?

On Mon, Sep 3, 2012 at 2:31 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Eric, in HBase, data is stored sequentially based on the key's alphabetical order. It will depend on the number of regions and regionservers you have, but if you write data from 23AA to 23ZZ, it will most probably go to the same region even if the cardinality of the 2nd part of the key is high. If the first number is always changing between 1 and 30 for each write, then you will reach multiple regions/servers; if not, you might have some hot-spotting. JM

2012/9/3, Eric Czech e...@nextbigsound.com: Hi everyone, I was curious whether or not I should expect any write hot spots if I structured my composite keys in a way such that the first field is a low-cardinality value (maybe 30 distinct values) and the next field contains a very high-cardinality value that would not be written sequentially. More concisely, I want to do this: given one number between 1 and 30, write many millions of rows with keys like <number chosen>:<some generally distinct, non-sequential value>. Would there be any problem with the millions of writes happening with the same first-field key prefix even if the second field is largely unique? Thank you!
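For the pre-splitting mentioned above, a table can be created with explicit split points from the Java client; a minimal sketch with hypothetical table, family and split keys (0.92-era API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("my_table");
    desc.addFamily(new HColumnDescriptor("f"));
    // One region per known prefix: [start,"2"), ["2","3"), ["3",end).
    byte[][] splitKeys = new byte[][] {
      Bytes.toBytes("2"),
      Bytes.toBytes("3"),
    };
    admin.createTable(desc, splitKeys);
  }
}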
Re: Key formats and very low cardinality leading fields
Hi Eric,

Yes, you can split an existing region. You can do that easily with the web interface. After the split, at some point, one of the 2 regions will be moved to another server to balance the load. You can also move it manually.

JM

2012/9/4, Eric Czech e...@nextbigsound.com: [...]
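Splitting an already-existing table from code rather than the web interface looks roughly like this - an untested sketch where the table name and split point are placeholders (0.92-era API):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitExisting {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Split every region of the table at its natural midpoint ...
    admin.split("my_table");
    // ... or at an explicit point, e.g. where a new prefix "3" begins:
    admin.split(Bytes.toBytes("my_table"), Bytes.toBytes("3"));
    // admin.move(...) can then relocate a daughter region by hand
    // if the balancer doesn't get to it.
  }
}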
Re: Key formats and very low cardinality leading fields
You're the man Jean-Marc .. info is much appreciated.

On Tue, Sep 4, 2012 at 1:22 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: [...]
Re: Key formats and very low cardinality leading fields
I think you have to understand what happens as a table splits.

If you have a composite key where the first field has a value between 0-9 and you pre-split your table, you will have all of your 1's going to a single region until it splits. But both daughter regions will start on the same node until they eventually get balanced out. (Note: I'm not an expert on how HBase balances the regions across region servers, so I couldn't tell you how it chooses which nodes to place each region on.)

But what are you trying to do? Avoid a hot spot on the initial load, or are you looking at the longer-term picture?

On Sep 3, 2012, at 2:58 PM, Eric Czech e...@nextbigsound.com wrote: [...]
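For concreteness, the key scheme under discussion - a low-cardinality source prefix (roughly 30 values) followed by a high-cardinality, non-sequential entity id - could be assembled with the client's Bytes utility; a small illustrative sketch in which all names are hypothetical:

import org.apache.hadoop.hbase.util.Bytes;

public class CompositeKey {
  /**
   * Builds a row key of the form <sourceId><entityId>, where sourceId is one
   * of ~30 values and entityId is high-cardinality and non-sequential.
   */
  static byte[] rowKey(int sourceId, String entityId) {
    // A fixed-width, big-endian prefix keeps each source's rows contiguous,
    // which is what makes per-source MapReduce scans cheap - and also what
    // concentrates writes for a new source on one region until it splits.
    return Bytes.add(Bytes.toBytes((short) sourceId), Bytes.toBytes(entityId));
  }

  public static void main(String[] args) {
    byte[] key = rowKey(2, "a9f3-entity-7");
    System.out.println(Bytes.toStringBinary(key));
  }
}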
Re: Key formats and very low cardinality leading fields
Longer term .. what's really going to happen is more like: I'll have first-field values of 1, 2, and maybe 3. I won't know 4 - 10 for a while, and the *second* value after each initial value will be, although highly unique, relatively exclusive to a given first value. This means that even if I didn't use the leading prefix, I'd have more or less the same problem, where all the writes go to the same region when I introduce a new set of second values.

In case the generalities are confusing: the prefix value is a data source identifier and the second value is an identifier for entities within that source. The entity identifiers for a given source are likely to span different numeric or alphanumeric ranges, but they probably won't be the same ranges across sources. Also, I won't know all those ranges (or sources, for that matter) upfront.

I'm concerned about the introduction of a new data source (= leading prefix value) since the first writes will all go to the same region; ideally I'd be able to get a sense of how the second values are split for the new leading prefix and split an HBase region to reflect that. If that's not possible or just turns out to be a pain, then I can live with the introduction of a new prefix being a little slow until the regions split and distribute effectively.

That make sense?

On Tue, Sep 4, 2012 at 1:34 PM, Michael Segel michael_se...@hotmail.com wrote: [...]
Re: Key formats and very low cardinality leading fields
Eric,

So here's the larger question... How does the data flow into the system? One source at a time?

The second field - is it sequential? If not sequential, is it going to be some sort of increment larger than the previous value? (Are you always inserting to the left side of the queue?)

How are you using the data when you pull it from the database?

'Hot spotting' may be unavoidable, and depending on other factors, it may be a moot point.

On Sep 4, 2012, at 12:56 PM, Eric Czech e...@nextbigsound.com wrote: [...]
Re: example - hbase-site.xml - fully distributed
How do I know that HBase is running correctly?

2012/9/4 Igor Muzetti igormuze...@gmail.com wrote: [...]
example - hbase-site.xml - fully distributed
Hello! I would like an example of the hbase-site.xml file configured for a fully distributed setup. Regards.

--
Igor Muzetti Pereira
TerraLAB - Earth System Modelling and Simulation Laboratory
Computer Science Department, UFOP - Federal University of Ouro Preto
Campus Universitário Morro do Cruzeiro, Ouro Preto - MG, Brazil, 35400-000
+55 31 3559 1253
www.terralab.ufop.br | www.decom.ufop.br
Re: example - hbase-site.xml - fully distributed
Here is mine, but I can't guarantee that it's correct...

    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://node3:9000/hbase</value>
        <description>The directory shared by RegionServers.</description>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>The mode the cluster will be in. Possible values are
          false: standalone and pseudo-distributed setups with managed ZooKeeper;
          true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh).
        </description>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>cube</value>
        <description>Comma separated list of servers in the ZooKeeper Quorum.
          For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com.
          By default this is set to localhost for local and pseudo-distributed
          modes of operation. For a fully-distributed setup, this should be set
          to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
          in hbase-env.sh this is the list of servers which we will start/stop
          ZooKeeper on.
        </description>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/zookeeper</value>
        <description>Property from ZooKeeper's config zoo.cfg. The directory
          where the snapshot is stored.
        </description>
      </property>
    </configuration>

2012/9/4, Igor Muzetti igormuze...@gmail.com: How do I know that HBase is running correctly?

2012/9/4 Igor Muzetti igormuze...@gmail.com: Hello! I would like an example of the *hbase-site.xml* file configured for a fully distributed setup. Best regards. -- Igor Muzetti Pereira, TerraLAB - Earth System Modelling and Simulation Laboratory, Computer Science Department, UFOP - Federal University of Ouro Preto, Campus Universitário Morro do Cruzeiro, Ouro Preto - MG, Brazil, 35400-000, +55 31 3559 1253, www.terralab.ufop.br, www.decom.ufop.br
Re: example - hbase-site.xml - fully distributed
There are several different ways. Running jps as the user that HBase starts as will show you what's running; you should be able to see HMaster or HRegionServer. If things are running well, the master should have a status HTTP server up. Going to it should tell you that things are up and which RegionServers have checked in. The default port for the master's status HTTP server is 60010, so you can go to http://mymaster:60010/ and look at logs and other stats. Tailing the logs can usually tell you that things are up, though they are pretty spammy, so this is more for debugging. If you want a more thorough test, LoadTestTool and other utilities are included.

On Tue, Sep 4, 2012 at 10:45 AM, Igor Muzetti igormuze...@gmail.com wrote: How do I know that HBase is running correctly?

2012/9/4 Igor Muzetti igormuze...@gmail.com: Hello! I would like an example of the *hbase-site.xml* file configured for a fully distributed setup. Best regards. -- Igor Muzetti Pereira, TerraLAB - Earth System Modelling and Simulation Laboratory, Computer Science Department, UFOP - Federal University of Ouro Preto, Campus Universitário Morro do Cruzeiro, Ouro Preto - MG, Brazil, 35400-000, +55 31 3559 1253, www.terralab.ufop.br, www.decom.ufop.br
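A programmatic liveness check is also possible. A minimal sketch using the client API (this assumes an hbase-site.xml pointing at the cluster is on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CheckHBase {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Throws MasterNotRunningException or ZooKeeperConnectionException
        // if the cluster cannot be reached.
        HBaseAdmin.checkHBaseAvailable(conf);
        System.out.println("HBase is up.");
      }
    }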
Re: Key formats and very low cardinality leading fields
*How does the data flow in to the system? One source at a time?* Generally, it will be one source at a time, where these rows are index entries built from MapReduce jobs.

*The second field. Is it sequential?* No, the index writes from the MapReduce jobs should dump some relatively small number of rows into HBase for each first field - second field combination, but then move on to another first field - second field combination where the new second field is not ordered in any way relative to the old second field.

*How are you using the data when you pull it from the database?* Not totally sure what specific use cases you might be asking after, but in a more general sense, the indexed data will power our web platform (we aggregate and manage data for the music industry, http://www.crunchbase.com/company/next-big-sound) as well as work as inputs to offline analytics processes. I'm placing the design priority on the interaction with the web platform though, and the full row structure I'm intending to use is: [inline image of the intended row structure, not preserved in the archive] This is similar to OpenTSDB, and the service we provide is similar to what OpenTSDB was designed for, if that gives you a better sense of what I'd like to do with the data.

On Tue, Sep 4, 2012 at 2:03 PM, Michael Segel michael_se...@hotmail.com wrote: Eric, So here's the larger question... How does the data flow in to the system? One source at a time? The second field: is it sequential? If not sequential, is it going to be some sort of increment larger than a previous value? (Are you always inserting to the left side of the queue?) How are you using the data when you pull it from the database? 'Hot spotting' may be unavoidable and, depending on other factors, it may be a moot point.

On Sep 4, 2012, at 12:56 PM, Eric Czech e...@nextbigsound.com wrote: Longer term, what's really going to happen is more like: I'll have a first field value of 1, 2, and maybe 3. I won't know 4 - 10 for a while, and the *second* value after each initial value will be, although highly unique, relatively exclusive for a given first value. This means that even if I didn't use the leading prefix, I'd have more or less the same problem, where all the writes go to the same region when I introduce a new set of second values. In case the generalities are confusing, the prefix value is a data source identifier and the second value is an identifier for entities within that source. The entity identifiers for a given source are likely to span different numeric or alphanumeric ranges, but they probably won't be the same ranges across sources. Also, I won't know all those ranges (or sources, for that matter) upfront. I'm concerned about the introduction of a new data source (= leading prefix value) since the first writes will be to the same region; ideally, I'd be able to get a sense of how the second values are split for the new leading prefix and split an HBase region to reflect that. If that's not possible or just turns out to be a pain, then I can live with the introduction of the new prefix being a little slow until the regions split and distribute effectively. That make sense?

On Tue, Sep 4, 2012 at 1:34 PM, Michael Segel michael_se...@hotmail.com wrote: I think you have to understand what happens as a table splits. If you have a composite key where the first field has a value between 0-9 and you pre-split your table, you will have all of your 1's going to a single region until it splits. But both halves of the split will start on the same node until they eventually get balanced out. (Note: I'm not an expert on how HBase balances the regions across region servers, so I couldn't tell you how it chooses which nodes to place each region on.) But what are you trying to do? Avoid a hot spot on the initial load, or are you looking at the longer-term picture?

On Sep 3, 2012, at 2:58 PM, Eric Czech e...@nextbigsound.com wrote: With regards to: "If you have 3 region servers and your data is evenly distributed, that means all the data starting with a 1 will be on server 1, and so on." Assuming there are multiple regions in existence for each prefix, why would they not be distributed across all the machines? In other words, if there are many regions with keys that generally start with 1, why would they ALL be on server 1 like you said? It's my understanding that the regions aren't placed around the cluster according to the range of information they contain, so I'm not quite following that explanation. Putting the higher-cardinality values in front of the key isn't entirely out of the question, but I'd like to use the low-cardinality key out front for the sake of selecting rows for MapReduce jobs. Otherwise, I always have to scan the full table for each job.

On Mon, Sep 3, 2012 at 3:20 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Yes, you're right, but again,
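For readers following the thread, the key shape under discussion looks roughly like the sketch below. This is only an illustration: the field widths and exact order are assumptions based on Eric's description, since the inline image with his actual row structure did not survive the archive:

    import org.apache.hadoop.hbase.util.Bytes;

    public class CompositeKey {
      // [source id][entity id][timestamp], concatenated as fixed-width bytes.
      // The widths here (1 + 8 + 8 bytes) are assumptions, not Eric's spec.
      public static byte[] rowKey(byte sourceId, long entityId, long timestamp) {
        return Bytes.add(new byte[] { sourceId },
                         Bytes.toBytes(entityId),
                         Bytes.toBytes(timestamp));
      }
    }

With a fixed-width layout like this, a MapReduce job can select one source with a simple start/stop row, which is exactly the property Eric wants to keep by leaving the low-cardinality field in front.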
Re: connection error to remote hbase node
On Sun, Sep 2, 2012 at 6:38 AM, Richard Tang tristartom.t...@gmail.com wrote: Hi, I have a connection problem setting up HBase on a remote node. The ``hbase`` instance is on a machine ``nodeA``. When I try to use HBase on ``nodeA`` from another machine (say ``nodeB``), it complains: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused

Put up more of your log? Put it up in pastebin so we can see. In your 1. and 2. above, are you connecting from nodeB when doing zkcli and ssh'ing? Why are you logging in as root? Is HBase running as root? St.Ack
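For what it's worth, the "Session 0x0 for server null ... Connection refused" symptom often means the client is looking for ZooKeeper on localhost rather than on the cluster; that is only a guess here, since the full log isn't shown. A minimal client-side sketch of pointing the quorum at the right host:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RemoteClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Without this (or an hbase-site.xml on the client's classpath),
        // the client defaults to localhost:2181 for ZooKeeper.
        conf.set("hbase.zookeeper.quorum", "nodeA"); // adjust to your ZK host(s)
        HTable table = new HTable(conf, "some_table"); // hypothetical table name
        System.out.println("Connected to " + table.getTableDescriptor().getNameAsString());
        table.close();
      }
    }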
Re: Reading in parallel from table's regions in MapReduce
On Tue, Sep 4, 2012 at 8:17 AM, Ioakim Perros imper...@gmail.com wrote: Hello, I would be grateful if someone could shed some light on the following: Each M/R map task is reading data from a separate region of a table. From the jobtracker's GUI, at the map completion graph, I notice that although the data read by the mappers differ, they read the data sequentially - as if the table had a lock that permits only one mapper at a time to read from its region. Does this lock hypothesis make sense? Is there any way I could avoid this useless delay?

Is your mapreduce job actually running on the cluster, and not in a single local thread (as Jerry hints above)? St.Ack
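There is no table-level lock: TableInputFormat creates one input split per region, so map tasks over different regions run concurrently whenever the cluster has free map slots. A standard scan job looks roughly like the sketch below (table and job names are made up for the example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ScanJob {
      static class NoOpMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
          // Each map task scans exactly one region's key range.
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-my-table");
        job.setJarByClass(ScanJob.class);
        Scan scan = new Scan();
        scan.setCaching(500);       // fewer RPC round trips per mapper
        scan.setCacheBlocks(false); // don't churn the block cache from MR
        TableMapReduceUtil.initTableMapperJob("my_table", scan,
            NoOpMapper.class, ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If such a job still appears to run its mappers one after another, the usual suspects are local-mode execution or a cluster configured with very few map slots, not a lock in HBase.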
Re: batch update question
On Sun, Sep 2, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3 it is mentioned that, in order to reduce the performance impact for clients updating the same row (lock contention issues for atomic writes), batch updates are preferred. My question is: for an MR job, what batch update methods could we leverage to resolve the issue? And for an API client, what batch update methods could we leverage?

Do you actually have a problem where there is contention on a single row? Use methods like http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List) or the batch methods listed earlier in the API. You should set autoflush to false too: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush() Even batching, a highly contended row might hold up inserts... but are you sure you actually have this problem in the first place? St.Ack
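To make the pointers above concrete, a minimal sketch of the batched-put pattern Stack describes; the table, family, and qualifier names are placeholders:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchPuts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table"); // hypothetical table
        table.setAutoFlush(false);                   // buffer writes client-side
        table.setWriteBufferSize(2 * 1024 * 1024);   // flush roughly every 2 MB

        List<Put> puts = new ArrayList<Put>();
        for (int i = 0; i < 10000; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes(i));
          puts.add(p);
        }
        table.put(puts);      // one call, batched into few RPCs
        table.flushCommits(); // push anything still buffered
        table.close();
      }
    }

With autoflush off, puts accumulate in the client-side write buffer and go out in large batches, which cuts per-row RPC overhead; note it does not remove contention if every client keeps updating the same row.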
Re: batch update question
Hi Lin, check out the slides about high-update workloads and HBaseHUT at: http://blog.sematext.com/?s=hbasehut Maybe you could ask Alex Baranau here on the list to share more details. regards Chris

From: Lin Ma lin...@gmail.com To: user@hbase.apache.org; syrious3...@yahoo.de Sent: Tuesday, 4 September 2012, 17:21 Subject: Re: batch update question

Hi Christian, I read through the link you referred to. It seems HBaseHUT is exactly the solution I am looking for. Before making the technology choice decision, I want to learn a bit more about its internal design and the general idea of how HBaseHUT improves write throughput. In the discussion, CP (coprocessors) is mentioned, but I cannot find more details; I'd appreciate it if you could point me to some more detailed documents. Thanks. regards, Lin

On Tue, Sep 4, 2012 at 5:28 AM, Christian Schäfer syrious3...@yahoo.de wrote: hi, maybe you could be interested in HBaseHUT (High Update Throughput), see https://github.com/sematext/HBaseHUT

-- Lin Ma wrote on Sun, 2 Sep 2012 11:13 CEST: Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3 it is mentioned that, in order to reduce the performance impact for clients updating the same row (lock contention issues for atomic writes), batch updates are preferred. My question is: for an MR job, what batch update methods could we leverage to resolve the issue? And for an API client, what batch update methods could we leverage? thanks in advance, Lin
Re: Fixing badly distributed table manually.
Hello, Thank you for your replies. We are using CDH4 HBase 0.92. Good call on the web interface; the port is blocked, so I never really got a chance to test it. As far as manual re-balancing is concerned, I will check the book. /David

On Tue, Sep 4, 2012 at 5:34 PM, Guillaume Gardey guillaume.gar...@mendeley.com wrote: Hello, a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster? I guess I could search .META. but I haven't figured out how to use filters from the shell. b) What constitutes a badly distributed table and how can I re-balance manually? c) Is b) needed at all? I know that HBase does its balancing automatically behind the scenes.

I have found that http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ is a good source of information/tools for looking at region balancing in the cluster and investigating it.

As for a) I tried running this script: https://github.com/Mendeley/hbase-scripts/blob/master/list_regions.rb like so: hbase org.jruby.Main ./list_regions.rb _my_table but I get ArgumentError: wrong number of arguments (1 for 2) (root) at ./list_regions.rb:60 If someone more proficient notices an obvious fix, I'd be glad to hear about it.

Concerning https://github.com/Mendeley/hbase-scripts , I am afraid that this repository is no longer maintained and was written for old releases of HBase (CDH2, I believe). There's no plan to upgrade it to newer releases. Cheers --- Guillaume
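Since David's web UI port is blocked, the same per-server region count can be pulled from the client API. A minimal sketch against the 0.92-era API (getRegionsInfo() was deprecated in later releases, so treat this as version-specific):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerAddress;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionSpread {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, args[0]); // table name on the command line
        Map<String, Integer> perServer = new HashMap<String, Integer>();
        // getRegionsInfo() reads .META. for this table's regions and locations.
        for (Map.Entry<HRegionInfo, HServerAddress> e
             : table.getRegionsInfo().entrySet()) {
          String host = e.getValue().getHostname();
          Integer n = perServer.get(host);
          perServer.put(host, n == null ? 1 : n + 1);
        }
        for (Map.Entry<String, Integer> e : perServer.entrySet()) {
          System.out.println(e.getKey() + ": " + e.getValue() + " region(s)");
        }
        table.close();
      }
    }

A strongly skewed count from this (many regions of the table on one server) would match the symptom David describes, a single hot node during MapReduce.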
Re: hbase hbck -fixMeta error:RejectException
Is there any more stack/exception information? Also, what version is this? Jon.

On Mon, Sep 3, 2012 at 7:37 PM, abloz...@gmail.com abloz...@gmail.com wrote: [zhouhh@h185 ~]$ hbase hbck -fixMeta ... Number of Tables: 1731 Number of live region servers: 4 Number of dead region servers: 0 Master: h185,61000,1346659732168 Number of backup masters: 0

Exception in thread main java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsDir@2bfe605c rejected from java.util.concurrent.ThreadPoolExecutor@49684e94[Running, pool size = 50, active threads = 49, queued tasks = 0, completed tasks = 7]
  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2001)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
  at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionDirs(HBaseFsck.java:1059)
  at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:353)
  at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
  at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)

Anyone know how to fix this problem? thanks. Andy

-- // Jonathan Hsieh (shay) // Software Engineer, Cloudera // j...@cloudera.com
Re: hbase hbck -fixMeta error:RejectException
Looks like you need the fix from HBASE-6018.

On Mon, Sep 3, 2012 at 7:37 PM, abloz...@gmail.com abloz...@gmail.com wrote: [zhouhh@h185 ~]$ hbase hbck -fixMeta ... Number of Tables: 1731 Number of live region servers: 4 Number of dead region servers: 0 Master: h185,61000,1346659732168 Number of backup masters: 0

Exception in thread main java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsDir@2bfe605c rejected from java.util.concurrent.ThreadPoolExecutor@49684e94[Running, pool size = 50, active threads = 49, queued tasks = 0, completed tasks = 7]
  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2001)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
  at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionDirs(HBaseFsck.java:1059)
  at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:353)
  at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
  at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)

Anyone know how to fix this problem? thanks. Andy
Is there a way to replicate root and meta table in HBase?
Hi, We are running into a case where, if the region server that serves the meta table is down, all requests time out because region lookup is not available. At that time, the master is also not able to update the meta table. It seems that the regions that serve root and meta are a single point of failure in HBase. Is there a way to get rid of it? Does HBase give a higher recovery priority to the meta and root tables? Thanks. Gen
Re: Is there a way to replicate root and meta table in HBase?
Just today I saw this mentioned in the docs. They said they deliberately don't replicate those; otherwise it gets very messy. Stas

On Tue, Sep 4, 2012 at 10:52 PM, Gen Liu ge...@zynga.com wrote: Hi, We are running into a case where, if the region server that serves the meta table is down, all requests time out because region lookup is not available. At that time, the master is also not able to update the meta table. It seems that the regions that serve root and meta are a single point of failure in HBase. Is there a way to get rid of it? Does HBase give a higher recovery priority to the meta and root tables? Thanks. Gen
Re: Is there a way to replicate root and meta table in HBase?
On Tue, Sep 4, 2012 at 2:52 PM, Gen Liu ge...@zynga.com wrote: We are running into a case where, if the region server that serves the meta table is down, all requests time out because region lookup is not available.

Only requests to .META. fail (and most of the time, .META. info is cached, so .META. lookups should be relatively rare). It should not be all requests.

It seems that the regions that serve root and meta are a single point of failure in HBase.

They can be offline if a server crashes, but they should be back online soon enough; is this not your experience?

Is there a way to get rid of it? Does HBase give a higher recovery priority to the meta and root tables?

HBase gets .META. and -ROOT- back online ahead of all other regions, yes. St.Ack
Re: Is there a way to replicate root and meta table in HBase?
On 9/4/12 3:07 PM, Stack st...@duboce.net wrote: On Tue, Sep 4, 2012 at 2:52 PM, Gen Liu ge...@zynga.com wrote: We are running into a case where, if the region server that serves the meta table is down, all requests time out because region lookup is not available.

Only requests to .META. fail (and most of the time, .META. info is cached, so .META. lookups should be relatively rare). It should not be all requests.

We get a lot of region lookup errors on the client side (zlive-hbase-08.int.zynga.com is not dead; we killed another server): 2012-09-04 14:28:11,829 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation={region=-ROOT-,,0.70236052, hostname=zlive-hbase-08.int.zynga.com, port=60020}, attempt=8 of 10 failed; retrying after sleep of 16000 because: Connection refused

It seems that the regions that serve root and meta are a single point of failure in HBase.

They can be offline if a server crashes, but they should be back online soon enough; is this not your experience?

We set hbase.regionserver.maxlogs=256 to allow big memstore flushes and lower compaction stress, so the log split takes about 5-10 minutes. I think META will come back after the log split. Is there a way to specify where HBase should put the root and meta tables?

Is there a way to get rid of it? Does HBase give a higher recovery priority to the meta and root tables?

HBase gets .META. and -ROOT- back online ahead of all other regions, yes. St.Ack
Fwd: Extremely slow when loading small amount of data from HBase
+HBase users.

-- Forwarded message -- From: Dmitriy Ryaboy dvrya...@gmail.com Date: 2012/9/4 Subject: Re: Extremely slow when loading small amount of data from HBase To: u...@pig.apache.org

I think the HBase folks recommend something like 40 regions per node per table, but I might be misremembering. Have you tried emailing the hbase users list?

On Sep 4, 2012, at 3:39 AM, 某因幡 tewil...@gmail.com wrote: After merging ~8000 regions down to ~4000 on an 8-node cluster, things are getting better. Should I continue merging?

2012/8/29 Dmitriy Ryaboy dvrya...@gmail.com: Can you try the same scans with a regular HBase mapreduce job? If you see the same problem, it's an HBase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, whether the small range is all on one region server, etc.)

On Aug 27, 2012, at 11:49 PM, 某因幡 tewil...@gmail.com wrote: When I load a range of data from HBase simply using a row key range in HBaseStorageHandler, I find that the speed is acceptable when I'm loading some tens of millions of rows or more, while the single map task ends up timing out when it's some thousands of rows. What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0. -- language: Chinese, Japanese, English
Re: Key formats and very low cardinality leading fields
Here's what I don't get -- how is this different than if I allocated a different table for each separate value of the leading field? If I did that and used the second field as the leading prefix instead, I know no one would argue that it's a key that won't distribute well. I don't plan on doing this because it would be too many tables, but I don't really see a fundamental difference between the two approaches. As an experiment though, let's say I did exactly that. In that case, hashing the keys accomplishes virtually nothing, since it spreads the values I'm asking for across a range of possibilities that's no larger, more dispersed, or less ordered than if I used the identifiers directly. That make sense?

On Tue, Sep 4, 2012 at 11:04 PM, Michael Segel michael_se...@hotmail.com wrote: Uhm... this isn't very good. In terms of inserting, you will hit a single region or a small subset of regions. This may not be that bad if you have enough data and the rows are not all being inserted into the same region. Since you're hitting an index to pull rows one at a time, you could do this: if you know the exact record you want, you could hash the key, and then you wouldn't have a hot-spotting problem.

On Sep 4, 2012, at 1:51 PM, Eric Czech e...@nextbigsound.com wrote: How does the data flow in to the system? One source at a time? Generally, it will be one source at a time, where these rows are index entries built from MapReduce jobs. The second field. Is it sequential? No, the index writes from the MapReduce jobs should dump some relatively small number of rows into HBase for each first field - second field combination, but then move on to another first field - second field combination where the new second field is not ordered in any way relative to the old second field. How are you using the data when you pull it from the database? Not totally sure what specific use cases you might be asking after, but in a more general sense, the indexed data will power our web platform (we aggregate and manage data for the music industry) as well as work as inputs to offline analytics processes. I'm placing the design priority on the interaction with the web platform though, and the full row structure I'm intending to use is: [inline image not preserved] This is similar to OpenTSDB, and the service we provide is similar to what OpenTSDB was designed for, if that gives you a better sense of what I'd like to do with the data.

On Tue, Sep 4, 2012 at 2:03 PM, Michael Segel michael_se...@hotmail.com wrote: Eric, So here's the larger question... How does the data flow in to the system? One source at a time? The second field: is it sequential? If not sequential, is it going to be some sort of increment larger than a previous value? (Are you always inserting to the left side of the queue?) How are you using the data when you pull it from the database? 'Hot spotting' may be unavoidable and, depending on other factors, it may be a moot point.

On Sep 4, 2012, at 12:56 PM, Eric Czech e...@nextbigsound.com wrote: Longer term, what's really going to happen is more like: I'll have a first field value of 1, 2, and maybe 3. I won't know 4 - 10 for a while, and the *second* value after each initial value will be, although highly unique, relatively exclusive for a given first value. This means that even if I didn't use the leading prefix, I'd have more or less the same problem, where all the writes go to the same region when I introduce a new set of second values.

In case the generalities are confusing, the prefix value is a data source identifier and the second value is an identifier for entities within that source. The entity identifiers for a given source are likely to span different numeric or alphanumeric ranges, but they probably won't be the same ranges across sources. Also, I won't know all those ranges (or sources, for that matter) upfront. I'm concerned about the introduction of a new data source (= leading prefix value) since the first writes will be to the same region; ideally, I'd be able to get a sense of how the second values are split for the new leading prefix and split an HBase region to reflect that. If that's not possible or just turns out to be a pain, then I can live with the introduction of the new prefix being a little slow until the regions split and distribute effectively. That make sense?

On Tue, Sep 4, 2012 at 1:34 PM, Michael Segel michael_se...@hotmail.com wrote: I think you have to understand what happens as a table splits. If you have a composite key where the first field has a value between 0-9 and you pre-split your table, you will have all of your 1's going to a single region until it splits. But both halves of the split will start on the same node until they eventually get balanced out. (Note: I'm not an expert on how hbase balances the
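A sketch of the hashing idea Michael mentions: prefix the natural key with a few bytes of its own hash so that writes spread across regions. The salt width here is an assumption, and note that this trades away ordered scans, which is exactly the property Eric wants to keep for selecting MapReduce input by prefix:

    import java.security.MessageDigest;
    import java.util.Arrays;

    import org.apache.hadoop.hbase.util.Bytes;

    public class HashedKey {
      // Point gets still work, because the salt is recomputable from the key
      // itself; range scans over the natural key order do not.
      public static byte[] rowKey(byte[] naturalKey) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(naturalKey);
        byte[] salt = Arrays.copyOf(md5, 2); // 2-byte salt: an assumed width
        return Bytes.add(salt, naturalKey);
      }
    }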