Re: hbase hashing algorithm and schema design

2011-06-08 Thread tsuna
On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned selek...@yahoo.com wrote: I was studying the OpenTSDB example, where they also prefix the row keys with the event id. I further modified my row keys to have this: eventid uuid yyyy-mm-dd. The uuid is fairly unique and random. Is appending a uuid to
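A minimal Python sketch of the composite key described above, assuming a hypothetical 4-byte big-endian event id (the message doesn't say how the event id is encoded). HBase compares row keys byte-wise, so a fixed-width prefix keeps all rows of one event together. It also shows one consequence of this layout: because the random uuid comes before the date, rows within one event are ordered by uuid, not by day.

```python
import uuid
from datetime import date


def make_row_key(event_id: int, event_uuid: uuid.UUID, day: date) -> bytes:
    """Build a hypothetical <event id><uuid><yyyy-mm-dd> row key.

    The event id is packed as a fixed-width 4-byte big-endian integer, so
    byte-wise comparison (how HBase sorts row keys) matches numeric order.
    The uuid contributes 16 bytes and the date is appended as ASCII text.
    """
    return (
        event_id.to_bytes(4, "big")
        + event_uuid.bytes
        + day.isoformat().encode("ascii")
    )


if __name__ == "__main__":
    early = make_row_key(1, uuid.UUID(int=2**127), date(2011, 6, 1))
    late = make_row_key(1, uuid.UUID(int=1), date(2011, 6, 8))
    # Same event, but the later day sorts first because the uuid decides.
    print(sorted([early, late])[0] == late)
```

If date-range scans within an event matter, placing the date before the uuid would keep those rows contiguous; which order is right depends on the read pattern.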

RE: distribution of regions to servers

2011-06-08 Thread Kleegrewe, Christian
Hi Geoff, Since HBase balances on a cluster basis rather than a per-table basis, it may happen that all the regions of one table are located on the same region server. The reason for this may be the way HBase does table splits. If a region exceeds the configured maximum size, the region is split into two, but
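A toy Python model of the split behaviour described above. Christian's sentence is cut off, so this assumes both daughters are opened on the parent's region server, which matches how HBase 0.90 opens daughter regions until the balancer moves something. The Region class, sizes and split point are hypothetical; the point is only that repeated splits multiply one table's regions on a single server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    table: str
    start_key: bytes
    end_key: bytes  # b"" stands for "end of table"
    size_mb: int
    server: str


def split_if_needed(region: Region, max_size_mb: int, split_key: bytes) -> List[Region]:
    """Toy model of a size-triggered split.

    If the region is larger than max_size_mb it is replaced by two daughters
    divided at split_key; both daughters stay on the parent's server. Sizes
    are halved purely for illustration.
    """
    if region.size_mb <= max_size_mb:
        return [region]
    half = region.size_mb // 2
    left = Region(region.table, region.start_key, split_key, half, region.server)
    right = Region(region.table, split_key, region.end_key,
                   region.size_mb - half, region.server)
    return [left, right]
```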

Hbase hbck showing status as INCONSISTENT

2011-06-08 Thread praveenesh kumar
Hello guys, Well.. I am using 12 node hbase cluster. I can see all the nodes running on Hbase Web-UI. But when I am running hbase hbck , I am getting the following output : hadoop@ub13:/usr/local/hadoop$ hbase hbck 11/06/08 15:30:52 INFO zookeeper.ZooKeeper: Client

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
Re: With respect to Doug's posts, you can't do a multi-get off the bat. That's an assumption, but you're entitled to your opinion. -----Original Message----- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Monday, June 06, 2011 10:08 PM To: user@hbase.apache.org Subject: RE: How to

Re: tech. talk at imageshack/yfrog

2011-06-08 Thread Matt Davies
If it is possible I think any slides or even a video would be very interesting to some of us that can't travel. I, for one, would love to hear how you do it. Thanks! On Tue, Jun 7, 2011 at 6:07 PM, Jack Levin magn...@gmail.com wrote: Hey Guys, I plan to do a tech talk here at ImageShack, on

Re: tech. talk at imageshack/yfrog

2011-06-08 Thread Himanshu Vashishtha
+1 to Matt's opinion (if possible?). I am interested in your use case; it sounds very impressive from the stats you gave. You said 1000 tables? Looking forward to seeing what optimizations/config tweaks you had to make to cope with your read/write requirements. Thanks, Himanshu On Wed, Jun 8, 2011 at

Re: Hbase Hardware requirement

2011-06-08 Thread Andrew Purtell
From: Ted Dunning tdunn...@maprtech.com Lots of people are moving towards more spindles per box to increase IOP/s. This is particularly important for cases where the working set gets pushed out of memory. Indeed. Our spec is more like 12x 500 GB SATA disks, to push IOPS and more evenly

in-memory data grid vs. ehcache + hbase

2011-06-08 Thread Hiller, Dean x66079
We have certain tables with under 10 rows, one under 200 rows and one with 1,000,000 rows. We have found out that having a copy/cache on each node is EXTREMELY fast for our batch processing since these copies of data are local AND in-memory. The issue I am struggling with is the best way to

How to find encoded name for a region?

2011-06-08 Thread James Hammerton
Hi, Given the tableName, startKey and endKey for a region how do I get hold of the encodedName? We have code for identifying overlapping regions that outputs triples of the form tableName, startKey and endKey for each region, but it looks like the Merge command (we're using 0.20.6) requires the

Hadoop/HBase Upgrade Suggestion

2011-06-08 Thread Zhong, Sheng
Hey, Could anyone give me suggestions for a Hadoop/HBase upgrade? We're currently using Apache Hadoop 0.20.2 + HBase 0.20.3 + zookeeper-3.2.2. Has anyone done this with the latest stable version, hadoop-0.20.203.0rc1 + HBase 0.90.2, and will HBase 0.90.2 have compatibility issues with hadoop-0.20.203.0rc1?

Re: How to find encoded name for a region?

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 9:22 AM, James Hammerton james.hammer...@mendeley.com wrote: Given the tableName, startKey and endKey for a region how do I get hold of the encodedName? I suppose it depends on the context. If reading .META., then if you deserialize the info:regioninfo into an

Re: How to find encoded name for a region?

2011-06-08 Thread James Hammerton
Thanks, Stack. The context is that we have a script, find_overlapping_regions.rb at: https://github.com/Mendeley/hbase-scripts/blob/master/find_overlapping_regions.rb We'd ideally like to feed the results into another script (to be written) that will call org.apache.hadoop.hbase.util.Merge. I've been

Re: How to find encoded name for a region?

2011-06-08 Thread Stack
Do you have check_meta.rb in 0.20.6 (I don't remember? I think you do). Start with that? Otherwise, here: keys = wanted_table.getStartEndKeys In 0.20.6 can you get HRegionInfos instead of start keys? That'd be more useful. They would have the encoded name. We'd ideally like to feed the

Re: How to find encoded name for a region?

2011-06-08 Thread James Hammerton
Hi, I've checked /usr/lib/hbase/bin and it doesn't have check_meta.rb. Also, HTable doesn't have getHRegionInfos in 0.20.6. Regards, James On Wed, Jun 8, 2011 at 5:46 PM, Stack st...@duboce.net wrote: Do you have check_meta.rb in 0.20.6 (I don't remember? I think you do). Start with

Re: Hadoop/HBase Upgrade Suggestion

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 9:25 AM, Zhong, Sheng sheng.zh...@searshc.com wrote: Could anyone give me suggestions for a Hadoop/HBase upgrade? We're currently using Apache Hadoop 0.20.2 + HBase 0.20.3 + zookeeper-3.2.2. Has anyone done this with the latest stable version, hadoop-0.20.203.0rc1 + HBase 0.90.2,

Re: How to find encoded name for a region?

2011-06-08 Thread Stack
Pull it in. You'll have to massage a little but rather than do the indirect HTable.getStartKeys (which turns around and reads meta), read .META. directly and get the HRIs yourself. St.Ack On Wed, Jun 8, 2011 at 9:51 AM, James Hammerton james.hammer...@mendeley.com wrote: Hi, I've checked
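A minimal Python sketch of the matching step Stack suggests, assuming the .META. rows have already been read and each info:regioninfo cell deserialized into a region descriptor. MetaRow is a hypothetical stand-in for that HRegionInfo data, not an HBase type. Because overlapping regions can share identical boundaries, each (tableName, startKey, endKey) triple maps to a list of encoded names rather than a single one.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MetaRow(NamedTuple):
    """Hypothetical stand-in for a deserialized info:regioninfo cell."""
    table: str
    start_key: bytes
    end_key: bytes
    encoded_name: str


Triple = Tuple[str, bytes, bytes]


def encoded_names(meta_rows: Iterable[MetaRow],
                  wanted: Iterable[Triple]) -> Dict[Triple, List[str]]:
    """Map each wanted (table, start_key, end_key) triple to every encoded
    region name in .META. with exactly those boundaries (empty if none)."""
    index: Dict[Triple, List[str]] = {}
    for row in meta_rows:
        key = (row.table, row.start_key, row.end_key)
        index.setdefault(key, []).append(row.encoded_name)
    return {triple: index.get(triple, []) for triple in wanted}
```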

Re: How to find encoded name for a region?

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 10:01 AM, James Hammerton james.hammer...@mendeley.com wrote: Thanks Stack, I take it you mean get hold of check_meta.rb from a recent version and alter it to find the HRIs? Yes. Alter it to run in 0.20.6. St.Ack

Re: Hbase hbck showing status as INCONSISTENT

2011-06-08 Thread Stack
A problem that will be fixed in 0.90.4 is that once hbck finds one issue, all checks that follow emit 'INCONSISTENCY'. A quick perusal of the below has it that hbck is not able to reach a server. Can you check into that? It's using an IP rather than a hostname. Why is that? ips in the

HBase Backups

2011-06-08 Thread Manoj Murumkar
Hi, We're trying to come up with the right strategy for backing up HBase tables. The assumption is that table sizes will not grow beyond a few hundred GB. Currently, we're using exports (writing directly onto the HDFS of another cluster), but it is taking too long (~5 hours to export ~5 GB of data). Are

RE: EXT :Re: Failure to Launch: hbase-0.90.3 with hadoop-0.20.203.0

2011-06-08 Thread Ratner, Alan S (IS)
J-D, Thanks for the info. I copied the appropriate hadoop jar file to the lib directory (and renamed the original one). I wasn't able to figure out why zookeeper wasn't running on my master server so I launched zookeeper directly and set HBASE_MANAGES_ZK to false. (And since I am running

Re: EXT :Re: Failure to Launch: hbase-0.90.3 with hadoop-0.20.203.0

2011-06-08 Thread Stack
Looks like you need to copy to hbase a commons config jar; this version of hadoop seems to depend on it: java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration And you are clear that this version of hadoop does not have sync/append so hbase will lose data on crash. St.Ack

Re: in-memory data grid vs. ehcache + hbase

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 9:00 AM, Hiller, Dean x66079 dean.hil...@broadridge.com wrote: We have certain tables with under 10 rows, one under 200 rows and one with 1,000,000 rows.  We have found out that having a copy/cache on each node is EXTREMELY fast for our batch processing since these

Re: What the optimization method of when to delete Zk connection?

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 1:44 AM, bijieshan bijies...@huawei.com wrote: Thanks Suraj. Yes, it's a better method, though I haven't tested it yet. So if we use HTablePool, it seems we don't need to delete ZK connections manually? Is that correct? Yes. St.Ack

Re: How to efficiently join HBase tables?

2011-06-08 Thread Eran Kutner
I'd like to clarify again what I'm trying to do and why I still think it's the best way to do it. I want to join two large tables, and I'm assuming (this is the key to the efficiency of this method) that: 1) I'm getting a lot of data from table A, something which is close enough to a full table

Re: Delete whole table HBase

2011-06-08 Thread Azshara
Yes, thanks it worked! Have no idea how I didn't come across the method! Thank you for the tip!

Re: distribution of regions to servers

2011-06-08 Thread Ted Yu
In trunk this behavior has been improved: the load balancer moves the youngest region off a heavily loaded region server. See http://zhihongyu.blogspot.com/2011/04/load-balancer-in-hbase-090.html I am thinking of creating a new policy for region assignment at cluster startup which assigns regions
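Ted's proposed startup policy is only described in outline, so the following is a hypothetical Python sketch of one way "round-robin per table" could work, not HBase's own assignment code. Grouping regions by table before dealing them out is what keeps a single table from landing on one server; sharing one server cycle across tables avoids every table starting on the first server.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, Iterable, List, Tuple


def round_robin_by_table(regions: Iterable[Tuple[str, str]],
                         servers: List[str]) -> Dict[str, str]:
    """Assign regions to servers table by table.

    regions: (table, region_name) pairs; servers: region server names.
    One server cycle is shared across tables, so each table's regions go
    to consecutive servers and tables don't all begin on the first server.
    """
    by_table: Dict[str, List[str]] = defaultdict(list)
    for table, region_name in regions:
        by_table[table].append(region_name)
    next_server = cycle(servers)
    plan: Dict[str, str] = {}
    for table in sorted(by_table):
        for region_name in sorted(by_table[table]):
            plan[region_name] = next(next_server)
    return plan
```

As the follow-ups below point out, such a plan would throw away the locality that retaining assignments preserves, so it would only make sense as a one-off rebalancing step.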

Re: distribution of regions to servers

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 12:50 PM, Ted Yu yuzhih...@gmail.com wrote: I am thinking of creating a new policy for region assignment at cluster startup which assigns regions from each table in round-robin fashion. Don't we want to retain assignments on startup since that will ensure greatest

RE: How to efficiently join HBase tables?

2011-06-08 Thread Buttler, David
Let's make a toy example to see if we can capture all of the edge conditions:
Table A: Key1 joinVal_1, Key2 joinVal_2, Key3 joinVal_1
Table B: Key4 joinVal_1, Key5 joinVal_3, Key6 joinVal_2
Now, assume that we have a mapper that takes two values, one row from A, and one row from B.

Re: hbase hashing algorithm and schema design

2011-06-08 Thread Sam Seigal
On Wed, Jun 8, 2011 at 12:40 AM, tsuna tsuna...@gmail.com wrote: On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned selek...@yahoo.com wrote: I was studying the OpenTSDB example, where they also prefix the row keys with event id. I further modified my row keys to have this - eventid uuid

RE: distribution of regions to servers

2011-06-08 Thread Doug Meil
If I understand the history correctly, round-robin was used in .89, but retain is the policy for .90+. My 2 cents is that if/when region shuffling is required, I'd rather do that with another utility and keep it out of cluster startup. -----Original Message----- From: saint@gmail.com

Re: How to efficiently join HBase tables?

2011-06-08 Thread Michel Segel
Unless I am mistaken... get() requires a row key, right? And you can join tables on column data which isn't in the row key, right? So how do you do a get()? :-) Sure there is more than one way to skin a cat. But if you want to be efficient... You will create a set of unique keys based on the

Re: How to efficiently join HBase tables?

2011-06-08 Thread Dave Latham
I believe this is what Eran is suggesting:
Table A: Row1 (has joinVal_1), Row2 (has joinVal_2), Row3 (has joinVal_1)
Table B: Row4 (has joinVal_1), Row5 (has joinVal_3), Row6 (has joinVal_2)
Mapper receives a list of input rows (union of both input tables in any order), and produces
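Dave's description is cut off before the mapper's output, so the following is a generic in-memory hash join over the same toy tables, written in Python as a sketch rather than Eran's exact mapper. It assumes every row it is given fits in memory and treats the join as an inner join.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def join_in_memory(rows: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """rows: (table, row_key, join_value) triples from tables "A" and "B",
    in any order. Rows are bucketed by join value, then every A row is
    paired with every B row in the same bucket. Join values present in
    only one table produce no output."""
    buckets: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"A": [], "B": []})
    for table, row_key, join_value in rows:
        buckets[join_value][table].append(row_key)
    joined = []
    for join_value in sorted(buckets):
        for a_key in buckets[join_value]["A"]:
            for b_key in buckets[join_value]["B"]:
                joined.append((a_key, b_key, join_value))
    return joined


if __name__ == "__main__":
    toy = [("A", "Row1", "joinVal_1"), ("B", "Row5", "joinVal_3"),
           ("A", "Row2", "joinVal_2"), ("B", "Row4", "joinVal_1"),
           ("A", "Row3", "joinVal_1"), ("B", "Row6", "joinVal_2")]
    for pair in join_in_memory(toy):
        print(pair)
```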

Re: HBase Backups

2011-06-08 Thread Joey Echeverria
Can you afford some downtime? If so, you could minor compact, disable the table, distcp, and then enable the table. -Joey On Wed, Jun 8, 2011 at 1:22 PM, Manoj Murumkar manoj.murum...@gmail.com wrote: Hi, We're trying to come up with the right strategy for backing up HBase tables. The assumption is

RE: How to efficiently join HBase tables?

2011-06-08 Thread Buttler, David
Thank you for the explanation, I think I understand the suggestion now. I completely agree with you that this would be effective for cases that you can do the join of the sorted values in memory. A small tweak would make this more generic and effective for any size. If you had two separate
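David's message is truncated before the tweak itself. One standard way to lift the in-memory restriction is a sort-merge join; the Python sketch below shows that generic technique, not necessarily what David goes on to propose. Both inputs are assumed to be already sorted by join value.

```python
from typing import List, Tuple


def merge_join(a: List[Tuple[str, str]],
               b: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """a, b: (join_value, row_key) pairs, each sorted by join_value.
    Walks both inputs once; with streaming readers instead of lists, only
    the current run of equal join values would need to be buffered."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        va, vb = a[i][0], b[j][0]
        if va < vb:
            i += 1
        elif va > vb:
            j += 1
        else:
            # Find the run of rows sharing this join value on each side.
            i_end = i
            while i_end < len(a) and a[i_end][0] == va:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][0] == va:
                j_end += 1
            for _, key_a in a[i:i_end]:
                for _, key_b in b[j:j_end]:
                    out.append((key_a, key_b, va))
            i, j = i_end, j_end
    return out
```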

increasing hbase get latencies

2011-06-08 Thread Abhijit Pol
We are on HBase 0.90 and have been using HBase for a while to perform high-volume data lookups with the HBase client (no MapReduce involved). Recently we observed that our get latencies keep increasing over time (and eventually flatten out at a higher value), and if we restart the HBase server, latencies go

0.92.0 availability

2011-06-08 Thread Ma, Ming
Hi, Where can I find the targeted release date of 0.92.0? Thanks. Ming

Re: HBase Backups

2011-06-08 Thread Manoj Murumkar
We are trying to do this online as downtime is not an option. Good point, nonetheless. On Jun 8, 2011 3:48 PM, Joey Echeverria j...@cloudera.com wrote: Can you afford some down time? If so, you could minor compact, disable the table, distcp, and then enable the table. -Joey On Wed, Jun 8,

Re: HBase Backups

2011-06-08 Thread Otis Gospodnetic
There is this post about HBase backup options http://blog.sematext.com/2011/03/11/hbase-backup-options/ . I hope it helps. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ----- From: Manoj

Re: 0.92.0 availability

2011-06-08 Thread Otis Gospodnetic
I wouldn't rely on any dates. :) I'd look at the number of remaining open JIRA issues with that target version. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase Hadoop ecosystem search :: http://search-hadoop.com/ ----- Original Message ----- From: Ma, Ming

Re: Best practices for HBase in EC2?

2011-06-08 Thread George P. Stathis
Jim, I'd be interested in hearing your experience with Whirr when you try it. I've been testing it the last couple of days and I haven't been able to get the out-of-the-box hadoop recipe to work when it comes up (the namenode doesn't have any datanodes configured although they are all up and

Re: distribution of regions to servers

2011-06-08 Thread Ted Yu
The assumption was that regions were not evenly distributed prior to restarting. If they were, the user wouldn't select this policy. We can make this policy take effect only once: retain assignment is selected after this new policy runs. Of course the dynamic portion of the load balancer needs to select the

RE: How to efficiently join HBase tables?

2011-06-08 Thread Doug Meil
Hi there- Summary comment: 1) Preference Several people in this thread have suggested approaches (map-side memory join, multi-get, temp files), all of which have merit and have advantages in certain situations. Kudos to the dist-list for chiming in. The right approach depends on the

a question about log level

2011-06-08 Thread Gaojinchao
How should we set the log level for production? Does anyone have experience with this? I want to use INFO.

Re: HBase Backups

2011-06-08 Thread Manoj Murumkar
Thanks, I have seen it. Once I verify a viable solution, I will update this thread. On Jun 8, 2011 5:57 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: There is this post about HBase backup options http://blog.sematext.com/2011/03/11/hbase-backup-options/ . I hope it helps. Otis

Re: a question about log level

2011-06-08 Thread Stack
In conf/log4j.properties. St.Ack On Wed, Jun 8, 2011 at 9:02 PM, Gaojinchao gaojinc...@huawei.com wrote: How should we set the log level for production? Does anyone have experience with this? I want to use INFO.

Re: Hbase hbck showing status as INCONSISTENT

2011-06-08 Thread praveenesh kumar
Hi. I guess the problem is that one of my regionservers has a localhost entry in its /etc/hosts file. My log says: 2011-06-08 15:24:27,588 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as ub8,60020,1307526863668, RPC listening on /127.0.0.1:60020,

HQuorum failures

2011-06-08 Thread James Ram
Hi, We are running a 5 machine Hbase cluster. We have noticed that whenever an HQuorum fails in one machine, the entire application that is running on HBase crashes. Is there anything to do about this? -- With Regards, Jr.

Adding HQuorum dynamically.

2011-06-08 Thread James Ram
Is there any way to add a new HQuorum to the cluster dynamically? -- With Regards, Jr.

Re: HQuorum failures

2011-06-08 Thread Chris Tarnas
What is an HQuorum? If you mean a regionserver, then possibly your application is attempting to get data that was on a region hosted by the failed regionserver, and in that case you need to make sure your application can deal with the connection failure and wait for the regions to be reassigned to
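A minimal Python sketch of "wait for the regions to be reassigned" on the application side: retry with exponential backoff. The operation, the exception type treated as retryable and the delays are all assumptions. The HBase Java client also retries internally (governed by settings such as hbase.client.retries.number and hbase.client.pause), so an outer loop like this only matters once those retries are exhausted.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T],
                 attempts: int = 5,
                 base_delay_s: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on a retryable error wait base_delay_s * 2**attempt seconds
    and try again. The last error is re-raised once attempts run out."""
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` keeps the backoff testable without real delays.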

Re: HQuorum failures

2011-06-08 Thread James Ram
Hi, Thanks for your reply. So does HBase automatically reassign to another regionserver, or do we have to do it manually? On Thu, Jun 9, 2011 at 10:18 AM, Chris Tarnas c...@email.com wrote: What is an HQuorum? If you mean a regionserver, then possibly your application is attempting to get data

Re: HQuorum failures

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 10:10 PM, James Ram hbas...@gmail.com wrote: Hi, Thanks for your reply. So does HBase automatically reassign to another regionserver, or do we have to do it manually? It does it automatically. St.Ack

Re: Adding HQuorum dynamically.

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 9:45 PM, James Ram hbas...@gmail.com wrote: Is there any way to add a new HQuorum to the cluster dynamically? If HQuorum == HRegionServer, then yes. Just make sure it has the same config as the other members of the cluster and start it. St.Ack

Re: Hbase hbck showing status as INCONSISTENT

2011-06-08 Thread Stack
On Wed, Jun 8, 2011 at 9:37 PM, praveenesh kumar praveen...@gmail.com wrote: But my problem is I want to keep the entry of localhost in my /etc/hosts file.. Is there any parameter that we can put in hbase-site.xml so that RPC starts listening on regionserver's actual IP rather than default