Harvesting empty regions

2011-05-31 Thread Arvind Jayaprakash
My setup seems to have a lot of regions with no data that just keep accumulating over time. Here are some details: I have time-series data (created by opentsdb) being inserted into hbase every minute. Since the data has little value after say 15 days, I go ahead and delete all old data. When I

A sudden msg of java.io.IOException: Server not running, aborting

2011-05-31 Thread bijieshan
It occurred on a RegionServer for an unknown reason. I have checked this RegionServer's logs; there is no prior abort, and no other info shows the RegionServer had aborted. So I saw the following message appear all of a sudden. [logs] 2011-05-25 09:15:44,232 INFO

Re: Harvesting empty regions

2011-05-31 Thread Ferdy Galema
You can use the merge tool to combine adjacent regions. It requires a bit of manual work because you need to specify the regions by hand. The cluster also needs to be offline (I recommend keeping zookeeper running though). Check whether the merge succeeded with the hbck tool. There are some jira

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
MultipleInputs would be ideal, but that seems pretty complicated. MultiTableInputFormat seems like a simple change in the getSplits() method of TableInputFormat plus support for a collection of tables and their matching scanners instead of a single table and scanner; doesn't sound too complicated. Any

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Eran, You want to join two tables? The short answer is to use a relational database to solve that problem. Longer answer: You're using HBase so you don't need to think in terms of a reducer. You can create a temp table for your query. You can then run one map job to scan and filter table A,
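A minimal sketch of the temp-table approach Michael describes: a map-only job scans table A, filters, and writes the surviving rows into a pre-created temp table. The table names, column family, and filter column below are illustrative assumptions, not from the thread; the MR wiring uses the 0.90-era TableMapReduceUtil API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class FilterTableAJob {

  // Map-only pass over table A: keep only rows matching some predicate and
  // write them into a pre-created temp table (all names here are hypothetical).
  static class FilterMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] joinKey = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("join_key"));
      if (joinKey != null) {                    // the "filter" step
        Put put = new Put(row.get());
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("join_key"), joinKey);
        context.write(row, put);                // goes to the temp table
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "filter-table-a-into-temp");
    job.setJarByClass(FilterTableAJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);                 // recommended for MR scans

    TableMapReduceUtil.initTableMapperJob("table_a", scan, FilterMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // Configure table output, but run map-only so Puts are written directly.
    TableMapReduceUtil.initTableReducerJob("temp_join_table", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```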

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Re: The problem is that the few references to that question I found recommend pulling one table to the mapper and then doing a lookup for the referred row in the second table. With multi-get in 0.90.x you could perform some reasonably clever processing and not do the lookups one-by-one but in

RE: How to efficiently join HBase tables?

2011-05-31 Thread Doug Meil
Eran's observation was that a join is solvable in a Mapper via lookups on a 2nd HBase table, but it might not be that efficient if the lookups are 1 by 1. I agree with that. My suggestion was to use multi-Get for the lookups instead. So you'd hold onto a batch of records in the Mapper and
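To make the suggestion concrete, here is a rough sketch of a mapper that buffers lookups against a second table and flushes them with HTable.get(List&lt;Get&gt;), the 0.90.x multi-get Doug refers to. The table names, the column holding the foreign key, and the batch size are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Scans table A; instead of one Get against table B per row, it buffers
// rows and issues a single multi-get for the whole batch.
public class BatchedLookupMapper extends TableMapper<ImmutableBytesWritable, Text> {

  private static final int BATCH_SIZE = 100;   // tuning knob, not from the thread
  private HTable lookupTable;                  // table B
  private final List<Get> pending = new ArrayList<Get>();
  private final List<ImmutableBytesWritable> pendingKeys =
      new ArrayList<ImmutableBytesWritable>();

  @Override
  protected void setup(Context context) throws IOException {
    lookupTable = new HTable(HBaseConfiguration.create(context.getConfiguration()),
        "table_b");                            // hypothetical table name
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Assume the join key lives in cf:fk of table A.
    byte[] foreignKey = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("fk"));
    if (foreignKey == null) return;
    pending.add(new Get(foreignKey));
    // Copy the row key; the framework reuses the key object between calls.
    pendingKeys.add(new ImmutableBytesWritable(Arrays.copyOfRange(
        row.get(), row.getOffset(), row.getOffset() + row.getLength())));
    if (pending.size() >= BATCH_SIZE) {
      flush(context);
    }
  }

  private void flush(Context context) throws IOException, InterruptedException {
    Result[] results = lookupTable.get(pending);   // the 0.90.x multi-get
    for (int i = 0; i < results.length; i++) {
      if (!results[i].isEmpty()) {
        context.write(pendingKeys.get(i), new Text(Bytes.toString(results[i].getRow())));
      }
    }
    pending.clear();
    pendingKeys.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    if (!pending.isEmpty()) flush(context);
    lookupTable.close();
  }
}
```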

Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Something Something
Hello, Are there scripts available to create an HBase cluster on Rackspace - like there are for Amazon EC2? A quick Google search didn't come up with anything useful. Any help in this regard would be greatly appreciated. Thanks. - Ajay

Re: Harvesting empty regions

2011-05-31 Thread Arvind Jayaprakash
On May 31, Ferdy Galema wrote: You can use the merge tool to combine adjacent regions. It requires a bit of manual work because you need to specify the regions by hand. The cluster also needs to be offline (I recommend to keep zookeeper running though). Check if merging succeeded with the hbck

Re: Harvesting empty regions

2011-05-31 Thread Jean-Daniel Cryans
hbase noob question: do compactions (major/minor) always work within the scope of a single region, and never merge regions? That's what HBASE-1621 is about: merges can't be done while the cluster is running, and compactions only happen when HBase is running. J-D

Re: Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Ryan Rawson
Rackspace doesn't have an API, so no. This is one of the primary disadvantages of Rackspace; it's all hands-on/manual. Just boot up your instances and use the standard management tools. On Tue, May 31, 2011 at 10:23 AM, Something Something mailinglist...@gmail.com wrote: Hello, Are there

HFile.Reader scans return latest version?

2011-05-31 Thread Sandy Pratt
Hi all, I'm doing some work to read records directly from the HFiles of a damaged table. When I scan through the records in the HFile using org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest version of the record as with a default HBase Scan? Or do I need to do some
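For context, low-level reading of a single HFile in the 0.90-era API looks roughly like the sketch below; treat the reader constructor and scanner signatures as assumptions, since they changed in later releases. An HFileScanner walks raw KeyValues, so unlike a default Scan it surfaces every stored version (and delete markers); version filtering normally happens higher up the read path.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

// Rough sketch of dumping every KeyValue in one HFile (0.90-era API).
public class HFileDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path hfilePath = new Path(args[0]);   // e.g. a store file under /hbase/<table>/<region>/<cf>/

    HFile.Reader reader = new HFile.Reader(fs, hfilePath, null, false); // no block cache
    try {
      reader.loadFileInfo();
      HFileScanner scanner = reader.getScanner(false, false); // no caching, no pread
      if (scanner.seekTo()) {            // position at the first KeyValue
        do {
          KeyValue kv = scanner.getKeyValue();
          // Every stored version appears here, newest first within a row/column.
          System.out.println(kv);
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}
```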

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
Now I'm getting the wrong region exception on the new table that I'm copying the old table to. Running hbck reveals an inconsistency in the new table. The frustration is unbelievable. Like I said before, it doesn't appear that HBase is ready for prime time. I don't know how companies are

Re: A sudden msg of java.io.IOException: Server not running, aborting

2011-05-31 Thread Jean-Daniel Cryans
Can you post the full log somewhere? You talk about several Exceptions but we can't see them. J-D On Tue, May 31, 2011 at 4:41 AM, bijieshan bijies...@huawei.com wrote: It occurred on a RegionServer for an unknown reason. I have checked this RegionServer's logs, there's no prior abort, and

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
Thanks everyone for the great feedback. I'll try to address all the suggestions. My data sets range from large to very large. One is on the order of many billions of rows, although the input for a typical MR job will be in the hundreds of millions; the second table is in the tens of millions. I

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
Doesn't Hive for HBase enable joins? On Tue, May 31, 2011 at 5:06 AM, Eran Kutner e...@gigya.com wrote: Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
Doug, I read the OP's post as the following: Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then do a lookup for the referred row in the second

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
For my need I don't really need the general case, but even if I did I think it can probably be done more simply. The main problem is getting the data from both tables into the same MR job without resorting to lookups. So without the theoretical MultiTableInputFormat, I could just copy all the data

Re: wrong region exception

2011-05-31 Thread Stack
Try adding this change: Index: bin/check_meta.rb === --- bin/check_meta.rb (revision 1129468) +++ bin/check_meta.rb (working copy) @@ -127,11 +127,13 @@ scan = Scan.new() scanner = metatable.getScanner(scan) oldHRI = nil -bad

Re: wrong region exception

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 10:42 AM, Robert Gonzalez robert.gonza...@maxpointinteractive.com wrote: Now I'm getting the wrong region exception on the new table that I'm copying the old table to.  Running hbck reveals an inconsistency in the new table.   The frustration is unbelievable.  Like I

Re: HFile.Reader scans return latest version?

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 11:05 AM, Sandy Pratt prat...@adobe.com wrote: Hi all, I'm doing some work to read records directly from the HFiles of a damaged table.  When I scan through the records in the HFile using org.apache.hadoop.hbase.io.hfile.HFileScanner, will I get only the latest

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ted Dunning
Your mapper can tell which file is being read and add source tags to the data records. The reducer can do the cartesian product (if you really need that). On Tue, May 31, 2011 at 12:19 PM, Eran Kutner e...@gigya.com wrote: For my need I don't really need the general case, but even if I did I
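A rough sketch of the tag-and-reduce join Ted outlines, assuming both tables have been dumped to tab-separated files keyed by the join key; the paths, field layout, and tag scheme are illustrative assumptions, not from the thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TaggedJoinJob {

  // Mapper: tag each record with its source table, keyed by the join key.
  // Assumes input lines of the form "joinKey<TAB>rest".
  static class TaggingMapper extends Mapper<Object, Text, Text, Text> {
    private String tag;

    @Override
    protected void setup(Context context) {
      // Decide the tag from the input path (e.g. .../table_a/part-00000).
      String path = ((FileSplit) context.getInputSplit()).getPath().toString();
      tag = path.contains("table_a") ? "A" : "B";
    }

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t", 2);
      if (parts.length == 2) {
        context.write(new Text(parts[0]), new Text(tag + "\t" + parts[1]));
      }
    }
  }

  // Reducer: split the tagged records back out and emit the cartesian
  // product of the A-side and B-side rows that share a join key.
  static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      List<String> aSide = new ArrayList<String>();
      List<String> bSide = new ArrayList<String>();
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("A\t")) aSide.add(s.substring(2)); else bSide.add(s.substring(2));
      }
      for (String a : aSide) {
        for (String b : bSide) {
          context.write(key, new Text(a + "\t" + b));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "tagged-join");
    job.setJarByClass(TaggedJoinJob.class);
    job.setMapperClass(TaggingMapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); // dump of table A
    FileInputFormat.addInputPath(job, new Path(args[1])); // dump of table B
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```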

Re: about disposing Hbase process

2011-05-31 Thread Stack
Sorry Gao, what is your question? St.Ack 2011/5/31 Gaojinchao gaojinc...@huawei.com: For one of our applications, there are 3 nodes. The process layout and machine configuration are as below. Who has experience with this? The CPU usage is about 70%~80%; does it make HBase or zookeeper

RE: How to efficiently join HBase tables?

2011-05-31 Thread Michael Segel
From: doug.m...@explorysmedical.com To: user@hbase.apache.org Date: Tue, 31 May 2011 15:39:14 -0400 Subject: RE: How to efficiently join HBase tables? Re: Didn't see a multi-get... This is what I'm talking about...

Re: How to efficiently join HBase tables?

2011-05-31 Thread Patrick Angeles
On Tue, May 31, 2011 at 3:19 PM, Eran Kutner e...@gigya.com wrote: For my need I don't really need the general case, but even if I did I think it can probably be done simpler. The main problem is getting the data from both tables into the same MR job, without resorting to lookups. So without

RE: HFile.Reader scans return latest version?

2011-05-31 Thread Sandy Pratt
Thanks for the pointers. The damage manifested as scanners skipping over a range in our time series data. We knew from other systems that there should be some records in that region that weren't returned. When we looked closely we saw an extremely improbable jump in rowkeys that should by

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
Yeah, we learned the hard way early last year to follow the guidelines religiously. I've gone over the requirements and checked off everything. We even re-did our tables to only have 4 column families, down from 4x that amount. We are at a loss to find out why we seemed to be cursed when it

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
The script ran without the previous problem, but it did not fix the problem. When I ran hbck or check_meta.rb again they indicated that the problem was still there. Do I need to do something else in preparation before running check_meta? Thanks, Robert -Original Message- From:

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Jack Levin
Hello, is there a git repo URL I could use to check out that code version? -Jack On Thu, May 19, 2011 at 2:35 PM, Stack st...@duboce.net wrote: The Apache HBase team is happy to announce that HBase 0.90.3 is available from the Apache mirror of choice:  

Re: ANN: HBase 0.90.3 available for download

2011-05-31 Thread Andrew Purtell
From: Jack Levin magn...@gmail.com Hello, is there a git repo URL I could use to check out that code version? git://git.apache.org/hbase.git or git://github.com/apache/hbase.git or https://github.com/apache/hbase.git Then checkout tag '0.90.3'

RE: wrong region exception

2011-05-31 Thread Robert Gonzalez
The script doesn't work because it attempts to fix the hole by finding a region in the hdfs filesystem that fills the hole. But in this case there is no such file. The hole is just there. -Original Message- From: Robert Gonzalez [mailto:robert.gonza...@maxpointinteractive.com] Sent:

Re: wrong region exception

2011-05-31 Thread Stack
On Tue, May 31, 2011 at 3:34 PM, Robert Gonzalez robert.gonza...@maxpointinteractive.com wrote: The script doesn't work because it attempts to fix the hole by finding a region in the hdfs filesystem that fills the hole.  But in this case there is no such file.  The hole is just there. OK.

Re: wrong region exception

2011-05-31 Thread Stack
So, what about this new WrongRegionException in the new cluster. Can you figure how it came about? In the new cluster, is there also a hole? Did you start the new cluster fresh or copy from old cluster? St.Ack On Tue, May 31, 2011 at 1:55 PM, Robert Gonzalez

Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
Hello, I am trying to autogen some code off of 0.90.3. I made some custom additions to our thrift server; however, the code that gets generated uses ByteBuffers as opposed to byte[]. How can I get around having to manually adapt the autogenerated code to match? Is there a thrift flag or different

Re: How to efficiently join HBase tables?

2011-05-31 Thread Jason Rutherglen
The Hive-HBase integration allows you to create Hive tables that are backed by HBase. In addition, HBase can be made to go faster for MapReduce jobs if the HFiles could be used directly in HDFS, rather than proxying through the RegionServer. I'd imagine that join operations do not require

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
This may help: http://download.oracle.com/javase/1.5.0/docs/api/java/nio/ByteBuffer.html#array() What is it you are actually trying to do? On Tue, May 31, 2011 at 5:14 PM, Matthew Ward m...@imageshack.net wrote:

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
The issue I am encountering is that the code generated by 'thrift --gen java Hbase.thrift' uses the 'ByteBuffer' type instead of 'byte[]'. All the code in org.apache.hadoop.hbase.thrift uses byte[]. So basically the code generated via thrift is incompatible with the
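If regenerating with a matching Thrift version isn't an option, one workaround along the lines Ted hints at is a small bridge from ByteBuffer to byte[]. Note that a bare ByteBuffer.array() is only safe when the buffer wraps its whole backing array, so a defensive copy is used otherwise. This helper is illustrative only; it is not part of HBase or Thrift.

```java
import java.nio.ByteBuffer;

// Illustrative helper for bridging Thrift 0.6 ByteBuffer fields to the byte[]
// expected elsewhere in the HBase thrift code.
public final class ByteBufferUtil {
  private ByteBufferUtil() {}

  public static byte[] toBytes(ByteBuffer buf) {
    if (buf == null) {
      return null;
    }
    if (buf.hasArray() && buf.arrayOffset() == 0
        && buf.remaining() == buf.array().length) {
      return buf.array();                 // fast path: whole backing array
    }
    byte[] copy = new byte[buf.remaining()];
    buf.duplicate().get(copy);            // duplicate() leaves the caller's position untouched
    return copy;
  }
}
```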

Re: How to efficiently join HBase tables?

2011-05-31 Thread Bill Graham
We use Pig to join HBase tables using HBaseStorage, which has worked well. If you're using HBase >= 0.89 you'll need to build from the trunk or the Pig 0.8 branch. On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The Hive-HBase integration allows you to

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
Which versions of thrift are involved here? This sounds like a Thrift version mismatch. What does [thrift -version] say? What is the hbase dependency? On Tue, May 31, 2011 at 5:32 PM, Matthew Ward m...@imageshack.net wrote: The issue I am encountering is that the code generated doing 'thrift

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
$ thrift -version Thrift version 0.6.0 Not sure about the Hbase Dependency. On May 31, 2011, at 5:45 PM, Ted Dunning wrote: Which versions of thrift are involved here? This sounds like a Thrift version mismatch. What does [thrift -version] say? What is the hbase dependency? On Tue,

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
Yes. You have a version problem with Thrift. From the 0.6.0 release notes for Thrift: THRIFT-830 Java Switch binary field implementation from byte[] to ByteBuffer (Bryan Duxbury) If you look at THRIFT-830 https://issues.apache.org/jira/browse/THRIFT-830 you will see the trenchant

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Ted Dunning
<thrift.version>0.5.0</thrift.version> <!-- newer version available --> On Tue, May 31, 2011 at 5:54 PM, Matthew Ward m...@imageshack.net wrote: $ thrift -version Thrift version 0.6.0 Not sure about the Hbase Dependency. On May 31, 2011, at 5:45 PM, Ted Dunning wrote: Which versions of

Re: Thrift Autogen: byte[] vs ByteBuffer

2011-05-31 Thread Matthew Ward
Good catch! Thanks. On May 31, 2011, at 5:55 PM, Ted Dunning wrote: <thrift.version>0.5.0</thrift.version> <!-- newer version available --> On Tue, May 31, 2011 at 5:54 PM, Matthew Ward m...@imageshack.net wrote: $ thrift -version Thrift version 0.6.0 Not sure about the Hbase

re: about disposing Hbase process

2011-05-31 Thread Gaojinchao
As far as I know: 1. ZooKeeper is sensitive to resources (memory, disk, CPU, network). If the server is underprovisioned, then a) the server may not respond to client requests in time; b) the client assumes the server is down, closes the socket, and connects to another server. 2. HBase is sensitive to
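One knob related to point 1b is the HBase-side ZooKeeper session timeout: if a briefly overloaded server stays unresponsive longer than the session timeout, the session expires and clients give up on it. A minimal sketch, assuming the standard zookeeper.session.timeout property and an illustrative value:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SessionTimeoutExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // A longer session timeout gives a briefly overloaded node more time to
    // respond before its ZooKeeper session expires; 120000 ms is illustrative.
    conf.setInt("zookeeper.session.timeout", 120000);
    System.out.println(conf.get("zookeeper.session.timeout"));
  }
}
```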

Re: Regions count is not consistant between the WEBUI and LoaderBalancer

2011-05-31 Thread bijieshan
Sorry for the long break in the discussion of this problem. So far I have found one possible cause. The main reason is that a split region could come online again. The following is my analysis: (The cluster has two HMasters, one active and one standby)

Re: How to improve HBase throughput with YCSB?

2011-05-31 Thread Ted Dunning
Woof. Of course. Harold, You appear to be running on about 10 disks total. Each disk should be capable of about 100 ops per second but they appear to be doing about 70. This is plausible overhead. Try attaching 5 or 10 small EBS partitions to each of your nodes and use them in HDFS. That