Ceph as an alternative to HDFS

2010-08-06 Thread Amandeep Khurana
Article published in this month's USENIX ;login: magazine: http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf -ak

How to delete rows in a FIFO manner

2010-08-06 Thread Thomas Downing
Hi, Continuing to test HBase's suitability in a high-ingest-rate environment, I've come up with a new stumbling block, likely due to my inexperience with HBase. We want to keep and purge records on a time basis: i.e., when a record is older than, say, 24 hours, we want to purge it from the

Re: How to delete rows in a FIFO manner

2010-08-06 Thread Jean-Daniel Cryans
If the inserts are coming from more than one client and you are trying to delete from only one client, then it likely won't work. You could try using a pool of deleters (multiple threads that delete rows) that you feed from the scanner. Or you could run a MapReduce job that would parallelize that for
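As a rough illustration of the pool-of-deleters suggestion (not from the thread itself), here is a minimal sketch against the 0.20-era client API. The table name "records", the four-thread pool size, and the use of Scan.setTimeRange to find cells older than 24 hours are assumptions made for the example; a MapReduce variant would do the same work with the scan split across mappers.

// Minimal sketch: a scanner feeding a small pool of deleter threads.
// "records" and the pool size are illustrative assumptions.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class PurgeOldRows {
  private static final byte[] POISON = new byte[0];   // shutdown marker

  public static void main(String[] args) throws Exception {
    final HBaseConfiguration conf = new HBaseConfiguration();
    final BlockingQueue<byte[]> rows = new ArrayBlockingQueue<byte[]>(10000);
    final int nDeleters = 4;

    // Deleter pool: each thread owns its own HTable (HTable is not thread-safe).
    Thread[] deleters = new Thread[nDeleters];
    for (int i = 0; i < nDeleters; i++) {
      deleters[i] = new Thread(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "records");
            byte[] row;
            while ((row = rows.take()) != POISON) {
              table.delete(new Delete(row));
            }
            table.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      deleters[i].start();
    }

    // Feed the pool from a scanner restricted to cells older than 24 hours.
    long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
    Scan scan = new Scan();
    scan.setTimeRange(0, cutoff);
    HTable scanTable = new HTable(conf, "records");
    ResultScanner scanner = scanTable.getScanner(scan);
    for (Result r : scanner) {
      rows.put(r.getRow());
    }
    scanner.close();
    scanTable.close();

    for (int i = 0; i < nDeleters; i++) rows.put(POISON);   // signal shutdown
    for (Thread t : deleters) t.join();
  }
}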

Re: How to delete rows in a FIFO manner

2010-08-06 Thread Venkatesh
I wrestled with the idea of time-bounded tables. Would it make it harder to write code/run MapReduce on multiple tables? Also, how do you decide when to do the cutover (start of a new day, week/month..)? And if you do, how do you process data that crosses those time boundaries efficiently? Guess that

Re: How to delete rows in a FIFO manner

2010-08-06 Thread Thomas Downing
Our problem does not require significant map/reduce ops, and queries tend to be for sequential rows with the timeframe being the primary consideration. So time-bounded tables are not a big hurdle, as they might be if other columns were primary keys or considerations for query or map/reduce ops.
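For illustration, a rough sketch of how the time-bounded-tables cutover could be automated with the 0.20-era HBaseAdmin API: write to a table named per period (the "records_YYYYMMDD" scheme below is hypothetical) and purge whole periods by dropping expired tables, which avoids per-row Deletes entirely.

// Sketch: drop daily tables older than the retention window.
// Table-naming convention and retention setting are assumptions.
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DropExpiredDayTables {
  public static void main(String[] args) throws Exception {
    SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");
    int retentionDays = 1;   // keep roughly 24 hours: today plus yesterday

    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    for (HTableDescriptor td : admin.listTables()) {
      String name = td.getNameAsString();
      if (!name.startsWith("records_")) continue;
      Date tableDay = day.parse(name.substring("records_".length()));
      long ageDays = (System.currentTimeMillis() - tableDay.getTime()) / (24L * 3600 * 1000);
      if (ageDays > retentionDays) {
        admin.disableTable(name);   // a table must be disabled before deletion
        admin.deleteTable(name);
      }
    }
  }
}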

Using HBase's export/import function...

2010-08-06 Thread Michael Segel
OK, silly question... Inside /usr/lib/hbase/*.jar (the base jar for HBase) there's an export/import tool. If you supply the #versions, the start time, and the end time, you can timebox your scan so your map/reduce job will let you do daily, weekly, etc. incremental backups. So

Re: Using HBase's export/import function...

2010-08-06 Thread Stack
On Fri, Aug 6, 2010 at 11:13 AM, Michael Segel michael_se...@hotmail.com wrote: 2) There isn't any documentation; I'm assuming that the start time and end time are timestamps (long values representing the number of milliseconds since the epoch, which is what is being stored in HBase). Yes.
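For illustration, a small sketch of computing those millisecond-epoch boundaries for a daily export window. The invocation shown in the comment (table, output dir, versions, start time, end time) follows the 0.20-era Export tool's usage string; confirm it against `hbase org.apache.hadoop.hbase.mapreduce.Export` on your install before scripting around it.

// Sketch: compute [midnight yesterday, midnight today) in epoch milliseconds
// for an incremental export. Paths and table name in the comment are hypothetical.
import java.util.Calendar;

public class ExportWindow {
  public static void main(String[] args) {
    Calendar c = Calendar.getInstance();
    c.set(Calendar.HOUR_OF_DAY, 0);
    c.set(Calendar.MINUTE, 0);
    c.set(Calendar.SECOND, 0);
    c.set(Calendar.MILLISECOND, 0);
    long endTime = c.getTimeInMillis();               // midnight today
    long startTime = endTime - 24L * 60 * 60 * 1000;  // midnight yesterday

    // e.g.: hbase org.apache.hadoop.hbase.mapreduce.Export \
    //         mytable /backups/mytable-20100806 1 <startTime> <endTime>
    System.out.println("start=" + startTime + " end=" + endTime);
  }
}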

RE: Using HBase's export/import function...

2010-08-06 Thread Michael Segel
StAck... LOL... The idea is to automate the export function so it can be run from a cron job. (And yes, there are some use cases where we want to actually back data up.. ;-) I originally wanted to do this in ksh (yeah, I'm that old :-) but ended up looking at Python because I couldn't

Re: HBase storage sizing

2010-08-06 Thread Andrew Nguyen
With respect to the comment below, I'm trying to determine what the minimum IO requirements are for us... For any given value being stored into HBase, is it accurate to calculate the size of the row key, family, qualifier, timestamp, and value and use their sum as the amount of data that needs to
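As a back-of-the-envelope illustration (not from the thread): each HBase cell also carries fixed framing around those fields (key/value lengths, row and family length prefixes, timestamp, type byte), so the sum of row key + family + qualifier + value plus a small per-cell constant gives a rough lower bound, before HFile block/index overhead, replication, and compression. The ~20-byte constant and the sample values below are assumptions.

// Sketch: rough per-cell size estimate. The 20-byte framing figure is an
// approximation and the example row/family/qualifier/value are hypothetical.
public class KeyValueSizeEstimate {
  // Assumed framing: 4 (key len) + 4 (value len) + 2 (row len)
  //                + 1 (family len) + 8 (timestamp) + 1 (type) = 20 bytes.
  static final int CELL_OVERHEAD = 20;

  static long estimateCellSize(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
    return CELL_OVERHEAD + row.length + family.length + qualifier.length + value.length;
  }

  public static void main(String[] args) {
    long bytes = estimateCellSize("patient-0001-1281139200000".getBytes(),
                                  "vitals".getBytes(),
                                  "hr".getBytes(),
                                  "72".getBytes());
    System.out.println("~" + bytes + " bytes per cell, before compression");
  }
}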

Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Stuart Smith
Hello, I'm running HBase 0.20.5 and seeing Puts fail repeatedly when trying to insert a specific item into the database. Client-side I see: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true,

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Ryan Rawson
Hi, When you run into this problem, it's usually a sign of a META problem; specifically, you have a 'hole' in the META table. The META table contains a series of keys like so: table,start_row1,timestamp[data] table,start_row2,timestamp[data] etc. When we search for a region for a given
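For illustration, a rough sketch of checking .META. for such a hole with the 0.20-era client API: scan the catalog rows for the table and verify that each region's start key equals the previous region's end key. The table name "filestore" is taken from this thread; "info:regioninfo" is the catalog column holding the serialized HRegionInfo.

// Sketch: report gaps between consecutive regions of one table in .META.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class FindMetaHoles {
  public static void main(String[] args) throws Exception {
    String tableName = "filestore";
    HTable meta = new HTable(new HBaseConfiguration(), ".META.");
    Scan scan = new Scan(Bytes.toBytes(tableName + ","));
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));

    byte[] prevEndKey = null;
    ResultScanner scanner = meta.getScanner(scan);
    for (Result r : scanner) {
      byte[] bytes = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
      if (bytes == null) continue;                      // damaged catalog row
      HRegionInfo region = Writables.getHRegionInfo(bytes);
      if (!region.getTableDesc().getNameAsString().equals(tableName)) break;
      if (prevEndKey != null && !Bytes.equals(prevEndKey, region.getStartKey())) {
        System.out.println("Hole before region " + region.getRegionNameAsString());
      }
      prevEndKey = region.getEndKey();
    }
    scanner.close();
    meta.close();
  }
}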

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Stuart Smith
Hello Ryan, Yup. There's a hole, exactly where it should be. I used add_table.rb once before, and am no expert on it. All I have is a note written down: To recover lost tables: ./hbase org.jruby.Main add_table.rb /hbase/filestore Anything else I need to know? Do I just run the script like

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Stuart Smith
Just to follow up - I ran add_table as I had done when I lost a table before - and it fixed the error. Thanks! Take care, -stu --- On Fri, 8/6/10, Stuart Smith stu24m...@yahoo.com wrote: From: Stuart Smith stu24m...@yahoo.com Subject: Re: Batch puts interrupted ... Requested row out of