Article published in this month's Usenix Login magazine:
http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf
-ak
Hi,
Continuing with testing HBase suitability in a high ingest rate
environment, I've come up with a new stumbling block, likely
due to my inexperience with HBase.
We want to keep and purge records on a time basis: i.e., when
a record is older than, say, 24 hours, we want to purge it from
the
If the inserts are coming from more than 1 client, and you are trying
to delete from only 1 client, then likely it won't work. You could try
using a pool of deleters (multiple threads that delete rows) that you
feed from the scanner. Or you could run a MapReduce that would
parallelize that for
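Something like this is what I mean by a deleter pool fed from a scanner (rough
sketch against the 0.20.x client API; the table name "records", the thread
count, and the 24-hour cutoff are all made up for illustration):

// Rough sketch only: one scanner feeds row keys to a small pool of deleter
// threads. Note that Delete(row) removes the whole row, which is fine if each
// row is written once; adjust if rows get updated.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;

public class PurgeOldRows {
  static final byte[] POISON = new byte[0];   // tells a deleter thread to stop

  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000L;
    int nDeleters = 4;
    final BlockingQueue<byte[]> rows = new ArrayBlockingQueue<byte[]>(10000);

    // Deleter pool: each thread owns its own HTable (HTable is not thread-safe).
    Thread[] deleters = new Thread[nDeleters];
    for (int i = 0; i < nDeleters; i++) {
      final HTable t = new HTable(conf, "records");
      deleters[i] = new Thread(new Runnable() {
        public void run() {
          try {
            while (true) {
              byte[] row = rows.take();
              if (row == POISON) return;
              t.delete(new Delete(row));
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      deleters[i].start();
    }

    // The scanner (main thread) only returns cells older than the cutoff and
    // hands each row key to the pool.
    HTable scanTable = new HTable(conf, "records");
    Scan scan = new Scan();
    scan.setTimeRange(0, cutoff);
    ResultScanner scanner = scanTable.getScanner(scan);
    for (Result r : scanner) {
      rows.put(r.getRow());
    }
    scanner.close();
    for (int i = 0; i < nDeleters; i++) rows.put(POISON);
    for (Thread d : deleters) d.join();
  }
}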
I wrestled with the idea of time-bounded tables. Would it make it harder to
write code / run MapReduce on multiple tables? Also, how do you decide when to
do the cutover (start of a new day, week, month..)?
And if you do, how do you process data that crosses those time boundaries
efficiently..
Guess that
Our problem does not require significant map/reduce ops, and
queries tend to be for sequential rows with the timeframe being
the primary consideration. So time-bounded tables are not a
big hurdle, as they might be if other columns were primary keys
or considerations for query or map/reduce ops.
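For example (purely my assumption about the key layout, since the actual schema
isn't shown here), if row keys lead with the record's epoch-millisecond
timestamp, a time-window query is just a bounded scan:

// Rough sketch, HBase 0.20.x client API. Assumes row keys begin with
// Bytes.toBytes(timestampMillis); the table name "records" is made up.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeWindowScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "records");
    long end = System.currentTimeMillis();
    long start = end - 24L * 60 * 60 * 1000L;              // last 24 hours
    Scan scan = new Scan(Bytes.toBytes(start), Bytes.toBytes(end));
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // rows come back in key order, i.e. time order, within the window
      }
    } finally {
      scanner.close();
    }
  }
}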
Ok,
Silly question...
Inside /usr/lib/hbase/*.jar (the base jar for HBase) there's an export/import
tool.
If you supply the number of versions, the start time, and the end time, you can
time-box your scan so your map/reduce job will let you do daily, weekly, etc.
incremental backups.
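For reference, the usage as I understand it is:
Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
so a one-day window looks roughly like the line below (the table name, output
path, and the epoch-millisecond timestamps are just placeholders):

hbase org.apache.hadoop.hbase.mapreduce.Export records /backups/records-day1 1 1280880000000 1280966400000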
So
On Fri, Aug 6, 2010 at 11:13 AM, Michael Segel
michael_se...@hotmail.com wrote:
2) There isn't any documentation; I'm assuming that the start time and end
times are timestamps (long values representing the number of milliseconds
since the epoch, which is what is being stored in HBase).
Yes.
StAck...
LOL...
The idea is to automate the use of the export function to be run within a cron
job.
(And yes, there are some use cases where we want to actually back data up.. ;-)
I originally wanted to do this in ksh (yeah I'm that old. :-) but ended up
looking at Python because I couldn't
With respect to the comment below, I'm trying to determine what the minimum IO
requirements are for us...
For any given value being stored into HBase, is it accurate to calculate the size
of the row key, family, qualifier, timestamp, and value and use their sum as
the amount of data that needs to
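(For what it's worth, my rough estimate assumes each cell is stored as a
KeyValue, whose serialized form carries a small fixed overhead on top of those
fields; worth double-checking against org.apache.hadoop.hbase.KeyValue for
your version.)

// Rough per-cell size estimate based on my reading of the KeyValue layout:
// 4-byte key length + 4-byte value length + 2-byte row length + 1-byte family
// length + 8-byte timestamp + 1-byte key type, plus the fields themselves.
public static long estimateCellSize(byte[] row, byte[] family,
                                    byte[] qualifier, byte[] value) {
  final long fixedOverhead = 4 + 4 + 2 + 1 + 8 + 1;   // 20 bytes
  return fixedOverhead + row.length + family.length
       + qualifier.length + value.length;
}

That's just the serialized cell; it doesn't count HFile block/index overhead,
compression, or HDFS replication.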
Hello,
I'm running HBase 0.20.5, and seeing Puts() fail repeatedly when trying to
insert a specific item into the database.
Client side I see:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server, retryOnlyOne=true, index=0, islastrow=true,
Hi,
When you run into this problem, it's usually a sign of a META problem;
specifically, you have a 'hole' in the META table.
The META table contains a series of keys like so:
table,start_row1,timestamp[data]
table,start_row2,timestamp[data]
etc
When we search for a region for a given
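If you'd rather check for a hole programmatically than eyeball a shell scan, a
rough sketch like this (0.20.x API; it may report false positives around
regions that are mid-split) walks .META. and flags any region whose start key
doesn't match the previous region's end key:

// Rough sketch: walk .META. and report gaps between consecutive regions of
// the same table.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class FindMetaHoles {
  public static void main(String[] args) throws Exception {
    HTable meta = new HTable(new HBaseConfiguration(), ".META.");
    ResultScanner scanner = meta.getScanner(new Scan());
    HRegionInfo prev = null;
    for (Result r : scanner) {
      byte[] bytes = r.getValue(HConstants.CATALOG_FAMILY,
                                HConstants.REGIONINFO_QUALIFIER);
      if (bytes == null) continue;
      HRegionInfo info = Writables.getHRegionInfo(bytes);
      if (prev != null
          && Bytes.equals(prev.getTableDesc().getName(),
                          info.getTableDesc().getName())
          && !Bytes.equals(prev.getEndKey(), info.getStartKey())) {
        System.out.println("Hole between " + prev.getRegionNameAsString()
            + " and " + info.getRegionNameAsString());
      }
      prev = info;
    }
    scanner.close();
  }
}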
Hello Ryan,
Yup. There's a hole, exactly where it should be.
I used add_table.rb once before, and am no expert on it.
All I have is a note written down:
To recover lost tables:
./hbase org.jruby.Main add_table.rb /hbase/filestore
Anything else I need to know? Do I just run the script like
Just to follow up - I ran add_table as I had done when I lost a table before -
and it fixed the error.
Thanks!
Take care,
-stu
--- On Fri, 8/6/10, Stuart Smith stu24m...@yahoo.com wrote:
From: Stuart Smith stu24m...@yahoo.com
Subject: Re: Batch puts interrupted ... Requested row out of