If most queries are going to scan the entire table, I'm not sure HBase is
the right solution for you. One of the advantages of HBase, in my opinion,
is putting data in such a format that you can do skip-scans where lots of
data is never read during a particular query.
If you're deleting so much
Hi Robert,
You can randomly pick your start key, give it to your scanner, scan until
the end of the table, then use that same key as the end key for a second
scanner. That way you scan the whole table starting from a random point.
Also, this might interest you:
https://issues.apache.org/jira/browse/HBASE-9272
JM
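JM's wrap-around idea can be sketched without a cluster. Below, a `TreeMap` stands in for an HBase table's sorted row space (the class and method names are mine, for illustration); in real client code you would open two `Scan` objects, one from the random start key to the end of the table and one from the table start with that key as the stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WrapAroundScan {
    // Covers every row exactly once, starting from an arbitrary key:
    // the first "scanner" runs from startKey (inclusive) to the end of
    // the table, the second from the beginning up to startKey (exclusive).
    public static List<String> scanFrom(NavigableMap<String, String> table, String startKey) {
        List<String> rows = new ArrayList<>();
        rows.addAll(table.tailMap(startKey, true).keySet());
        rows.addAll(table.headMap(startKey, false).keySet());
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"apple", "cherry", "fig", "mango", "tomato"}) {
            table.put(k, "value");
        }
        // Start in the middle: rows from "fig" onward, then the wrapped-around rest.
        System.out.println(scanFrom(table, "fig"));
    }
}
```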
sorry guys,
still the same problem... my MR jobs are not running very fast...
the job org.apache.hadoop.hbase.mapreduce.RowCounter took 13 minutes to
complete, while we do not have many rows, just 3223543
at the moment we have 3 region servers, and the table is split over 13
regions
Sounds like you are doing about 5k rows/second per server.
What size rows? How many column families? What kind of hardware?
St.Ack
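As a rough check on the numbers in the thread (3,223,543 rows counted in 13 minutes on 3 region servers) — the per-server figure is only a ballpark, since it assumes regions are spread evenly:

```java
public class ScanThroughput {
    public static void main(String[] args) {
        long rows = 3_223_543L;     // row count reported in the thread
        long seconds = 13 * 60;     // 13-minute RowCounter job
        int regionServers = 3;

        double totalRate = (double) rows / seconds;
        double perServer = totalRate / regionServers;

        System.out.printf("aggregate: %.0f rows/s, per server: %.0f rows/s%n",
                totalRate, perServer);
    }
}
```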
On Mon, Jun 20, 2011 at 10:13 PM, Andre Reiter a.rei...@web.de wrote:
sorry guys,
still the same problem... my MR jobs are not running very fast...
the job
He said 10^9. Easy to misread.
On Sat, Jun 11, 2011 at 6:41 PM, Stack st...@duboce.net wrote:
On Sat, Jun 11, 2011 at 1:36 AM, Andre Reiter a.rei...@web.de wrote:
so what time can be expected for processing a full scan of e.g.
1,000,000,000 rows in an hbase cluster with e.g. 3 region servers?
Thanks Ted. I misread
On Jun 12, 2011, at 2:31, Ted Dunning tdunn...@maprtech.com wrote:
He said 10^9. Easy to misread.
On Sat, Jun 11, 2011 at 6:41 PM, Stack st...@duboce.net wrote:
On Sat, Jun 11, 2011 at 1:36 AM, Andre Reiter a.rei...@web.de wrote:
so what time can be expected for
Jean-Daniel Cryans wrote:
If you expect a MapReduce job to be faster than a Scan on small data,
your expectation is wrong.
i never expected an MR job to be faster in every context
There's a minimal cost to every MR job, which is a few seconds, and
you can't get around it.
for sure there is
On Sat, Jun 11, 2011 at 1:36 AM, Andre Reiter a.rei...@web.de wrote:
so what time can be expected for processing a full scan of e.g.
1,000,000,000 rows in an hbase cluster with e.g. 3 region servers?
I don't think three servers and 1M rows (only) are enough data and
resources for contrast and
If you expect a MapReduce job to be faster than a Scan on small data,
your expectation is wrong.
There's a minimal cost to every MR job, which is a few seconds, and
you can't get around it.
What other people have been trying to tell you is that you don't have
enough data to benefit from the
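The fixed-overhead point can be made concrete with a toy model (all numbers below are assumptions for illustration, not measurements): a plain client-side Scan pays no startup cost, while an MR job pays a fixed setup cost but scans regions in parallel, so MR only wins past a crossover data size.

```java
public class MrOverhead {
    // Toy model: a single scanner reads rows sequentially; a MapReduce
    // job adds a fixed startup overhead but parallelizes across servers.
    static double scanSeconds(long rows, double rowsPerSec) {
        return rows / rowsPerSec;
    }

    static double mrSeconds(long rows, double rowsPerSec, int servers, double overheadSec) {
        return overheadSec + rows / (rowsPerSec * servers);
    }

    public static void main(String[] args) {
        double rate = 100_000;   // assumed rows/s a single scanner sustains
        int servers = 3;
        double overhead = 30;    // assumed fixed MR job setup cost (seconds)

        for (long rows : new long[] {1_000_000L, 3_000_000L, 10_000_000L, 1_000_000_000L}) {
            System.out.printf("%,13d rows: scan %.0fs, MR %.0fs%n",
                    rows, scanSeconds(rows, rate), mrSeconds(rows, rate, servers, overhead));
        }
    }
}
```

At 1M rows the plain scan wins; by 10M rows the parallel MR job has overtaken it, which is the point being made in the thread.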
now i found out that there are three regions, each on a particular region
server (server2, server3, server4)
the processing time is still ~60 sec, which is not very impressive...
what can i do to speed up the table scan?
best regards
andre
Andreas Reiter wrote:
hello everybody
i'm trying
See http://hbase.apache.org/book/performance.html
St.Ack
On Tue, Jun 7, 2011 at 1:08 AM, Andre Reiter a.rei...@web.de wrote:
now i found out that there are three regions, each on a particular region
server (server2, server3, server4)
the processing time is still ~60 sec, which is not very
How many regions does your table have?
On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter a.rei...@web.de wrote:
hello everybody
i'm trying to scan my hbase table for reporting purposes
the cluster has 4 servers:
- server1: namenode, secondary namenode, jobtracker, hbase master,
zookeeper1
-
How many regions does your table have? If all of the data is still in one
region then you will be rate limited by how fast that single region can be
read. 3 nodes is also pretty small, the more nodes you have the better (at
least 5 for dev and test and 10+ for production has been my experience).
Also,
How big is each row? Are you using scanner caching? Are you just fetching
all the rows to the client, and then what?
300k is not big (it seems you have 1-ish region, which could explain the
similar timing). Add more data and mapreduce will pick up!
Thanks,
Himanshu
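The scanner-caching question matters because, with the default of one row per fetch, each `next()` call is a round trip to the region server; raising the caching value (via `Scan.setCaching(int)` in the client API) fetches that many rows per RPC. A sketch of the round-trip count implied by a caching setting (the class and method here are illustrative, not HBase API):

```java
public class ScannerCaching {
    // With scanner caching = N, the client fetches N rows per RPC,
    // so a full scan costs ceil(rows / N) round trips.
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;  // ceiling division
    }

    public static void main(String[] args) {
        long rows = 3_223_543L;  // row count from the thread
        System.out.println("caching=1:    " + roundTrips(rows, 1) + " RPCs");
        System.out.println("caching=1000: " + roundTrips(rows, 1000) + " RPCs");
    }
}
```

Going from roughly 3.2 million round trips to about 3,200 is often the single biggest lever for full-scan latency.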
On Mon, Jun 6, 2011 at 8:59 AM, Christopher
From: Joey Echeverria
Sent: Mon Jun 06 2011 15:10:29 GMT+0200 (CET)
Subject: Re: full table scan
How many regions does your table have?
Check the web console.
-Original Message-
From: Andre Reiter [mailto:a.rei...@web.de]
Sent: Monday, June 06, 2011 5:27 PM
To: user@hbase.apache.org
Subject: Re: full table scan
good question... i have no idea...
i did not explicitly define the number of regions for the table, how can
Check the web console.
ah, ok thanks!
on port 60010 on the hbase master i actually found a web interface.
there was only one region, i played a bit with it, and executed the Split
function twice. now i have three regions, one on each hbase region server
but still, the processing time did
I think row counter would help you figure out the number of rows in each
region.
Refer to the following email thread, especially Stack's answer on Apr 1:
row_counter map reduce job 0.90.1
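The reason per-region row counts are informative is that HBase locates a row in the region whose start key is the greatest key less than or equal to the row key, so manual splits can leave regions badly skewed. A toy tally (simulated with a `TreeMap`; the real tool is the RowCounter MapReduce job mentioned above):

```java
import java.util.TreeMap;

public class RegionRowCounts {
    // Assigns each row key to the region whose start key is the greatest
    // one <= the row key (how HBase locates a row), then tallies per region.
    public static TreeMap<String, Integer> countPerRegion(String[] regionStartKeys, String[] rowKeys) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String start : regionStartKeys) {
            counts.put(start, 0);
        }
        for (String row : rowKeys) {
            String region = counts.floorKey(row);
            counts.merge(region, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] starts = {"", "g", "p"};  // "" = the first region's open start key
        String[] rows = {"apple", "fig", "grape", "kiwi", "pear", "plum"};
        System.out.println(countPerRegion(starts, rows));
    }
}
```

If one region's count dwarfs the others, that region's server becomes the bottleneck for the whole scan.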
On Mon, Jun 6, 2011 at 3:07 PM, Andre Reiter a.rei...@web.de wrote:
Check the web console.
ah, ok