Re: Full table scan cost after deleting Millions of Records from HBase Table

2016-02-09 Thread Billy Watson
If most queries are going to scan the entire table, I'm not sure HBase is the right solution for you. One of the advantages of HBase, in my opinion, is laying data out in such a format that you can do skip-scans, where lots of data is never read during a particular query. If you're deleting so much
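
The skip-scan idea above rests on row-key design: because HBase stores row keys in sorted order, a well-chosen key prefix lets a scan touch only a contiguous slice of the table. A minimal sketch, using a sorted Python list and `bisect` as a stand-in for an HBase table (the `user###timestamp` key scheme is a hypothetical example, not from the thread):

```python
import bisect

# Stand-in for an HBase table: row keys kept in sorted order, as HBase stores
# them. Keys are "<userid>#<timestamp>" so one user's rows are contiguous and
# a range scan can skip everything else.
rows = sorted(f"user{u:03d}#{t:04d}" for u in range(100) for t in range(10))

def range_scan(keys, start, stop):
    """Return only the keys in [start, stop) -- the scanner never reads the rest."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return keys[lo:hi]

hits = range_scan(rows, "user042#", "user042$")  # '$' sorts just after '#'
print(len(hits), "rows read instead of", len(rows))
```

With a key design like this, a per-user report reads 10 rows instead of 1000; without the prefix structure, every query degenerates into the full scan the thread is worried about.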

Re: Full table scan from random starting point?

2014-01-31 Thread Jean-Marc Spaggiari
Hi Robert, you can randomly build your start key, give it to your scanner, and scan until the end of the table; then give that same key as the end key for a second scanner starting at the beginning. That way you will cover the whole table from a random starting point. Also, this might interest you: https://issues.apache.org/jira/browse/HBASE-9272 JM
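
JM's two-scan trick can be illustrated with a toy model: a sorted list standing in for the table's row keys, and a `scan` helper mimicking an HBase `Scan` with inclusive start and exclusive stop keys (names here are illustrative, not from the thread):

```python
import random

# Scan from a random start key to the end of the table, then run a second scan
# from the first row up to that same key. Together the two scans cover every
# row exactly once, starting from a random point.
table = sorted(f"row-{i:05d}" for i in range(1000))

def scan(keys, start=None, stop=None):
    """Rows with start <= key < stop (None = unbounded), like an HBase Scan."""
    return [k for k in keys
            if (start is None or k >= start) and (stop is None or k < stop)]

start_key = random.choice(table)
first_pass = scan(table, start=start_key)   # random start key .. end of table
second_pass = scan(table, stop=start_key)   # first row .. start key (exclusive)
print(len(first_pass) + len(second_pass))   # → 1000, i.e. full coverage
```

Because the start key is inclusive in the first pass and exclusive as the stop key of the second, the two passes never overlap and never miss a row.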

Re: full table scan

2011-06-20 Thread Andre Reiter
Sorry guys, still the same problem... my MR jobs are not running very fast. The job org.apache.hadoop.hbase.mapreduce.RowCounter took 13 minutes to complete, even though we do not have many rows, just 3,223,543 at the moment. We have 3 region servers, while the table is split over 13 regions on that
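
A quick back-of-envelope check using only the figures quoted above (3,223,543 rows, 13 minutes, 3 region servers); the even-load assumption is mine:

```python
# Throughput implied by the numbers in the thread.
rows = 3_223_543
seconds = 13 * 60
servers = 3

cluster_rate = rows / seconds        # rows/second across the whole cluster
per_server = cluster_rate / servers  # rough per-server rate, assuming even load
print(f"{cluster_rate:.0f} rows/s cluster-wide, ~{per_server:.0f} rows/s per server")
```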

Re: full table scan

2011-06-20 Thread Stack
Sounds like you are doing about 5k rows/second per server. What size rows? How many column families? What kind of hardware? St.Ack

Re: full table scan

2011-06-12 Thread Ted Dunning
He said 10^9. Easy to misread. On Sat, Jun 11, 2011 at 6:41 PM, Stack st...@duboce.net wrote:
> On Sat, Jun 11, 2011 at 1:36 AM, Andre Reiter a.rei...@web.de wrote:
> > so what time can be expected for processing a full scan of i.e. 1.000.000.000 rows in an hbase cluster with i.e. 3 region

Re: full table scan

2011-06-12 Thread Stack
Thanks Ted. I misread. On Jun 12, 2011, at 2:31, Ted Dunning tdunn...@maprtech.com wrote:
> He said 10^9. Easy to misread.

Re: full table scan

2011-06-11 Thread Andre Reiter
Jean-Daniel Cryans wrote:
> You expect a MapReduce job to be faster than a Scan on small data, your expectation is wrong.
never expected a MR job to be faster for every context
> There's a minimal cost to every MR job, which is of a few seconds, and you can't go around it.
for sure there is

Re: full table scan

2011-06-11 Thread Stack
On Sat, Jun 11, 2011 at 1:36 AM, Andre Reiter a.rei...@web.de wrote:
> so what time can be expected for processing a full scan of i.e. 1.000.000.000 rows in an hbase cluster with i.e. 3 region servers?
I don't think three servers and 1M rows (only) are enough data and resources for contrast and

Re: full table scan

2011-06-10 Thread Jean-Daniel Cryans
You expect a MapReduce job to be faster than a Scan on small data; your expectation is wrong. There's a minimal cost to every MR job, on the order of a few seconds, and you can't go around it. What other people have been trying to tell you is that you don't have enough data to benefit from the

Re: full table scan

2011-06-07 Thread Andre Reiter
Now I found out that there are three regions, each on a particular region server (server2, server3, server4). The processing time is still about 60 sec, which is not very impressive. What can I do to speed up the table scan? best regards andre. Andreas Reiter wrote:
> hello everybody i'm trying

Re: full table scan

2011-06-07 Thread Stack
See http://hbase.apache.org/book/performance.html St.Ack

Re: full table scan

2011-06-06 Thread Joey Echeverria
How many regions does your table have? On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter a.rei...@web.de wrote:
> hello everybody i'm trying to scan my hbase table for reporting purposes. the cluster has 4 servers:
> - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
> -

Re: full table scan

2011-06-06 Thread Christopher Tarnas
How many regions does your table have? If all of the data is still in one region, then you will be rate-limited by how fast that single region can be read. 3 nodes is also pretty small; the more nodes you have the better (at least 5 for dev and test, and 10+ for production, has been my experience).

Re: full table scan

2011-06-06 Thread Himanshu Vashishtha
Also, how big is each row? Are you using scanner cache? Are you just fetching all the rows to the client, and...? 300k is not big (it seems you have 1-ish region, which could explain the similar timing). Add more data and mapreduce will pick up! Thanks, Himanshu
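
Himanshu's scanner-cache question matters because HBase's `Scan.setCaching(n)` ships `n` rows per RPC instead of one, so the number of client-to-regionserver round trips drops from one per row to roughly `rows / n`. A sketch of that arithmetic with toy numbers (the row count is illustrative; `setCaching` is the real client call, everything else here is simulation):

```python
import math

# Round trips needed to pull a table to the client at various caching settings.
# With caching=1 (the old HBase default), every row is its own RPC.
rows = 300_000
for caching in (1, 100, 1000):
    rpcs = math.ceil(rows / caching)
    print(f"caching={caching:>4}: {rpcs:>6} round trips")
```

Going from caching=1 to caching=1000 cuts 300,000 round trips to 300, which is usually the difference between a scan dominated by network latency and one dominated by actual read throughput.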

Re: full table scan

2011-06-06 Thread Andre Reiter
Joey Echeverria wrote (Mon Jun 06 2011 15:10:29 GMT+0200 (CET), Re: full table scan):
> How many regions does your table have?

RE: full table scan

2011-06-06 Thread Doug Meil
Check the web console.

-----Original Message-----
From: Andre Reiter [mailto:a.rei...@web.de]
Sent: Monday, June 06, 2011 5:27 PM
To: user@hbase.apache.org
Subject: Re: full table scan

good question... i have no idea... i did not define explicitly the number of regions for the table, how can

Re: full table scan

2011-06-06 Thread Andre Reiter
> Check the web console.
Ah, ok, thanks! At port 60010 on the hbase master I actually found a web interface. There was only one region; I played a bit with it and executed the Split function twice. Now I have three regions, one on each hbase region server, but still the processing time did

Re: full table scan

2011-06-06 Thread Ted Yu
I think row counter would help you figure out the number of rows in each region. Refer to the following email thread, especially Stack's answer on Apr 1: "row_counter map reduce job 0.90.1". On Mon, Jun 6, 2011 at 3:07 PM, Andre Reiter a.rei...@web.de wrote:
> Check the web console. ah, ok
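
The RowCounter job Ted refers to is launched from the command line; this is the invocation form documented in the HBase book (the table name `mytable` is a placeholder):

```shell
# Counts the rows of a table via MapReduce; the total appears in the
# ROWS job counter when the job finishes.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'mytable'
```

Run against an appropriately keyed table, the per-region task counters also give a rough picture of how evenly rows are spread across regions, which is what the thread is trying to establish.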