On Thu, Mar 25, 2010 at 11:30 PM, Karthik K <[email protected]> wrote:
> On Thu, Mar 25, 2010 at 8:03 PM, Andriy Kolyadenko <[email protected]> wrote:
>
>> My task is the following: I have a list of key ranges, and I need to run
>> MR over these ranges as fast as possible.
>>
>> As far as I understand, MR will do a full scan if I use a filter. Is that
>> correct?
>
> On a given InputSplit, yes.
>
> But see HBASE-2302, where you can inherit from TableInputFormat and
> override , a method to reduce the number of InputSplits.
> That will significantly reduce the overhead of the bulk scan and restrict
> your filter to only those InputSplits that pass the criteria.
>
>> --- [email protected] wrote:
>>
>> From: Stack <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Multi ranges Scan
>> Date: Thu, 25 Mar 2010 19:57:44 -0700
>>
>> Can you use a filter to do this? If there is no pattern to the excludes,
>> then it's tougher. How do you know what to exclude? It's in a repository
>> somewhere? Add a filter to query this repo?
>>
>> On Mar 25, 2010, at 4:07 PM, "Andriy Kolyadenko" <[email protected]> wrote:
>>
>> > Ok, it would work for region pruning. And what about pruning the actual
>> > rows inside a single region? Do you have any ideas on how to implement
>> > it?
>> >
>> > --- Stack wrote: ---
>> >
>> > I think you need to make a custom splitter for your mapreduce job, one
>> > that makes splits that align with the ranges you'd have your job run
>> > over. A permutation on HBASE-2302 might work for you.
>
> Oops. Sorry for the redundant info!
>
>> > St.Ack
>> >
>> > On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko
>> > <[email protected]> wrote:
>> >> Hi all,
>> >>
>> >> Maybe somebody could give me advice on the following situation:
>> >>
>> >> Currently the HBase Scan interface lets you set only the first and
>> >> last rows for MR scanning. Is there any way to get multiple ranges
>> >> into the map input?
>> >>
>> >> For example, let's assume I have the following table:
>> >>
>> >> key  value
>> >> 1    v1
>> >> 2    v2
>> >> 3    v3
>> >> 4    v4
>> >> 5    v5
>> >>
>> >> What I need is to get, for example, the [1,2) and [4,5) ranges as input
>> >> for my Map task. I need this as a performance optimization.
>> >>
>> >> Any advice?
>> >>
>> >> Thanks.
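
The custom-splitter idea discussed in this thread (splits that align with the requested ranges) can be sketched in plain Java without any HBase dependency. Everything here is an assumption for illustration: `RangeSplitPruner`, `KeyRange`, `intersect`, and `prune` are hypothetical names, not HBase classes or methods, and keys are modeled as `String` rather than `byte[]`. The sketch clips each region's key range against each wanted range, so regions with no overlap produce no split, and surviving splits cover only the requested rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of HBase): clips region ranges to the wanted key ranges. */
final class RangeSplitPruner {

    /** Half-open range [start, end). An empty end means "unbounded", mirroring region boundaries. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Returns the overlap of two ranges, or null when they do not intersect. */
    static KeyRange intersect(KeyRange a, KeyRange b) {
        // The later of the two start keys; "" sorts first, so an empty start acts as "from the beginning".
        String start = a.start.compareTo(b.start) >= 0 ? a.start : b.start;

        // The earlier of the two end keys, treating "" as unbounded.
        String end;
        if (a.end.isEmpty()) {
            end = b.end;
        } else if (b.end.isEmpty()) {
            end = a.end;
        } else {
            end = a.end.compareTo(b.end) <= 0 ? a.end : b.end;
        }

        // A bounded range whose start is not before its end is empty.
        if (!end.isEmpty() && start.compareTo(end) >= 0) {
            return null;
        }
        return new KeyRange(start, end);
    }

    /** One output range per (region, wanted range) overlap; non-overlapping regions are dropped. */
    static List<KeyRange> prune(List<KeyRange> regions, List<KeyRange> wanted) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            for (KeyRange w : wanted) {
                KeyRange overlap = intersect(region, w);
                if (overlap != null) {
                    splits.add(overlap);
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed layout: two regions, split at key "3".
        List<KeyRange> regions = Arrays.asList(new KeyRange("", "3"), new KeyRange("3", ""));
        // The ranges from the original question.
        List<KeyRange> wanted = Arrays.asList(new KeyRange("1", "2"), new KeyRange("4", "5"));
        System.out.println(prune(regions, wanted)); // prints [[1,2), [4,5)]
    }
}
```

In a real job, the clipped ranges would presumably become the start and stop rows of the splits returned by a custom input format, so each mapper scans only the rows it needs instead of filtering a full region scan.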
