Re: A proposal for Provide key range support to bulkload to avoid too many reducers (HBASE-9556)

beeshma r Sat, 05 Mar 2016 09:42:58 -0800

Hi Ted,

Regarding this fix for HBASE-9556: while testing with a pre-split table, i.e.

create 'test', 'cf', SPLITS => ['a', 'b', 'c']

I assumed it should create 3 regions.

For this case I wrote the following logic to find the start keys of the regions:

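import java.io.IOException;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegions {
    public static void main(String[] args) throws IOException {
        Configuration con = HBaseConfiguration.create();
        // HTable(Configuration, String) and getRegionLocations() are the 0.98/1.x API (deprecated in 1.x)
        try (HTable ht = new HTable(con, "test")) {
            NavigableMap<HRegionInfo, ServerName> np = ht.getRegionLocations();
            // One map entry per region of table 'test'
            for (HRegionInfo h : np.keySet()) {
                System.out.println(h.getRegionId() + " getRegionId");
                // toStringBinary escapes non-printable bytes; an empty start key prints as ""
                System.out.println(Bytes.toStringBinary(h.getStartKey()) + " -------start key");
            }
        }
    }
}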

With the above code I got 4 regions (4 region IDs): one has an empty start
key, and the remaining three have start keys a, b and c respectively.
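The count of four matches how split keys partition the row-key space: N split keys cut it into N+1 half-open ranges [start, end), where the first region starts at the empty key and the last region ends at it. The plain-Java sketch below models only that arithmetic. SplitRegions and regionsFor are hypothetical names, not HBase API, and keys are Strings for readability where HBase uses byte[].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not HBase API): models how N split keys yield N+1 regions.
final class SplitRegions {
    // Returns [startKey, endKey) pairs; "" stands for the empty (unbounded) key.
    static List<String[]> regionsFor(List<String> splitKeys) {
        List<String[]> regions = new ArrayList<>();
        String start = ""; // the first region always has an empty start key
        for (String split : splitKeys) {
            regions.add(new String[] {start, split});
            start = split;
        }
        regions.add(new String[] {start, ""}); // the last region has an empty end key
        return regions;
    }

    public static void main(String[] args) {
        List<String[]> regions = regionsFor(Arrays.asList("a", "b", "c"));
        for (String[] r : regions) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
        System.out.println(regions.size() + " regions");
    }
}
```

Running main prints the ranges [, a), [a, b), [b, c) and [c, ), followed by "4 regions".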

My questions are:

1. How many regions will the following command create?
create 'test', 'cf', SPLITS => ['a', 'b', 'c']

2. To find the exact number of regions, can I count the region IDs?
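Once the region start keys are known, the key-range idea behind HBASE-9556 amounts to keeping only the regions whose [start, end) range overlaps the rows being loaded. As Ted notes in the quoted reply, getSplits(JobContext) carries no start/end keys, so the range itself would have to be passed in separately. The sketch below is a hypothetical plain-Java helper (KeyRangeSplits and overlappingRegions are not HBase API). It compares Strings, which agrees with HBase's unsigned byte ordering only for ASCII keys; real code would work on byte[] with Bytes.compareTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: keep only regions whose [start, end) overlaps the requested [lo, hi).
// "" means unbounded, like HBase's empty start/end keys.
// startKeys must be sorted ascending and begin with "" (the first region's start key).
final class KeyRangeSplits {
    static List<Integer> overlappingRegions(List<String> startKeys, String lo, String hi) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            String start = startKeys.get(i);
            // A region ends where the next one starts; the last region has an empty end key.
            String end = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : "";
            boolean endsAfterLo = end.isEmpty() || lo.compareTo(end) < 0;
            boolean startsBeforeHi = hi.isEmpty() || start.compareTo(hi) < 0;
            if (endsAfterLo && startsBeforeHi) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> startKeys = Arrays.asList("", "a", "b", "c");
        // Rows in ["a1", "b5") can only live in region 1 [a, b) and region 2 [b, c).
        System.out.println(overlappingRegions(startKeys, "a1", "b5")); // prints [1, 2]
    }
}
```

Under the proposal, each kept region would produce one split, so the job would size itself to the overlapping regions rather than to every region in the table.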


cheers

Beeshma



On Thu, Jul 30, 2015 at 9:57 AM, Ted Yu <[email protected]> wrote:

> The following API doesn't contain start / end keys:
> List<InputSplit> getSplits(JobContext context)
>
> You need to pass key range information.
>
> I suggest continuing the discussion on the JIRA.
>
> Cheers
>
> On Thu, Jul 30, 2015 at 9:50 AM, beeshma r <[email protected]> wrote:
>
> > Hi,
> >
> > I'd like to work on key range support for bulkload to avoid too many
> > reducers, as mentioned in these issues (HBASE-9556, HBASE-4063).
> >
> > Description and high-level design for the proposed solution
> >
> > Currently, when we bulk load data into HBase through MapReduce using
> > TableInputFormatBase, the number of splits matches the number of regions
> > in the table. So I am going to change how TableInputFormatBase decides
> > the key range for splits. For example, if the input data will be loaded
> > into 50 regions (while the RS actually has 400 regions):
> >
> >    - List<InputSplit> getSplits(JobContext context) will find the exact
> >      list of 50 splits (currently it returns 400).
> >
> > Do I understand this correctly? Please let me know if I am on the wrong
> > track. Is anyone willing to mentor me? I am new to the ASF.
> >
> > Thanks
> > Beeshma
> >
>
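The "50 of 400 regions" example in the quoted proposal can be made concrete: map each input row key to the region that owns it and count the distinct regions. The sketch below does this with a binary search over sorted start keys. RegionsTouched and its methods are hypothetical names, not HBase API, and String keys stand in for byte[] (the ordering matches HBase only for ASCII keys).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: which regions would a set of input row keys actually land in?
// startKeys must be sorted ascending with "" (the first region's empty start key) first.
final class RegionsTouched {
    static int regionIndex(List<String> startKeys, String rowKey) {
        int pos = Collections.binarySearch(startKeys, rowKey);
        // Exact match: the row equals a region's start key. Otherwise binarySearch
        // returns -(insertionPoint) - 1 and the owning region is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    static TreeSet<Integer> regionsTouched(List<String> startKeys, List<String> rowKeys) {
        TreeSet<Integer> touched = new TreeSet<>();
        for (String row : rowKeys) {
            touched.add(regionIndex(startKeys, row));
        }
        return touched;
    }

    public static void main(String[] args) {
        List<String> startKeys = Arrays.asList("", "a", "b", "c");
        List<String> rows = Arrays.asList("a1", "a7", "b", "b3");
        // Only regions 1 and 2 receive data, although the table has 4 regions.
        System.out.println(regionsTouched(startKeys, rows)); // prints [1, 2]
    }
}
```

Binary search keeps the lookup at O(log R) per row for R regions, which matters when the table has hundreds of regions.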


