[ 
https://issues.apache.org/jira/browse/HBASE-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShivaKumar SS updated HBASE-20056:
----------------------------------
    Description: 
Currently MultiTableInputFormatBase#getSplits() iterates over the list of Scan 
objects to compute splits, and on each iteration it opens and closes an 
HConnection, which is expensive.

It can be optimized so that a single HConnection is used to compute the 
splits for all the Scan objects.

This optimization will help reduce the launch time of MR jobs.
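
The idea can be sketched roughly as below. This is only an illustration and 
not the actual patch: FakeConnection, splitsFor() and both method names are 
hypothetical stand-ins rather than HBase client API, and each scan is assumed 
to yield exactly one split. The counter only records how many connections 
each approach opens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SharedConnectionSplits {

    /** Hypothetical stand-in for an HBase connection; not the real client API. */
    static final class FakeConnection implements AutoCloseable {
        FakeConnection(AtomicInteger openCount) {
            openCount.incrementAndGet(); // record every (expensive) open
        }

        /** Assumption for the sketch: each scan yields exactly one split. */
        List<String> splitsFor(String scanName) {
            List<String> splits = new ArrayList<>();
            splits.add(scanName + "-split");
            return splits;
        }

        @Override
        public void close() {
            // nothing to release in this stand-in
        }
    }

    /** Current behaviour as described: one connection opened and closed per scan. */
    static List<String> splitsWithConnectionPerScan(List<String> scans, AtomicInteger openCount) {
        List<String> splits = new ArrayList<>();
        for (String scan : scans) {
            try (FakeConnection conn = new FakeConnection(openCount)) {
                splits.addAll(conn.splitsFor(scan));
            }
        }
        return splits;
    }

    /** Proposed behaviour: one connection shared by all scans, closed once at the end. */
    static List<String> splitsWithSharedConnection(List<String> scans, AtomicInteger openCount) {
        List<String> splits = new ArrayList<>();
        try (FakeConnection conn = new FakeConnection(openCount)) {
            for (String scan : scans) {
                splits.addAll(conn.splitsFor(scan));
            }
        }
        return splits;
    }
}
```

Both methods return the same splits; the per-scan version opens one connection 
per Scan object, while the shared version opens one connection in total.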



  was:
Currently MultiTableInputFormatBase#getSplits() iterates over the list of Scan 
objects to compute splits, and on each iteration it opens and closes an 
HConnection, which is expensive.

It can be optimized so that a single HConnection is used to compute the 
splits for all the Scan objects.

This optimization will help reduce the launch time of MR jobs.

We are using a 15-node cluster with around 120 Scan objects. It takes about 
5 minutes to launch a job; with the optimized code, it takes under 30 seconds.


> Performance optimization on MultiTableInputFormatBase#getSplits() 
> ------------------------------------------------------------------
>
>                 Key: HBASE-20056
>                 URL: https://issues.apache.org/jira/browse/HBASE-20056
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase, mapreduce
>    Affects Versions: 1.0.1, 1.3.1
>            Reporter: ShivaKumar SS
>            Priority: Major
>              Labels: hbase, mapreduce, performance
>
> Currently MultiTableInputFormatBase#getSplits() iterates over the list of Scan 
> objects to compute splits, and on each iteration it opens and closes an 
> HConnection, which is expensive.
> It can be optimized so that a single HConnection is used to compute the 
> splits for all the Scan objects.
> This optimization will help reduce the launch time of MR jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)