[jira] [Updated] (HBASE-20056) Performance optimization on MultiTableInputFormatBase#getSplits()

ShivaKumar SS (JIRA) Fri, 23 Feb 2018 04:35:07 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ShivaKumar SS updated HBASE-20056:
----------------------------------
    Affects Version/s: 1.3.1

> Performance optimization on MultiTableInputFormatBase#getSplits() 
> ------------------------------------------------------------------
>
>                 Key: HBASE-20056
>                 URL: https://issues.apache.org/jira/browse/HBASE-20056
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase, mapreduce
>    Affects Versions: 1.0.1, 1.3.1
>            Reporter: ShivaKumar SS
>            Priority: Major
>              Labels: hbase, mapreduce, performance
>
> Currently this method iterates over the list of Scan objects to compute
> splits, and on each iteration it opens an HConnection and closes it again,
> which is expensive.
> It can be optimized so that a single HConnection is used to compute the
> splits for all of the Scan objects.
> This optimization will help reduce the launch time of an MR job.
> We are using a cluster of 15 nodes and have around 120 Scan objects. It
> takes about 5 minutes to launch a job; with the optimized code, it takes
> less than about 30 seconds.
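
The proposed change can be illustrated with a minimal, self-contained sketch.
It does not use the HBase API and is not the patch for this issue:
FakeConnection, splitsFor, and the other names are hypothetical stand-ins,
and the "connection" only counts how often it is opened. The sketch assumes
that opening a connection is the dominant cost, as the description suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: none of these names come from HBase.
class ConnectionReuseSketch {

    // Counts how many times the (fake) connection has been opened.
    static int opens = 0;

    // Hypothetical stand-in for an expensive cluster connection.
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() {
            opens++; // in a real cluster this is where the expensive setup happens
        }

        // Pretend every scan yields two splits.
        List<String> splitsFor(String scan) {
            List<String> splits = new ArrayList<>();
            splits.add(scan + "-split-0");
            splits.add(scan + "-split-1");
            return splits;
        }

        @Override
        public void close() {
            // nothing to release in this sketch
        }
    }

    // Pattern described as the current behaviour: open and close per scan.
    static List<String> splitsOpeningPerScan(List<String> scans) {
        List<String> all = new ArrayList<>();
        for (String scan : scans) {
            try (FakeConnection conn = new FakeConnection()) {
                all.addAll(conn.splitsFor(scan));
            }
        }
        return all;
    }

    // Proposed pattern: open once, reuse for every scan, close at the end.
    static List<String> splitsSharingConnection(List<String> scans) {
        List<String> all = new ArrayList<>();
        try (FakeConnection conn = new FakeConnection()) {
            for (String scan : scans) {
                all.addAll(conn.splitsFor(scan));
            }
        }
        return all;
    }

    // Builds n placeholder scan names.
    static List<String> makeScans(int n) {
        List<String> scans = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            scans.add("scan-" + i);
        }
        return scans;
    }

    public static void main(String[] args) {
        List<String> scans = makeScans(120);

        opens = 0;
        splitsOpeningPerScan(scans);
        System.out.println("per-scan opens: " + opens);

        opens = 0;
        splitsSharingConnection(scans);
        System.out.println("shared opens: " + opens);
    }
}
```

Both methods return the same splits; only the number of connection opens
differs (one per scan versus one in total), which is where the reported
launch-time saving would come from under the assumption above.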



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
