[ 
https://issues.apache.org/jira/browse/HBASE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-8639.
---------------------------

    Resolution: Fixed
    
> very poor performance of htable#getscanner in multithreaded environment
> -----------------------------------------------------------------------
>
>                 Key: HBASE-8639
>                 URL: https://issues.apache.org/jira/browse/HBASE-8639
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.7
>            Reporter: Raymond Liu
>            Assignee: Ted Yu
>             Fix For: 0.98.0, 0.95.2, 0.94.9
>
>         Attachments: 8639-0.94.txt, 8639-v1.txt
>
>
> Hi, I am running an app on top of Phoenix that forks around 100+ threads,
> each calling htable.getScanner(scan) to run scans in parallel (each scan
> targets a single region). Each scan matches only a few results and
> returns, so it should be very fast.
> In this case, I found that the htable.getScanner(scan) call itself runs
> quite slowly. Profiling with jvisualvm showed that 90% of the app's time is
> spent in org.apache.hadoop.net.DNS.getDefaultHost, which is invoked by
> ScannerCallable.checkIfRegionServerIsRemote.
> The root cause is that DNS.getDefaultHost involves synchronized methods in
> java.net.Inet4AddressImpl, so the 100+ threads lock and wait on each
> other. Each call to DNS.getDefaultHost costs around 30 ms, whereas when I
> call DNS.getDefaultHost 100K times from a single thread, each call costs
> less than 0.06 ms.
> After hacking the code to remove the call to checkIfRegionServerIsRemote,
> my app runs 5 times faster: 50K ops in my app take 200+ seconds instead of
> 1000+ seconds.
> Checking the code further, checkIfRegionServerIsRemote seems to be used
> only for metrics collection (or maybe retry logic?). Could it be removed
> or switched to some other implementation, so that workloads like mine,
> which run a large number of small scans across many threads, perform much
> better?
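
One possible "other implementation" of the kind the reporter asks about is to resolve the local host name once and reuse it, so that scanner threads never contend on the synchronized lookup. The following is only a minimal sketch of that idea, not the patch attached to this issue. The class CachedLocalHost and its methods are hypothetical names, and the JDK call InetAddress.getLocalHost() is used as a stand-in for DNS.getDefaultHost so that the example has no Hadoop dependency.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of HBase: caches the local host name.
final class CachedLocalHost {

    // Initialization-on-demand holder: the JVM runs resolve() exactly once,
    // the first time Holder is touched, and publishes the result safely.
    private static final class Holder {
        static final String HOST_NAME = resolve();
    }

    private CachedLocalHost() {
    }

    // Assumption: InetAddress.getLocalHost() stands in for DNS.getDefaultHost.
    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Fallback chosen for this sketch only.
            return "localhost";
        }
    }

    // Returns the cached name; no lookup happens after the first call.
    static String get() {
        return Holder.HOST_NAME;
    }

    // Remote check in the spirit of checkIfRegionServerIsRemote: a plain
    // string comparison against the cached name.
    static boolean isRemote(String regionServerHost) {
        return !get().equalsIgnoreCase(regionServerHost);
    }
}
```

With this shape, each scan pays only for a string comparison, at the price of not noticing a host name change during the life of the process.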

