[ https://issues.apache.org/jira/browse/NUTCH-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2949. ------------------------------------ Assignee: Sebastian Nagel Resolution: Fixed Fixed via NUTCH-2936 / [PR#733|https://github.com/apache/nutch/pull/733]. > Tasks of a multi-threaded map runner may fail because of slow creation of URL > stream handlers > --------------------------------------------------------------------------------------------- > > Key: NUTCH-2949 > URL: https://issues.apache.org/jira/browse/NUTCH-2949 > Project: Nutch > Issue Type: Bug > Components: net, plugin, protocol > Affects Versions: 1.19 > Reporter: Sebastian Nagel > Assignee: Sebastian Nagel > Priority: Blocker > Fix For: 1.19 > > > While running a custom Nutch job ([code > here|https://github.com/commoncrawl/nutch/blob/cc/src/java/org/apache/nutch/crawl/SitemapInjector.java]), > many but not all task failed exceeding the the Hadoop task time-out > (`mapreduce.task.timeout`) without generating any "heartbeat" (output, > counter increments, log messages). Hadoop logs the stacks of all threads of > the timed out task. That's the base for the excerpts below. > The job runs a MultithreadedMapper - most of the mapper threads (48 in total) > are waiting for the URLStreamHandler in order to construct a java.net.URL > object: > {noformat} > "Thread-11" #27 prio=5 os_prio=0 cpu=243.78ms elapsed=647.25s > tid=0x00007f3eb5b0f800 nid=0x8e651 waiting for monitor entry > [0x00007f3e84ef9000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162) > - waiting to lock <0x00000006a1bc0630> (a java.lang.String) > at > org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) > at > org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) > at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) > at java.net.URL.<init>(java.base@11.0.15/URL.java:651) > at java.net.URL.<init>(java.base@11.0.15/URL.java:541) > at java.net.URL.<init>(java.base@11.0.15/URL.java:488) > at > org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) > at > org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) > at > org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at > org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274) > {noformat} > Only a single mapper thread is active: > {noformat} > "Thread-23" #39 prio=5 os_prio=0 cpu=5830.17ms elapsed=647.09s > tid=0x00007f3eb5b42800 nid=0x8e661 in Object.wait() [0x00007f3e842ec000] > java.lang.Thread.State: RUNNABLE > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native > Method) > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45) > at > java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490) > at > org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170) > - locked <0x00000006a1bc0630> (a java.lang.String) > at > org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) > at > org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) > at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) > at java.net.URL.<init>(java.base@11.0.15/URL.java:651) > at java.net.URL.<init>(java.base@11.0.15/URL.java:541) > at java.net.URL.<init>(java.base@11.0.15/URL.java:488) > at > org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) > at > org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) > at > org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at > org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274) > {noformat} > The stack of one thread shows a nested lock: > {noformat} > "Thread-26" #42 prio=5 os_prio=0 cpu=24.93ms elapsed=647.07s > tid=0x00007f3eb5b49800 nid=0x8e664 waiting for monitor entry > [0x00007f3e83fe7000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162) > - waiting to lock <0x00000006a1bc0630> (a java.lang.String) > at > org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) > at > org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) > at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) > at java.net.URL.<init>(java.base@11.0.15/URL.java:651) > at java.net.URL.<init>(java.base@11.0.15/URL.java:541) > at java.net.URL.<init>(java.base@11.0.15/URL.java:488) > at > javax.crypto.JceSecurity.<clinit>(java.base@11.0.15/JceSecurity.java:239) > at javax.crypto.Cipher.getInstance(java.base@11.0.15/Cipher.java:540) > at > sun.security.ssl.JsseJce.getCipher(java.base@11.0.15/JsseJce.java:190) > at > sun.security.ssl.SSLCipher.isTransformationAvailable(java.base@11.0.15/SSLCipher.java:509) > at > sun.security.ssl.SSLCipher.<init>(java.base@11.0.15/SSLCipher.java:498) > at > sun.security.ssl.SSLCipher.<clinit>(java.base@11.0.15/SSLCipher.java:81) > at > sun.security.ssl.CipherSuite.<clinit>(java.base@11.0.15/CipherSuite.java:65) > at > sun.security.ssl.SSLContextImpl.getApplicableSupportedCipherSuites(java.base@11.0.15/SSLContextImpl.java:348) > at > sun.security.ssl.SSLContextImpl$AbstractTLSContext.<clinit>(java.base@11.0.15/SSLContextImpl.java:580) > at java.lang.Class.forName0(java.base@11.0.15/Native Method) > at java.lang.Class.forName(java.base@11.0.15/Class.java:315) > at > java.security.Provider$Service.getImplClass(java.base@11.0.15/Provider.java:1918) > at > java.security.Provider$Service.newInstance(java.base@11.0.15/Provider.java:1894) > at > sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:236) > at > sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:164) > at > javax.net.ssl.SSLContext.getInstance(java.base@11.0.15/SSLContext.java:168) > at org.apache.nutch.protocol.okhttp.OkHttp.<clinit>(OkHttp.java:97) > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native > Method) > at > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45) > at > java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490) > at > org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170) > - locked <0x00000006a1bc0410> (a java.lang.String) > at > org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) > at > org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) > at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) > at java.net.URL.<init>(java.base@11.0.15/URL.java:651) > at java.net.URL.<init>(java.base@11.0.15/URL.java:541) > at java.net.URL.<init>(java.base@11.0.15/URL.java:488) > at > org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) > at > org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) > at > org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) > at > org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at > org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274) > {noformat} > It isn't a dead-lock because the lock is on two different String objects in > different lines of code. However, a recursive barrier may make the > performance impact of locks much worse. The stack also shows that Nutch's > URLStreamHandlerFactory is also called from Java-internal methods. The > cryptographic classes in the stack could also give a hint what's going wrong > in NUTCH-2936. > I'm not yet clear what the solution could be: > - > [Extension.getExtensionInstance()|https://github.com/apache/nutch/blob/568993b908702b9631a705c8122ba99af1b72f04/src/java/org/apache/nutch/plugin/Extension.java#L154] > with nested synchronization is for sure not suitable to be called frequently > - maybe caching the URL stream handlers solves the problem > - but we should keep also NUTCH-2936 on the radar when looking for solutions -- This message was sent by Atlassian Jira (v8.20.10#820010)