[
https://issues.apache.org/jira/browse/HADOOP-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863164#comment-15863164
]
zhihai xu commented on HADOOP-12404:
------------------------------------
[~anishek] We see this issue in the hiveserver2 logs. In hiveserver2, all
queries share a single JVM and run in different threads. ZipFile's finalizer
also closes its InputStreams, so the failure additionally depends on when
garbage collection happens. We normally see this issue several times per day
in a hiveserver2 instance that runs thousands of queries per day. Loading jar
files is usually very quick, which is why the failure is rare. After we
disabled caching, the issue did not happen any more. Did you see this issue
too? What is your environment?
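
The sharing hazard described above can be reproduced outside Hadoop. The
following self-contained sketch (class name, entry name, and helper names are
made up for illustration) builds a throwaway jar, opens two cached connections
to the same entry, and checks whether closing the JarFile obtained through one
connection invalidates the stream obtained through the other:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.JarURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarCacheDemo {

    /** Build a temporary jar with one entry and return a jar: URL for that entry. */
    static URL makeJarEntryUrl() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        jar.toFile().deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("conf.xml"));
            out.write("<configuration/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return new URL("jar:" + jar.toUri() + "!/conf.xml");
    }

    /** Returns true if closing the shared JarFile invalidates another user's stream. */
    static boolean sharedJarFileBreaksStream() throws IOException {
        URL url = makeJarEntryUrl();
        // useCaches defaults to true, so both connections resolve to the
        // same cached JarFile instance inside the jar protocol handler.
        JarURLConnection c1 = (JarURLConnection) url.openConnection();
        JarURLConnection c2 = (JarURLConnection) url.openConnection();
        InputStream in = c1.getInputStream();
        c2.getJarFile().close();   // another "user" closes the shared JarFile
        try {
            in.read();             // per the ZipFile javadoc, this stream is now closed
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stream broken by other user's close: "
                + sharedJarFileBreaksStream());
    }
}
```

In hiveserver2 the "other user" is simply another query thread (or the
finalizer) touching the same jar URL, which is why the failure looks random
and load-dependent.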
> Disable caching for JarURLConnection to avoid sharing JarFile with other
> users when loading resource from URL in Configuration class.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-12404
> URL: https://issues.apache.org/jira/browse/HADOOP-12404
> Project: Hadoop Common
> Issue Type: Improvement
> Components: conf
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-12404.000.patch
>
>
> Disable caching for JarURLConnection to avoid sharing JarFile with other
> users when loading resource from URL in Configuration class.
> Currently {{Configuration#parse}} will call {{url.openStream}} to get the
> InputStream for {{DocumentBuilder}} to parse.
> Based on the JDK source code, the calling sequence is
> url.openStream
> => [handler.openConnection.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/Handler.java]
> => [new JarURLConnection|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarURLConnection.java#JarURLConnection]
> => JarURLConnection.connect
> => [factory.get(getJarFileURL(), getUseCaches())|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarFileFactory.java]
> => [URLJarFile.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/URLJarFile.java#URLJarFile.getJarFile%28java.net.URL%2Csun.net.www.protocol.jar.URLJarFile.URLJarFileCloseController%29]
> => [JarFile.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/jar/JarFile.java#JarFile.getInputStream%28java.util.zip.ZipEntry%29]
> => ZipFile.getInputStream
> If {{URLConnection#getUseCaches}} is true (the default), the URLJarFile is
> shared among all users of the same URL. If the shared URLJarFile is closed by
> another user, every InputStream previously returned by
> URLJarFile#getInputStream is closed as well, per the
> [documentation|http://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipFile.html#getInputStream(java.util.zip.ZipEntry)].
> This is why, in rare situations on a heavily loaded system, we saw the
> following exception, which caused a hive job to fail:
> {code}
> 2014-10-21 23:44:41,856 ERROR org.apache.hadoop.hive.ql.exec.Task: Ended Job = job_1413909398487_3696 with exception 'java.lang.RuntimeException(java.io.IOException: Stream closed)'
> java.lang.RuntimeException: java.io.IOException: Stream closed
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2484)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2337)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2254)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:861)
> at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2030)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
> at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:187)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:582)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:580)
> at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:598)
> at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:288)
> at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
> at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1516)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1283)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1101)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:919)
> at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
> at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Stream closed
> at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:67)
> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:142)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2902)
> at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:302)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
> at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1426)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2807)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2325)
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2313)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2384)
> {code}
> Disabling [JarURLConnection's
> cache|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarFileFactory.java#JarFileFactory.getCachedJarFile%28java.net.URL%29]
> also saves a little memory.
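
The pattern the description calls for — opening the connection explicitly so
the cache can be switched off before fetching the stream — can be sketched as
follows (class and method names here are made up for illustration, not
Hadoop's actual code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.JarURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class NoCacheOpen {
    /** Open a URL's stream without sharing a cached JarFile with other callers. */
    static InputStream openNoCache(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        if (connection instanceof JarURLConnection) {
            // Disabling the cache gives this caller a private JarFile, so a
            // close() by another user of the same jar URL cannot invalidate
            // the InputStream returned below.
            connection.setUseCaches(false);
        }
        return connection.getInputStream();
    }
}
```

Restricting setUseCaches(false) to jar: URLs keeps the default caching
behavior for all other protocols, where sharing is harmless.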
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)