[
https://issues.apache.org/jira/browse/HADOOP-12404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804837#comment-14804837
]
zhihai xu commented on HADOOP-12404:
------------------------------------
thanks [~asuresh] for the review! Yes, there is no easy way to write a test
case for this, because all of the functionality to verify lives in the JDK,
which is not accessible from Hadoop. Will commit it tomorrow if no one objects.
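For reference, the change amounts to opening the URLConnection explicitly and disabling its cache before reading. A minimal sketch of the idea (not the exact patch; the class and method names here are made up):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class NoCacheUrlOpener {
    // Open a stream for a (possibly jar:) URL without the shared JarFile cache.
    // With caching disabled, this connection gets its own JarFile, so another
    // user closing a cached JarFile cannot invalidate this stream.
    public static InputStream openStreamNoCache(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setUseCaches(false);
        return connection.getInputStream();
    }
}
{code}

Compared with {{url.openStream()}}, this is the same call sequence except that {{setUseCaches(false)}} is applied before {{getInputStream()}}, so the jar: protocol handler skips its JarFileFactory cache.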
> Disable caching for JarURLConnection to avoid sharing JarFile with other
> users when loading resource from URL in Configuration class.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-12404
> URL: https://issues.apache.org/jira/browse/HADOOP-12404
> Project: Hadoop Common
> Issue Type: Improvement
> Components: conf
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Minor
> Attachments: HADOOP-12404.000.patch
>
>
> Disable caching for JarURLConnection to avoid sharing JarFile with other
> users when loading resource from URL in Configuration class.
> Currently {{Configuration#parse}} will call {{url.openStream}} to get the
> InputStream for {{DocumentBuilder}} to parse.
> Based on the JDK source code, the calling sequence is:
> url.openStream
> => [handler.openConnection.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/Handler.java]
> => [new JarURLConnection|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarURLConnection.java#JarURLConnection]
> => JarURLConnection.connect
> => [factory.get(getJarFileURL(), getUseCaches())|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarFileFactory.java]
> => [URLJarFile.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/URLJarFile.java#URLJarFile.getJarFile%28java.net.URL%2Csun.net.www.protocol.jar.URLJarFile.URLJarFileCloseController%29]
> => [JarFile.getInputStream|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/jar/JarFile.java#JarFile.getInputStream%28java.util.zip.ZipEntry%29]
> => ZipFile.getInputStream
> If {{URLConnection#getUseCaches}} is true (the default), the URLJarFile is
> shared among all users of the same URL. If that shared URLJarFile is closed
> by another user, every InputStream previously returned by
> {{URLJarFile#getInputStream}} is closed along with it, per the
> [document|http://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipFile.html#getInputStream(java.util.zip.ZipEntry)].
> As a result, in a heavily loaded system we occasionally saw the following
> exception, which caused a Hive job to fail:
> {code}
> 2014-10-21 23:44:41,856 ERROR org.apache.hadoop.hive.ql.exec.Task: Ended Job = job_1413909398487_3696 with exception 'java.lang.RuntimeException(java.io.IOException: Stream closed)'
> java.lang.RuntimeException: java.io.IOException: Stream closed
> 	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2484)
> 	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2337)
> 	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2254)
> 	at org.apache.hadoop.conf.Configuration.get(Configuration.java:861)
> 	at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2030)
> 	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
> 	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
> 	at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:187)
> 	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:582)
> 	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:580)
> 	at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:598)
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:288)
> 	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
> 	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> 	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1516)
> 	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1283)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1101)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:919)
> 	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
> 	at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> 	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> 	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> 	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Stream closed
> 	at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:67)
> 	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:142)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2902)
> 	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:302)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1426)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2807)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> 	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> 	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> 	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
> 	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
> 	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
> 	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2325)
> 	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2313)
> 	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2384)
> {code}
> As a side benefit, we also save a little memory with [JarURLConnection's
> caches|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/net/www/protocol/jar/JarFileFactory.java#JarFileFactory.getCachedJarFile%28java.net.URL%29]
> disabled.
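The "Stream closed" failure quoted above can be reproduced in isolation with a plain {{java.util.zip.ZipFile}}, independent of Hadoop: obtain an InputStream for an entry, close the ZipFile (as another sharer of a cached JarFile would), and the subsequent read fails. A minimal sketch (the class and method names are made up):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class SharedZipFileDemo {
    // Returns true if reading after ZipFile.close() fails with an IOException,
    // as the ZipFile javadoc warns: closing the file also closes any input
    // streams it previously handed out.
    public static boolean readFailsAfterClose() throws IOException {
        // Build a small zip with one entry, standing in for a config jar.
        Path zip = Files.createTempFile("demo", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("core-site.xml"));
            out.write("<configuration></configuration>".getBytes());
            out.closeEntry();
        }
        ZipFile shared = new ZipFile(zip.toFile());
        InputStream in = shared.getInputStream(shared.getEntry("core-site.xml"));
        shared.close(); // another user of the shared/cached file closes it
        try {
            in.read();  // the stream we still hold is now invalid
            return false;
        } catch (IOException expected) {
            return true; // e.g. "Stream closed" from InflaterInputStream
        }
    }
}
{code}

This is exactly the situation the cached jar: handler can create: the JarFile is the shared object, and {{Configuration#parse}} holds one of its streams while someone else closes it.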
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)