https://bz.apache.org/bugzilla/show_bug.cgi?id=64108
Bug ID: 64108 Summary: unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C' Product: POI Version: 4.1.1-FINAL Hardware: All OS: All Status: NEW Severity: normal Priority: P2 Component: OPC Assignee: dev@poi.apache.org Reporter: r...@costine.org Target Milestone: --- Created attachment 36989 --> https://bz.apache.org/bugzilla/attachment.cgi?id=36989&action=edit This is a possible patch against REL_4_1_1 to encode unsafe pipe characters and a change to unit tests to show that it works. The pipe character ("|") in a Relationship target attribute is not being encoded into a "%7C" string. The pipe character is not one of the characters considered "valid" by this RFC: https://tools.ietf.org/html/rfc3986#section-2 In the 4.1.1 version, a comparison is made to see if "|" (0x7C) is whitespace, or is greater than or equal to 0x80. Since that evaluates "false" for a "|" character, it is deemed "safe" (in PackagingURIHelper) instead of "unsafe". Currently determining it as "safe" prevents it from being converted to "%7C", which would be the correct thing to do. The original way we found this was because we use the Tika AutoDetectParser to extract text from various document types (MS-Word docx, or xslx). Tika uses POI OPC to parse those documents in Open Office format, and although it logs the original POI exception, it doesn't fail right away. Instead, it internally replaces the original Relationship element target with the URL "http://invalid.uri" and fails with an exception later in the processing because that fake url is "absolute". As a result, there's no way to report to the user the actual Relationship in the document that contained the target with the "|". For what it's worth, this is the original Relationship element that caused parsing to fail: <Relationship Id="rId13" Target="#ctl||rId16||cmdAddAction||_x0000_i1029" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"/> Scanning the "Target" attribute failed at index 4 in that attribute's value, due to the "|" being in that place in the fragment after the "#". The only way I found the original exception from POI OPC code was to run our code with a debugger attached to it. Granted, the failure to include the root cause of the exception is an issue with Tika, and probably should be addressed by that group, but the POI OPC code should be considering the pipe character "unsafe" since it is not one of the characters that is specified as valid in RPC3986. Running the unit test changes in the attached patch (it is against the REL_4_1_1 tag), and removing the change to the code, I got this exception: java.net.URISyntaxException: Illegal character in fragment at index 4: #ctl||rId16||cmdAddAction||_x0000_i1029 at java.base/java.net.URI$Parser.fail(URI.java:2915) at java.base/java.net.URI$Parser.checkChars(URI.java:3086) at java.base/java.net.URI$Parser.parse(URI.java:3130) at java.base/java.net.URI.<init>(URI.java:600) at org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:724) at org.apache.poi.openxml4j.opc.TestPackagingURIHelper.testCreateURIFromString(TestPackagingURIHelper.java:128) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:252) at junit.framework.TestSuite.run(TestSuite.java:247) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy5.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55) at java.base/java.lang.Thread.run(Thread.java:834) With the code changes to the class, the altered test passes. The attached patch changes the method in PackagingURIHelper from this: private static boolean isUnsafe(int ch) { return ch >= 0x80 || Character.isWhitespace(ch); } To this: private static boolean isUnsafe(int ch) { return ch >= 0x80 || ch == 0x7C || Character.isWhitespace(ch); } Then, the pipe characters in the URI will be deemed "unsafe" and subsequently converted properly to their hexadecimal '%7C' encodings. -- You are receiving this mail because: You are the assignee for the bug. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org For additional commands, e-mail: dev-h...@poi.apache.org