https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

            Bug ID: 64108
           Summary: unsafe pipe character ("|") in Relationship target
                    attribute is not being encoded into a '%7C'
           Product: POI
           Version: 4.1.1-FINAL
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: OPC
          Assignee: dev@poi.apache.org
          Reporter: r...@costine.org
  Target Milestone: ---

Created attachment 36989
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36989&action=edit
This is a possible patch against REL_4_1_1 to encode unsafe pipe characters and
a change to unit tests to show that it works.

The pipe character ("|") in a Relationship target attribute is not being
encoded into a "%7C" string. The pipe character is not one of the characters
considered "valid" by this RFC: 

https://tools.ietf.org/html/rfc3986#section-2

In the 4.1.1 version, a comparison is made to see if "|" (0x7C) is whitespace,
or is greater than or equal to 0x80. Since that evaluates "false" for a "|"
character, it is deemed "safe" (in PackagingURIHelper) instead of "unsafe".
Currently determining it as "safe" prevents it from being converted to "%7C",
which would be the correct thing to do. 

The original way we found this was because we use the Tika AutoDetectParser to
extract text from various document types (MS-Word docx, or xslx). Tika uses POI
OPC to parse those documents in Open Office format, and although it logs the
original POI exception, it doesn't fail right away. Instead, it internally
replaces the original Relationship element target with the URL
"http://invalid.uri"; and fails with an exception later in the processing
because that fake url is "absolute". As a result, there's no way to report to
the user the actual Relationship in the document that contained the target with
the "|". 

For what it's worth, this is the original Relationship element that caused
parsing to fail:

        <Relationship Id="rId13"
Target="#ctl||rId16||cmdAddAction||_x0000_i1029"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"/>

Scanning the "Target" attribute failed at index 4 in that attribute's value,
due to the "|" being in that place in the fragment after the "#".

The only way I found the original exception from POI OPC code was to run our
code with a debugger attached to it.

Granted, the failure to include the root cause of the exception is an issue
with Tika, and probably should be addressed by that group, but the POI OPC code
should be considering the pipe character "unsafe" since it is not one of the
characters that is specified as valid in RPC3986.

Running the unit test changes in the attached patch (it is against the
REL_4_1_1 tag), and removing the change to the code, I got this exception: 

java.net.URISyntaxException: Illegal character in fragment at index 4:
#ctl||rId16||cmdAddAction||_x0000_i1029
        at java.base/java.net.URI$Parser.fail(URI.java:2915)
        at java.base/java.net.URI$Parser.checkChars(URI.java:3086)
        at java.base/java.net.URI$Parser.parse(URI.java:3130)
        at java.base/java.net.URI.<init>(URI.java:600)
        at
org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:724)
        at
org.apache.poi.openxml4j.opc.TestPackagingURIHelper.testCreateURIFromString(TestPackagingURIHelper.java:128)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:252)
        at junit.framework.TestSuite.run(TestSuite.java:247)
        at
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
        at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
        at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
        at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
        at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
        at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
        at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
        at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
        at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
        at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
        at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
        at java.base/java.lang.Thread.run(Thread.java:834)

With the code changes to the class, the altered test passes.

The attached patch changes the method in PackagingURIHelper from this:

    private static boolean isUnsafe(int ch) {
        return ch >= 0x80 || Character.isWhitespace(ch);
    }

To this:

    private static boolean isUnsafe(int ch) {
        return ch >= 0x80 || ch == 0x7C || Character.isWhitespace(ch);
    }

Then, the pipe characters in the URI will be deemed "unsafe" and subsequently
converted properly to their hexadecimal '%7C' encodings.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org

Reply via email to