Hi, Christian,

Thanks very much for looking into this. When I upload a file to the
OxGarage TEI web service through its front-end client
(http://www.tei-c.org/oxgarage/), this is the request payload it sends
on the back end. In the capture, non-ASCII characters appear as octal
escape sequences for their UTF-8 bytes, and the part carries no
Content-Transfer-Encoding header.
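
To make the escapes in the capture easier to read, here is a small Python sketch that turns them back into characters. It assumes the escapes are three-digit octal values for UTF-8 bytes, which matches the two sequences in the dump; the function name is just a placeholder of mine.

```python
import re


def decode_octal_escapes(dump_text: str) -> str:
    """Turn \\NNN octal escapes, as printed in the capture, back into text.

    Assumption: each escape is one byte of a UTF-8 sequence.
    """
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                 # one byte, \000 to \377
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> byte value
        dump_text.encode("ascii"),
    )
    return raw.decode("utf-8")


print(decode_octal_escapes(r"<mi> \316\262 </mi>"))      # <mi> β </mi>
print(decode_octal_escapes(r"<mo> \342\200\246 </mo>"))  # <mo> … </mo>
```

So \316\262 is the Greek letter beta (bytes 0xCE 0xB2) and \342\200\246 is the horizontal ellipsis (bytes 0xE2 0x80 0xA6), sent as plain UTF-8.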

 Encapsulated multipart part:  (text/xml)
        Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
        Content-Type: text/xml\r\n\r\n
        eXtensible Markup Language
            <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
                <teiHeader>
                    <fileDesc>
                        <titleStmt>
                            <title>Multipart test</title>
                            <author/>
                        </titleStmt>
                        <publicationStmt>
                            <p>unknown</p>
                        </publicationStmt>
                        <sourceDesc>
                            <p>unknown</p>
                        </sourceDesc>
                    </fileDesc>
                </teiHeader>
                <text>
                    <body>
                        <div type="level1">
                            <div type="level2">
                                <p n="4">
                                    <hi rendition="simple:bold"/>
                                </p>
                                <p n="5" rend="Normal">
                                    <hi rend="bold underline"> Regression Equation </hi>
                                </p>
                                <p n="6" rend="Normal">
                                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                                        <mover accent="true">
                                            <mrow>
                                                <mi> Y </mi>
                                            </mrow>
                                            <mo> ^ </mo>
                                        </mover>
                                        <mo> = </mo>
                                        <msub>
                                            <mrow>
                                                <mi> \316\262 </mi>
                                            </mrow>
                                            <mrow>
                                                <mn> 1 </mn>
                                            </mrow>
                                        </msub>
                                        <mo> + </mo>
                                        <msub>
                                            <mrow>
                                                <mi> \316\262 </mi>
                                            </mrow>
                                            <mrow>
                                                <mn> 2 </mn>
                                            </mrow>
                                        </msub>
                                        <msub>
                                            <mrow>
                                                <mi> X </mi>
                                            </mrow>
                                            <mrow>
                                                <mn> 2 </mn>
                                            </mrow>
                                        </msub>
                                        <mo> + </mo>
                                        <mo> \342\200\246 </mo>
                                        <mo> + </mo>
                                        <msub>
                                            <mrow>
                                                <mi> \316\262 </mi>
                                            </mrow>
                                            <mrow>
                                                <mi> i </mi>
                                            </mrow>
                                        </msub>
                                        <msub>
                                            <mrow>
                                                <mi> X </mi>
                                            </mrow>
                                            <mrow>
                                                <mi> i </mi>
                                            </mrow>
                                        </msub>
                                    </math>
                                </p>
                            </div>
                        </div>
                    </body>
                </text>
            </TEI>
    Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
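
For comparison, here is a rough Python sketch of a part laid out like the one in the capture: Content-Disposition and Content-Type headers, a blank line, then the XML as raw UTF-8 bytes, with no Content-Transfer-Encoding header. The function name and the boundary string are placeholders of mine, and I have only confirmed this layout against the capture, not against the OxGarage source.

```python
def build_multipart(field: str, filename: str, xml_text: str, boundary: str) -> bytes:
    """Build a single-part multipart/form-data body with a raw UTF-8 payload."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/xml\r\n"
        "\r\n"  # blank line ends the part headers; no Content-Transfer-Encoding
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")  # closing delimiter
    return head + xml_text.encode("utf-8") + tail


# "example-boundary" is a made-up value; a real client generates its own.
body = build_multipart("fileToConvert", "tei.xml", "<mi> \u03b2 </mi>", "example-boundary")
print(body)
```

The matching request header would be Content-Type: multipart/form-data; boundary=example-boundary.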



--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library

www.linkedin.com/in/timathompson

On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün wrote:

> Hi Tim,
>
> Finally some feedback on this issue.
>
> It turned out that I cannot provide an easy fix for the problem you
> encountered. Your observations have already summarized the problem,
> and you have also found out what is happening internally: Whenever a
> multi-part body contains non-ASCII data, the
> "Content-Transfer-Encoding:base64" header is added [1].
>
> I am now mostly wondering how non-ASCII characters should be
> transferred, if not encoded as base64. Do you have any idea what the
> request would need to look like for TEI-C to be able to parse it?
>
> Cheers,
> Christian
>
> [1] https://github.com/BaseXdb/basex/blob/master/basex-core/
> src/main/java/org/basex/util/http/HttpClient.java#L271
>
>
>
> > Content-Type: text/xml\r\n
> >         Content-Transfer-Encoding: base64\r\n\r\n
> >         eXtensible Markup Language
> >             [truncated]
> > PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxo
> > ZWFkPjxtZXRhLz48\r\ndGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG
> > 5zPSJodHRwOi8vd3d3Lncz\r\nLm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+
> > zrI8L21pPjxtbj5Ud288L21
> >
> > Attached here is a basic test case to replicate the problem: an HTML page
> > with a form and the RESTXQ function that it calls.
> >
> > I've tried setting a new header to specify Content-Transfer-Encoding as
> > "binary" instead of "base64," but it doesn't replace the default header.
> > Is there any way that the encoding could be controlled from RESTXQ?
> >
> > Thanks in advance!
> >
> > Tim
> >
> > --
> > Tim A. Thompson
> > Metadata Librarian (Spanish/Portuguese Specialty)
> > Princeton University Library
> >
> > www.linkedin.com/in/timathompson
>
