Hi, Christian,
Thanks very much for looking into this. When I upload a file to the
OxGarage TEI web service through its front-end client
(http://www.tei-c.org/oxgarage/), this is the request payload it sends
to the back end. In the capture below, non-ASCII characters are
displayed as octal escape sequences of their UTF-8 bytes (for example,
\316\262 is β), and the part carries no Content-Transfer-Encoding
header.
Encapsulated multipart part: (text/xml)
Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
Content-Type: text/xml\r\n\r\n
eXtensible Markup Language
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Multipart test</title>
<author/>
</titleStmt>
<publicationStmt>
<p>unknown</p>
</publicationStmt>
<sourceDesc>
<p>unknown</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div type="level1">
<div type="level2">
<p n="4">
<hi rendition="simple:bold"/>
</p>
<p n="5" rend="Normal">
<hi rend="bold underline"> Regression Equation </hi>
</p>
<p n="6" rend="Normal">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mover accent="true">
<mrow>
<mi> Y </mi>
</mrow>
<mo> ^ </mo>
</mover>
<mo> = </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mn> 1 </mn>
</mrow>
</msub>
<mo> + </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mn> 2 </mn>
</mrow>
</msub>
<msub>
<mrow>
<mi> X </mi>
</mrow>
<mrow>
<mn> 2 </mn>
</mrow>
</msub>
<mo> + </mo>
<mo> \342\200\246 </mo>
<mo> + </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mi> i </mi>
</mrow>
</msub>
<msub>
<mrow>
<mi> X </mi>
</mrow>
<mrow>
<mi> i </mi>
</mrow>
</msub>
</math>
</p>
</div>
</div>
</body>
</text>
</TEI>
Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
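
In case it helps to reason about the capture above, here is a minimal
Python sketch of two things: decoding the octal escapes back into the
UTF-8 characters they stand for, and building a multipart part that
carries the XML as raw UTF-8 bytes with no Content-Transfer-Encoding
header, as the OxGarage client appears to do. The function names and the
example boundary are hypothetical, and whether OxGarage needs exactly
this layout is an assumption based only on this one capture.

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Turn capture-style \\ooo escapes back into the UTF-8 text they encode."""
    # Replace each three-digit octal escape with the single byte it denotes,
    # temporarily held as a Latin-1 character (code points 0-255).
    as_latin1 = re.sub(
        r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text
    )
    # Reinterpret those byte-valued characters as a UTF-8 byte sequence.
    return as_latin1.encode("latin-1").decode("utf-8")


def build_raw_utf8_part(boundary: str, xml_text: str) -> bytes:
    """Build one multipart/form-data part with an unencoded UTF-8 body.

    Hypothetical helper: the field name and file name mirror the capture;
    the boundary is supplied by the caller.
    """
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="fileToConvert"; '
        'filename="tei.xml"\r\n'
        "Content-Type: text/xml\r\n"
        "\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    # Headers stay ASCII; the body goes out as raw UTF-8, not base64.
    return head.encode("ascii") + xml_text.encode("utf-8") + tail.encode("ascii")


beta = decode_octal_escapes(r"<mi> \316\262 </mi>")
ellipsis = decode_octal_escapes(r"<mo> \342\200\246 </mo>")
part = build_raw_utf8_part("example-boundary", beta)
```

Here \316\262 decodes to β (U+03B2) and \342\200\246 to the horizontal
ellipsis (U+2026), so the front-end client seems to be sending the
characters as plain UTF-8 rather than base64.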
--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library
www.linkedin.com/in/timathompson
On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün wrote:
> Hi Tim,
>
> Finally some feedback on this issue.
>
> It turned out that I cannot provide an easy fix for the problem you
> encountered. Your observations have already summarized the problem,
> and you have also found out what is happening internally: Whenever a
> multi-part body contains non-ASCII data, the
> "Content-Transfer-Encoding:base64" header is added [1].
>
> I am now mostly wondering how non-ASCII characters should be
> transferred, if not encoded as base64. Do you have any idea what the
> request would need to look like for TEI-C to be able to parse it?
>
> Cheers,
> Christian
>
> [1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/HttpClient.java#L271
>
>
>
> > Content-Type: text/xml\r\n
> > Content-Transfer-Encoding: base64\r\n\r\n
> > eXtensible Markup Language
> > [truncated]
> > PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48\r\n
> > dGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\n
> > Lm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288L21
> >
> > Attached here is a basic test case to replicate the problem: an HTML page
> > with a form and the RESTXQ function that it calls.
> >
> > I've tried setting a new header to specify Content-Transfer-Encoding as
> > "binary" instead of "base64," but it doesn't replace the default header.
> > Is there any way that the encoding could be controlled from RESTXQ?
> >
> > Thanks in advance!
> >
> > Tim
> >
> > --
> > Tim A. Thompson
> > Metadata Librarian (Spanish/Portuguese Specialty)
> > Princeton University Library
> >
> > www.linkedin.com/in/timathompson
>