Dear Tim,

No progress so far, I'm sorry, but it was interesting to see the differences between the requests sent by BaseX and eXist. I was mostly wondering why the BaseX multipart message contains so many parts that seem to be completely missing from the eXist request. Are both outputs based on the same XQuery expression? Did you manage to run the same query with both implementations?
Thanks,
Christian

On Mon, Mar 13, 2017 at 8:27 PM, Tim Thompson <[email protected]> wrote:
> Christian,
>
> For the sake of comparison, I'm attaching two text files with the HTTP
> request/response output from my sample query on both the BaseX and eXist
> RESTXQ implementations. Non-ASCII characters do not seem to be a problem in
> the eXist implementation. You can see that eXist uses "Transfer-Encoding:
> chunked" whereas BaseX uses "Content-Transfer-Encoding: base64". But I'm
> afraid I'm getting out of my depth here!
>
> Thanks again,
>
> Tim
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>
> www.linkedin.com/in/timathompson
> [email protected]
>
> On Mon, Mar 13, 2017 at 12:00 PM, Maximilian Gärber <[email protected]>
> wrote:
>
>> Hi,
>>
>> this might differ from sending XML files, but if you send any other file
>> (image, Word document) there is usually no conversion at all: the plain
>> bytes are sent (the headers do not even mention an encoding).
>>
>> From my understanding, it should be the user's responsibility to decide on
>> the transfer encoding (if you do not specify it, there might be some
>> fallback, but currently you are forced to use base64, no matter what the
>> headers already are).
>>
>> Br,
>> Max
>>
>> 2017-03-11 18:17 GMT+01:00 Tim Thompson <[email protected]>:
>>
>>> Hi, Christian,
>>>
>>> Thanks very much for looking into this. If I use the OxGarage TEI web
>>> service through the front-end client to upload a file
>>> (http://www.tei-c.org/oxgarage/), here is how it sends the request
>>> payload on the back end. Non-ASCII characters are replaced with octal
>>> escape sequences.
>>>
>>> Encapsulated multipart part: (text/xml)
>>> Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
>>> Content-Type: text/xml\r\n\r\n
>>> eXtensible Markup Language
>>> <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
>>>   <teiHeader>
>>>     <fileDesc>
>>>       <titleStmt>
>>>         <title>Multipart test</title>
>>>         <author/>
>>>       </titleStmt>
>>>       <publicationStmt>
>>>         <p>unknown</p>
>>>       </publicationStmt>
>>>       <sourceDesc>
>>>         <p>unknown</p>
>>>       </sourceDesc>
>>>     </fileDesc>
>>>   </teiHeader>
>>>   <text>
>>>     <body>
>>>       <div type="level1">
>>>         <div type="level2">
>>>           <p n="4">
>>>             <hi rendition="simple:bold"/>
>>>           </p>
>>>           <p n="5" rend="Normal">
>>>             <hi rend="bold underline"> Regression Equation </hi>
>>>           </p>
>>>           <p n="6" rend="Normal">
>>>             <math xmlns="http://www.w3.org/1998/Math/MathML">
>>>               <mover accent="true">
>>>                 <mrow>
>>>                   <mi> Y </mi>
>>>                 </mrow>
>>>                 <mo> ^ </mo>
>>>               </mover>
>>>               <mo> = </mo>
>>>               <msub>
>>>                 <mrow>
>>>                   <mi> \316\262 </mi>
>>>                 </mrow>
>>>                 <mrow>
>>>                   <mn> 1 </mn>
>>>                 </mrow>
>>>               </msub>
>>>               <mo> + </mo>
>>>               <msub>
>>>                 <mrow>
>>>                   <mi> \316\262 </mi>
>>>                 </mrow>
>>>                 <mrow>
>>>                   <mn> 2 </mn>
>>>                 </mrow>
>>>               </msub>
>>>               <msub>
>>>                 <mrow>
>>>                   <mi> X </mi>
>>>                 </mrow>
>>>                 <mrow>
>>>                   <mn> 2 </mn>
>>>                 </mrow>
>>>               </msub>
>>>               <mo> + </mo>
>>>               <mo> \342\200\246 </mo>
>>>               <mo> + </mo>
>>>               <msub>
>>>                 <mrow>
>>>                   <mi> \316\262 </mi>
>>>                 </mrow>
>>>                 <mrow>
>>>                   <mi> i </mi>
>>>                 </mrow>
>>>               </msub>
>>>               <msub>
>>>                 <mrow>
>>>                   <mi> X </mi>
>>>                 </mrow>
>>>                 <mrow>
>>>                   <mi> i </mi>
>>>                 </mrow>
>>>               </msub>
>>>             </math>
>>>           </p>
>>>         </div>
>>>       </div>
>>>     </body>
>>>   </text>
>>> </TEI>
>>> Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
>>>
>>> --
>>> Tim A. Thompson
>>> Metadata Librarian (Spanish/Portuguese Specialty)
>>> Princeton University Library
>>>
>>> www.linkedin.com/in/timathompson
>>> [email protected]
>>>
>>> On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün <[email protected]> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Finally some feedback on this issue.
>>>>
>>>> It turned out that I cannot provide an easy fix for the problem you
>>>> encountered. Your observations have already summarized the problem,
>>>> and you have also found out what is happening internally: Whenever a
>>>> multi-part body contains non-ASCII data, the
>>>> "Content-Transfer-Encoding: base64" header is added [1].
>>>>
>>>> I am now mostly wondering how non-ASCII characters should be
>>>> transferred, if not encoded as base64. Do you have some idea what the
>>>> request would need to look like for TEI-C to be able to parse it?
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>> [1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/HttpClient.java#L271
>>>>
>>>> > Content-Type: text/xml\r\n
>>>> > Content-Transfer-Encoding: base64\r\n\r\n
>>>> > eXtensible Markup Language
>>>> > [truncated]
>>>> > PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48\r\ndGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\nLm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288L21
>>>> >
>>>> > Attached here is a basic test case to replicate the problem: an HTML page
>>>> > with a form and the RESTXQ function that it calls.
>>>> >
>>>> > I've tried setting a new header to specify Content-Transfer-Encoding as
>>>> > "binary" instead of "base64", but it doesn't replace the default header. Is
>>>> > there any way that the encoding could be controlled from RESTXQ?
>>>> >
>>>> > Thanks in advance!
>>>> >
>>>> > Tim
>>>> >
>>>> > --
>>>> > Tim A. Thompson
>>>> > Metadata Librarian (Spanish/Portuguese Specialty)
>>>> > Princeton University Library
>>>> >
>>>> > www.linkedin.com/in/timathompson
>>>> > [email protected]
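
A side note on the OxGarage payload quoted in this thread: sequences such as \316\262 and \342\200\246 look like the capture tool's octal rendering of raw UTF-8 bytes, which would suggest that the browser sends the text as plain 8-bit data. Below is a minimal Python sketch to check that reading. It assumes the escapes are exactly three octal digits per byte, and decode_octal_escapes is a hypothetical helper name, not part of BaseX, eXist or OxGarage.

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Turn \\NNN octal byte escapes back into the UTF-8 text they stand for.

    Assumption: the input is otherwise plain ASCII and every escape has
    exactly three octal digits, as in the capture quoted in this thread.
    """
    raw = text.encode("latin-1")
    # Replace each backslash + three octal digits with the single byte it encodes.
    raw = re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")


print(decode_octal_escapes(r"<mi> \316\262 </mi>"))      # <mi> β </mi>
print(decode_octal_escapes(r"<mo> \342\200\246 </mo>"))  # <mo> … </mo>
```

If that assumption holds, the two escapes are simply the UTF-8 bytes of the Greek letter beta and the horizontal ellipsis.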

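
To make the difference between the two request styles discussed in this thread more concrete, here is a rough Python sketch that builds one multipart/form-data part in two ways: base64-encoded with a Content-Transfer-Encoding header, as in the BaseX output, and as raw bytes with no such header, as in the browser request to OxGarage. The function build_part, its mode names and the boundary value are made up for illustration; this is not how the BaseX HttpClient is implemented.

```python
import base64

BOUNDARY = "example-boundary"  # hypothetical boundary, for illustration only


def build_part(name: str, filename: str, payload: bytes, mode: str) -> bytes:
    """Serialize a single multipart/form-data part.

    mode="base64": body is base64 text, announced by a header.
    mode="raw":    body is the payload bytes as-is, with no encoding header.
    """
    headers = [
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"',
        "Content-Type: text/xml",
    ]
    if mode == "base64":
        headers.append("Content-Transfer-Encoding: base64")
        body = base64.b64encode(payload)
    elif mode == "raw":
        body = payload
    else:
        raise ValueError(f"unknown mode: {mode}")
    head = "\r\n".join(headers).encode("ascii")
    return b"--" + BOUNDARY.encode("ascii") + b"\r\n" + head + b"\r\n\r\n" + body + b"\r\n"


xml = "<mi> β </mi>".encode("utf-8")
print(build_part("fileToConvert", "tei.xml", xml, "base64").decode("ascii"))
print(build_part("fileToConvert", "tei.xml", xml, "raw"))
```

A receiver that ignores the Content-Transfer-Encoding header would see the base64 text instead of XML, which might be one explanation for the parsing problem described earlier in the thread.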
