My apologies for replying to my own thread ... but I've done some further
digging on this.  I think I've convinced myself that there's a bug in the
handling of this sort of character encoding in HTTP header values.

Excerpts from RFC 2047, which governs this sort of encoding:

   Instead, certain sequences of "ordinary" printable ASCII characters
   (known as "encoded-words") are reserved for use as encoded data.  The
   syntax of encoded-words is such that they are unlikely to
   "accidentally" appear as normal text in message headers.
   Furthermore, the characters used in encoded-words are restricted to
   those which do not have special meanings in the context in which the
   encoded-word appears.

   Generally, an "encoded-word" is a sequence of printable ASCII
   characters that begins with "=?", ends with "?=", and has two "?"s in
   between.  It specifies a character set and an encoding method, and
   also includes the original text encoded as graphic ASCII characters,
   according to the rules for that encoding method.


The RFC goes on to specify the ABNF grammar that defines an "encoded-word"
... and none of my test samples match that grammar.  As such, it seems to me
that MarkLogic should not be attempting to interpret these values as encoded
characters.
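
To make the grammar point concrete, here is a rough sketch (in Python, outside
MarkLogic) of how I read the "encoded-word" ABNF in RFC 2047 section 2.  The
function name find_encoded_words is my own, and the check is simplified: it
ignores the rule that encoded-words must be separated from adjacent text by
whitespace.  It is not meant to show how the server parses headers.

```python
import re

# token: printable ASCII except SPACE, CTLs, and the RFC 2047 "especials"
# ( ) < > @ , ; : \ " / [ ] ? . =
TOKEN = r"[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+"
# encoded-text: printable ASCII other than "?" or SPACE
ENCODED_TEXT = r"[!->@-~]+"

ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)


def find_encoded_words(value):
    """Return (charset, encoding, encoded_text) for each encoded-word found."""
    found = []
    for match in ENCODED_WORD.finditer(value):
        # RFC 2047 limits an encoded-word to 75 characters in total.
        if len(match.group(0)) <= 75:
            found.append(match.groups())
    return found


# Neither of my test samples contains an encoded-word under this reading.
print(find_encoded_words("Test=?Value"))
print(find_encoded_words("Test=?Value?"))
# A well-formed encoded-word for comparison.
print(find_encoded_words("=?UTF-8?B?VGVzdFZhbHVl?="))
```

Under this reading, only the last sample should be treated as encoded data;
the first two are ordinary text that happens to contain "=?".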

The full RFC can be found here:
http://tools.ietf.org/html/rfc2047

Further, if I use a properly encoded value, the entire encoded section of
the header value is simply omitted when I read the value using
xdmp:get-request-header().
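
For anyone who wants to reproduce this, here is a minimal sketch (again in
Python) of one way to build a well-formed encoded-word to send as a test
header value.  The helper name make_encoded_word is hypothetical, and I'm
assuming the "B" (base64) encoding with UTF-8; the standard library's
decode_header is used only to confirm that the value round-trips.

```python
import base64
from email.header import decode_header


def make_encoded_word(text, charset="UTF-8"):
    """Wrap text as an RFC 2047 encoded-word using the "B" (base64) encoding."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?" + charset + "?B?" + payload + "?="


# Encode a value that contains the troublesome "=?" and "?" characters.
word = make_encoded_word("Test=?Value?")
print(word)

# Decoding with the standard library recovers the original bytes.
print(decode_header(word))
```

This only covers short values; anything that would push the encoded-word past
the RFC's 75-character limit would have to be split into several words.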

I plan to enter a support ticket for this, unless it's already a known bug?

-Bob


On Tue, Oct 15, 2013 at 2:32 PM, Bob H. <[email protected]> wrote:

> I'm seeing what I consider to be odd behavior when getting HTTP request
> header values via xdmp:get-request-header().  If the header value contains
> the string "=?", those characters are missing when I get the header value.
> If the header value contains an additional "?" character after the initial
> "=?", then an XDMP-ENCODING error is thrown.  I think I understand why this
> is happening, but not how to avoid it.
>
> I wrote a module that simply returns the HTTP header values that were
> provided on the request, and called the module via an HTTP app server.
>
> Here's the code:
> for $name in xdmp:get-request-header-names()
> return
>     fn:concat($name, ': ', xdmp:get-request-header($name) )
>
> When I call the module with a header value containing "=?", it echoes the
> value back with the "=?" stripped out.  For example:
> TestHeader1: Test=?Value
> => TestHeader1: TestValue
>
> When I call the module with a header value containing "=?" and another "?"
> anywhere in the remainder of the header value, I get nothing back and see
> an XDMP-ENCODING error in the ErrorLog:
> TestHeader1: Test=?Value?
> => (no response)
>
> Error: AppRequestTask::run: XDMP-ENCODING: (err:XQST0087) Unsupported
> character encoding: Value
>
> I'm guessing this is because technically one is allowed to specify the
> character encoding of an HTTP header value via MIME encoding, as described
> here:
>
> http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding
>
> All of that brings me to my question ... in my case, the header value is
> not encoded in any special way, but I do want to be able to use the
> characters "=" and "?" in header values.  Is there a way to disable this
> character encoding processing (as I have no need to use alternate encodings
> in my header values at this time) ... or some other way to work around this
> behavior?
>
> Thanks!
>
> -Bob
>
> --
> Bob Hodgeman
> Consulting Engineer
> LexisNexis
>
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general