I've created a JIRA issue for this: https://issues.apache.org/jira/browse/UIMA-5791
I'm not yet sure how to fix this. I will take a look next week. If I
understand the requirements right, the default encoding should be UTF-8
when deserializing service metadata.
There should also be a way to override the default. It seems we need a new
command-line argument (or property) for the client to override the default
encoding.
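
A rough sketch of the idea, not the actual fix: the property name
"uima.as.meta.encoding" and the class and method names below are made up
for illustration and do not exist in UIMA-AS. The point is only that the
message string is converted to bytes with an explicit charset (UTF-8 unless
overridden) and that the parser is told the same charset, so the platform
default never enters the picture.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MetaEncodingSketch {

    // Hypothetical property name, used here only for illustration.
    static final String ENCODING_PROPERTY = "uima.as.meta.encoding";

    // UTF-8 by default; the (hypothetical) system property overrides it.
    static Charset resolveCharset() {
        String name = System.getProperty(ENCODING_PROPERTY);
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
    }

    // Encode the message string with an explicit charset and tell the
    // parser which charset the bytes are in.
    static Document parseMeta(String xml, Charset charset) {
        byte[] bytes = xml.getBytes(charset); // never the no-arg getBytes()
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding(charset.name());
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse service metadata", e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII content is what exposes a charset mismatch.
        String xml = "<meta name=\"caf\u00e9\"/>";
        Document doc = parseMeta(xml, resolveCharset());
        System.out.println(doc.getDocumentElement().getAttribute("name"));
    }
}
```

With something like this, a client could be started with
-Duima.as.meta.encoding=ISO-8859-1 (again, a made-up name) to override the
default without touching file.encoding for the whole JVM.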
Jerry

On Thu, Jun 7, 2018 at 9:31 AM Marshall Schor <[email protected]> wrote:

> Recently, we debugged an issue where a user had a UIMA-AS client running
> on Windows, connecting to a UIMA-AS service running on Linux in the cloud.
>
> The Linux box was set up with LANG etc. set to UTF-8. Windows did not have
> any special configuration.
>
> After a successful service deployment on Linux, the Windows client sent a
> get meta, which received a "message string" from the transport, and tried
> to parse it with the XML parser, but that returned an error:
>
> org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
>
> Eventually the user worked around this by launching the Windows client
> Java with the extra parameter
>
>   -D"file.encoding=UTF-8"
>
> which made this problem go away (but may introduce other issues).
>
> Should UIMA-AS communication protocols specify UTF-8 explicitly, instead
> of defaulting to "platform defaults" which seem to cause issues if the
> defaults aren't compatible?
>
> -Marshall
>
