I've created a JIRA issue for this: https://issues.apache.org/jira/browse/UIMA-5791

Not yet sure how to fix this; I'll take a look next week. If I understand the requirements correctly, the default encoding should be UTF-8 when deserializing service metadata, and there should also be a way to override that default. It seems we need a new command-line argument (or system property) that lets the client override the default encoding.

Jerry
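A minimal sketch of the "UTF-8 by default, overridable" idea, assuming the client has the raw metadata payload as bytes before decoding it. The property name `uima.as.metadata.encoding` and the class and method names are hypothetical, for illustration only; UIMA-AS does not define them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MetadataEncoding {

    // Hypothetical property name, for illustration only; UIMA-AS does not define it.
    static final String ENCODING_PROPERTY = "uima.as.metadata.encoding";

    // Returns the charset to use for serialized service metadata: the value of
    // the override property if it is set, otherwise UTF-8 (never the platform default).
    static Charset metadataCharset() {
        String override = System.getProperty(ENCODING_PROPERTY);
        if (override == null || override.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName(override.trim()); // throws if the charset name is unknown
    }

    // Decodes a raw metadata payload with the resolved charset.
    static String decodeMetadata(byte[] payload) {
        return new String(payload, metadataCharset());
    }

    public static void main(String[] args) {
        byte[] payload = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(metadataCharset() + ": " + decodeMetadata(payload));
    }
}
```

With this approach a client would only need something like `-Duima.as.metadata.encoding=ISO-8859-1` when talking to a service that really does not use UTF-8, instead of changing `file.encoding` for the whole JVM.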
On Thu, Jun 7, 2018 at 9:31 AM Marshall Schor <[email protected]> wrote:

> Recently, we debugged an issue where a user had a UIMA-AS client running on
> Windows, connecting to a UIMA-AS service running on Linux in the cloud.
>
> The Linux box was set up with LANG etc. set to UTF-8. Windows did not have
> any special configuration.
>
> After a successful service deployment on Linux, the Windows client sent a
> get meta request, received a "message string" from the transport, and tried
> to parse it with the XML parser, but that returned an error:
>
> org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
>   at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>   at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>   at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:202)
>
> Eventually the user worked around this by launching the Windows client Java
> with the extra parameter
>
> -D"file.encoding=UTF-8"
>
> which made this problem go away (but may introduce other issues).
>
> Should UIMA-AS communication protocols specify UTF-8 explicitly, instead of
> defaulting to "platform defaults", which seem to cause issues if the
> defaults aren't compatible?
>
> -Marshall
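One plausible way to reproduce this class of error outside UIMA-AS (not confirmed to be the exact path inside the client): a metadata string is turned into bytes with a single-byte platform default, and the XML parser, seeing no encoding declaration, reads those bytes as UTF-8. In this sketch ISO-8859-1 stands in for a Windows default such as windows-1252 (both encode `é` as the single byte 0xE9, which is not valid UTF-8 here). The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // Encodes 'xml' with writeCharset and returns true if the XML parser accepts
    // the resulting bytes. The document has no encoding declaration, so the
    // parser reads the bytes as UTF-8.
    static boolean parsesAs(String xml, Charset writeCharset) {
        byte[] bytes = xml.getBytes(writeCharset);
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) { // SAXException or IOException, depending on the JDK
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><meta><name>caf\u00e9</name></meta>";
        // ISO-8859-1 stands in for a single-byte Windows default such as windows-1252.
        System.out.println("ISO-8859-1 bytes parse: " + parsesAs(xml, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes parse:      " + parsesAs(xml, StandardCharsets.UTF_8));
    }
}
```

The single-byte encoding should fail to parse while the UTF-8 bytes parse cleanly, which is why naming UTF-8 explicitly on both the writing and reading side avoids depending on each machine's default. The exact exception type and message for the failing case vary between JDK versions.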
