Please subscribe to the Moses mailing list before posting to it. You can subscribe here: http://mailman.mit.edu/mailman/listinfo/moses-support
To answer your question: the source code line that it blows up on seems to be there just for debugging, so you can delete it or comment it out. Try deleting line moses/server/TranslationRequest.cpp:473 and let me know if it works.

Hieu Hoang
http://moses-smt.org/

---------- Forwarded message ----------
From: <[email protected]>
Date: 12 October 2017 at 12:18
Subject: Moses-support post from [email protected] requires approval
To: [email protected]

As list administrator, your authorization is requested for the following mailing list posting:

    List: [email protected]
    From: [email protected]
    Subject: Moses-Server girerr::error on characters outside the BMP
    Reason: Post by non-member to a members-only list

At your convenience, visit:

    http://mailman.mit.edu/mailman/admindb/moses-support

to approve or deny the request.

---------- Forwarded message ----------
From: Trung Nguyen <[email protected]>
To: [email protected]
Cc:
Bcc:
Date: Thu, 12 Oct 2017 13:18:36 +0200
Subject: Moses-Server girerr::error on characters outside the BMP

I am running Moses in server mode to translate from modern Vietnamese to old Vietnamese characters. Many of these old characters are not in the Basic Multilingual Plane (BMP) of Unicode. For example, the word "hai" corresponds to the character "𠄩", which has the code point U+20129.

On the command line everything works fine. But in server mode, characters outside the BMP, i.e. code points above 0xFFFF, cause the server to terminate.
I am using a simple Python 3 script to query the Moses server:

    import xmlrpc.client

    client = xmlrpc.client.ServerProxy('http://localhost:8012/RPC2')
    result = client.translate({'text': 'hai'})
    translation = result.get('text')

The error message I get:

    Translating: hai
    Line 0: Collecting options took 0.000 seconds at moses/Manager.cpp Line 141
    Line 0: Search took 0.000 seconds
    [moses/server/TranslationRequest.cpp:473] BEST TRANSLATION: 𠄩 [1] [total=-2.740] core=(0.000,-1.000,1.000,0.000,0.000,0.000,0.000,-0.016, 0.000,0.000,0.000,0.000,0.000,0.000,-7.869)
    terminate called after throwing an instance of 'girerr::error'
      what():  10-byte supposed UTF-8 string is not valid UTF-8. UTF-8 string contains a character not in the Basic Multilingual Plane (first byte 0xfffffff0)

Thank you,
Trung Nguyen

---------- Forwarded message ----------
From: [email protected]
To:
Cc:
Bcc:
Date:
Subject: confirm 37c0533a64482c99f31c2923d651dd0851292435

If you reply to this message, keeping the Subject: header intact, Mailman will discard the held message. Do this if the message is spam. If you reply to this message and include an Approved: header with the list password in it, the message will be approved for posting to the list. The Approved: header can also appear in the first line of the body of the reply.
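
For anyone trying to reproduce the report, the following is a small sketch (not part of the original messages) that inspects the character in plain Python 3. It only confirms two facts from the report: U+20129 lies above 0xFFFF, and its UTF-8 encoding starts with the byte 0xf0. The reported "first byte 0xfffffff0" presumably shows that same byte after sign extension, but that is an assumption. The helper name is_outside_bmp is made up for illustration.

```python
# Sketch: inspect the character from the report (U+20129).
# is_outside_bmp is a hypothetical helper, not part of Moses or xmlrpc.

char = "\U00020129"  # the character "𠄩" that "hai" translates to


def is_outside_bmp(text):
    """Return True if any character in text has a code point above 0xFFFF."""
    return any(ord(c) > 0xFFFF for c in text)


code_point = ord(char)
utf8_bytes = char.encode("utf-8")

print("code point: U+{:04X}".format(code_point))
print("UTF-8 bytes:", " ".join("{:02x}".format(b) for b in utf8_bytes))
print("output outside BMP:", is_outside_bmp(char))
print("input outside BMP:", is_outside_bmp("hai"))
```

The character is encoded as four bytes (f0 a0 84 a9), and only the translated output contains a non-BMP character, not the request text "hai". That fits the observation that the failure occurs when the server returns the translation.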
