Almost certainly a bug, and one that could explain the other issues. I
hope the fix is commit a757e02, just pushed to GitHub. (I did just
upload a developer's version, but this commit did not make it into that.)
The problem turns out to be pretty simple. Marpa::R2 makes its own copy
of the input string, and uses that. My plan for returning literal
substrings was to check if the input string was UTF-8, and mark the
literal substring UTF-8 if and only if that was the case. I forgot to
write that code, so everything was coming out Latin-1. If the string
actually contained characters above code point 127, their UTF-8 bytes
were being interpreted one by one as Latin-1 characters, which of
course totally messed up subsequent logic.
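
To make the failure mode concrete, here is a minimal sketch in plain Perl.
It is not the code from the commit, and literal_buggy() and literal_fixed()
are hypothetical stand-ins, not Marpa::R2 functions. It only assumes that
the internal copy holds the UTF-8 bytes of the input, and shows what
happens when those bytes are returned with and without the UTF8 flag
restored.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Input with characters above 127; the U+263A forces Perl's UTF8 flag on.
my $input = "caf\x{e9} \x{263A}";

# Assumed model of the internal copy: the UTF-8 bytes, with the flag off.
my $copy = encode_utf8($input);

# Hypothetical buggy version: hands the bytes back unmarked, so Perl
# treats every byte as a separate Latin-1 character.
sub literal_buggy {
    my ($bytes) = @_;
    return $bytes;
}

# Hypothetical fixed version: marks the result as UTF-8 if and only if
# the original input string was UTF-8.
sub literal_fixed {
    my ( $bytes, $input_was_utf8 ) = @_;
    return $input_was_utf8 ? decode_utf8($bytes) : $bytes;
}

my $was_utf8 = utf8::is_utf8($input);
print length($input), "\n";                              # 6 characters
print length( literal_buggy($copy) ), "\n";              # 9, one per byte
print length( literal_fixed( $copy, $was_utf8 ) ), "\n"; # 6 again
```

Calling decode_utf8() on the returned literal, as in the original report,
amounts to applying the second version by hand on the caller's side.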
-- jeffrey
On 12/19/2013 09:56 PM, Durand Jean-Damien wrote:
The UTF8 flag is lost when using literal. Test case at
https://gist.github.com/jddurand/8050950.
Regards, Jean-Damien.
On Friday, December 20, 2013 03:00:45 UTC+1, Jeffrey Kegler wrote:
It should be in the same encoding as the original string. I'm
treating this as a possible bug, and investigating.
Could you create a test case? Just a simple case where
something goes in as UTF-8, and comes out otherwise. Thanks! --
jeffrey
On 12/19/2013 03:49 PM, Durand Jean-Damien wrote:
Jeffrey,
In which encoding is the return of recce->literal? I had to do
decode_utf8() to get back my utf8 source ;-)
Thanks, JD.
--
You received this message because you are subscribed to the
Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.