This is almost certainly a bug, and one that could explain the other issues. I hope the fix is commit a757e02, which I just pushed to GitHub. (I did just upload a developer's version, but this commit did not make it into that release.)

The problem turns out to be pretty simple. Marpa::R2 makes its own copy of the input string, and uses that. My plan for returning literal substrings was to check whether the input string was UTF-8, and to mark the literal substring as UTF-8 if and only if that was the case. I forgot to write that code, so everything was coming out as Latin-1. If the string actually contained UTF-8 characters above 127, their bytes were being interpreted as Latin-1, which of course totally messed up subsequent logic.
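
For anyone following along, here is a rough sketch of that failure mode in plain Perl. It is not the actual Marpa::R2 code: `$copy`, `literal_buggy` and `literal_fixed` are hypothetical names made up for illustration, and the private copy is assumed to be nothing more than the raw UTF-8 octets of the input. Only core functions (`utf8::upgrade`, `utf8::is_utf8`) and the `Encode` module are used.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# A character string with one code point above 127 (U+00E9).
my $input = "caf\x{e9}";
utf8::upgrade($input);    # make sure the internal UTF8 flag is on

# Hypothetical stand-in for the parser's private copy of the input:
# the raw UTF-8 octets, with the UTF8 flag gone.
my $copy = encode_utf8($input);

# Buggy version: returns a slice of the octets without restoring the flag,
# so Perl treats each byte as a Latin-1 character.
# $start and $length are byte offsets into $copy.
sub literal_buggy {
    my ( $start, $length ) = @_;
    return substr $copy, $start, $length;
}

# Intended version: mark the result as UTF-8 if and only if the input was.
sub literal_fixed {
    my ( $start, $length ) = @_;
    my $literal = substr $copy, $start, $length;
    return utf8::is_utf8($input) ? decode_utf8($literal) : $literal;
}

my $bad  = literal_buggy( 0, length $copy );
my $good = literal_fixed( 0, length $copy );

printf "buggy length: %d\n", length $bad;     # 5, one "character" per byte
printf "fixed length: %d\n", length $good;    # 4, matches the input

# The caller-side workaround from the original report: decode by hand.
my $workaround = decode_utf8($bad);
```

In this sketch `$bad` does not compare equal to `$input`, while both `$good` and `$workaround` do, which is why calling decode_utf8() on the result of literal() worked as a stopgap.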

-- jeffrey

On 12/19/2013 09:56 PM, Durand Jean-Damien wrote:
The UTF8 flag is lost when using literal. Test case at https://gist.github.com/jddurand/8050950.

Regards, Jean-Damien.

On Friday, 20 December 2013 03:00:45 UTC+1, Jeffrey Kegler wrote:

    It should be in the same encoding as the original string.  I'm
    treating this as a possible bug, and investigating.

    Could you create a test case?  Just a simple case where
    something goes in as utf8, and comes out otherwise.  Thanks!  --
    jeffrey

    On 12/19/2013 03:49 PM, Durand Jean-Damien wrote:
    Jeffrey,

    In which encoding is the return of recce->literal? I had to do
    decode_utf8() to get back my utf8 source -;

    Thanks, JD.

-- You received this message because you are subscribed to the
    Google Groups "marpa parser" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected] <javascript:>.
    For more options, visit https://groups.google.com/groups/opt_out
    <https://groups.google.com/groups/opt_out>.
