Hello everyone,
I have been trying for some time to get the start and end positions of the
last matched rule, and I ran into trouble with an example that contains a
Unicode character. Below is a simplified version showing that the position
(the return value of the read() method) is counted wrongly because of the
Unicode character; the code works fine if it is replaced with a non-Unicode
character, for example '='.

(By the way, the start position and the length are given in tokens, but I
work around this by using
my $last_expression = $recce->substring($start_rule, $length_rule);
and taking its length.)

Is Marpa at fault here for not counting Unicode correctly, or did I just use
"encode", "decode" or something else in the wrong way?
#################################################################################################
use utf8;
use 5.014;
use strict;
use warnings;
use Data::Dumper;
use Marpa::R2;
use Encode;
my $dsl = encode("UTF-8",<<END_OF_DSL);
:start ::= Start
:default ::= action => do_print
Start ::= Rule1
Rule1 ::= '≠'
event 'Start' = completed Start
END_OF_DSL
#Initialize grammar#
my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce = Marpa::R2::Scanless::R->new(
    { grammar => $grammar, semantics_package => 'My_Actions' } );
my $input = encode("UTF-8",'≠');
my $pos = $recce->read( \$input);
my ($start_rule, $length_rule) = $recce->last_completed("Start");
print("$pos"); # $pos == 3 since the Unicode symbol is counted as 3 symbols
               # (for ordinary symbols, $pos == 1 as expected)
###############################################################################################
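As a cross-check outside Marpa, here is a minimal sketch that uses only the
core Encode module. It assumes the file is saved as UTF-8; the variable names
$chars and $bytes are only for illustration. It shows the same 1-versus-3
difference from plain length(), which suggests that the 3 above matches the
number of bytes in the encoded input rather than the number of characters:

```perl
use utf8;      # the literal below is read as one character, U+2260
use strict;
use warnings;
use Encode qw(encode);

my $chars = '≠';                        # character string (not encoded)
my $bytes = encode("UTF-8", $chars);    # byte string: the octets E2 89 A0

printf "characters: %d\n", length $chars;    # prints "characters: 1"
printf "bytes:      %d\n", length $bytes;    # prints "bytes:      3"
```

So the same symbol has length 1 as a character string and length 3 once it
has been passed through encode("UTF-8", ...).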
Thank you in advance for any help with this issue.
Best regards,
Toloaca Ion