[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-11-03 Thread Pablo Galindo Salgado
Just be aware that the C tokenizer interface is NOT a public interface and is there only so we can test the C tokenizer itself. It can and will break at any point without prior warning.

Pablo Galindo Salgado

On 3 Nov 2022, at 18:10, David J W  wrote:

Following up, Pablo spotted my problem with the mixup of NL & NEWLINE tokens.  I was using tokenize.py in CPython's stdlib with a simple Python script to build ridiculously strict unit tests.

My solution to that problem was originally to figure out how to access CPython's internal C tokenizer, but someone else did that in 3.11.  The parser is passing basic tests, but I need to redo all of the tests for my tokenizer as they are flawed, and also do some major housekeeping to clean up all the warnings and TODOs sprinkled throughout my code base.

To hopefully avoid future problems, is Lib/symtable.py trustworthy as a way of building unit tests when I start implementing my own symbols graph/table?

Thanks,
David

On Wed, Oct 26, 2022 at 11:57 PM Matthieu Dartiailh  wrote:

If you look at pegen, which uses the stdlib tokenizer as input, you will see that the object used to implement memoization on top of a token stream simply swallows NL (https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49). This is safe since NL has no syntactic meaning; only NEWLINE does.

Best
Matthieu

On Thu, Oct 27, 2022, 01:59 Matthias Görgens  wrote:

Hi David,

Could you share what you have so far, perhaps on GitHub or so? That way it's easier to diagnose your problems. I'm reasonably familiar with Rust.

Perhaps also add a minimal crashing example?

Cheers,
Matthias.

On Thu, 27 Oct 2022, 04:52 David J W,  wrote:

Pablo,

Nl and Newline are tokens but I am interested in NEWLINE's behavior in the Python grammar, note the casing.

For example in simple_stmts @ https://github.com/python/cpython/blob/main/Grammar/python.gram#L107

Is that NEWLINE some sort of built-in rule of the grammar?  In my project I am running into problems where the parser crashes any time there is some double like NL & N or Newline & NL but I want to nail down NEWLINE's behavior in CPython's PEG grammar.

On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado  wrote:

Hi,

I am not sure I understand exactly what you are asking, but NEWLINE is a token, not a parser rule. What decides when NEWLINE is emitted is the lexer, which has nothing to do with PEG. Normally PEG parsers also act as tokenizers, but the one in CPython does not.

Also notice that CPython’s parser uses a version of the tokeniser written in C that doesn’t share code with the exposed version. You will find that the tokenize module in the standard library actually behaves differently regarding which tokens are emitted for newlines and indentation.

The only way to be sure is to check the code, unfortunately.

Hope this helps.

Regards from rainy London,
Pablo Galindo Salgado

> On 26 Oct 2022, at 19:12, David J W  wrote:
> 
> 
> I am writing a Rust version of Python for fun and I am at the parser stage of development.
> 
> I copied and modified a PEG grammar ruleset from another open source project and I've already noticed some problems (ex Newline vs NL) with how they transcribed things.
> 
> I am suspecting that CPython's grammar NEWLINE is a builtin rule for the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to sanity check if that is right before I figure out how to hack in a NEWLINE rule and update my grammar ruleset.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NMCMEDMEBKATYKRNZLX2NDGFOB5UHQ5A/
> Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-11-03 Thread David J W
Following up, Pablo spotted my problem with the mixup of NL & NEWLINE
tokens.  I was using tokenize.py in CPython's stdlib with a simple Python
script to build ridiculously strict unit tests.

My solution to that problem was originally to figure out how to access
CPython's internal C tokenizer, but someone else did that in 3.11.  The
parser is passing basic tests, but I need to redo all of the tests for my
tokenizer as they are flawed, and also do some major housekeeping to clean
up all the warnings and TODOs sprinkled throughout my code base.

To hopefully avoid future problems, is Lib/symtable.py trustworthy as a way
of building unit tests when I start implementing my own symbols graph/table?


Thanks,
David



On Wed, Oct 26, 2022 at 11:57 PM Matthieu Dartiailh 
wrote:

> If you look at pegen, which uses the stdlib tokenizer as input, you will
> see that the object used to implement memoization on top of a token stream
> simply swallows NL (
> https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49).
> This is safe since NL has no syntactic meaning; only NEWLINE does.
>
> Best
>
> Matthieu
>
> On Thu, Oct 27, 2022, 01:59 Matthias Görgens 
> wrote:
>
>> Hi David,
>>
>> Could you share what you have so far, perhaps on GitHub or so? That way
>> it's easier to diagnose your problems. I'm reasonably familiar with Rust.
>>
>> Perhaps also add a minimal crashing example?
>>
>> Cheers,
>> Matthias.
>>
>> On Thu, 27 Oct 2022, 04:52 David J W,  wrote:
>>
>>> Pablo,
>>> Nl and Newline are tokens but I am interested in NEWLINE's behavior
>>> in the Python grammar, note the casing.
>>>
>>> For example in simple_stmts @
>>> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>>>
>>> Is that NEWLINE some sort of built in rule to the grammar?   In my
>>> project I am running into problems where the parser crashes any time there
>>> is some double like NL & N or Newline & NL but I want to nail down
>>> NEWLINE's behavior in CPython's PEG grammar.
>>>
>>> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not sure I understand exactly what you are asking, but NEWLINE is a
>>>> token, not a parser rule. What decides when NEWLINE is emitted is the
>>>> lexer, which has nothing to do with PEG. Normally PEG parsers also act as
>>>> tokenizers, but the one in CPython does not.
>>>>
>>>> Also notice that CPython’s parser uses a version of the tokeniser
>>>> written in C that doesn’t share code with the exposed version. You will
>>>> find that the tokenize module in the standard library actually behaves
>>>> differently regarding which tokens are emitted for newlines and
>>>> indentation.
>>>>
>>>> The only way to be sure is to check the code, unfortunately.
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards from rainy London,
>>>> Pablo Galindo Salgado
>>>>
>>>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>>>> >
>>>> > I am writing a Rust version of Python for fun and I am at the parser
>>>> > stage of development.
>>>> >
>>>> > I copied and modified a PEG grammar ruleset from another open source
>>>> > project and I've already noticed some problems (ex Newline vs NL) with
>>>> > how they transcribed things.
>>>> >
>>>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>>>> > the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted
>>>> > to sanity check if that is right before I figure out how to hack in a
>>>> > NEWLINE rule and update my grammar ruleset.

[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-27 Thread Matthieu Dartiailh
If you look at pegen, which uses the stdlib tokenizer as input, you will see
that the object used to implement memoization on top of a token stream
simply swallows NL (
https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49).
This is safe since NL has no syntactic meaning; only NEWLINE does.
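
The idea can be sketched with the public tokenize module alone. This is an illustration under assumptions, not pegen's actual code: significant_tokens is a hypothetical helper, and dropping COMMENT alongside NL is my own choice for the example.

```python
import io
import tokenize
from typing import Iterator


def significant_tokens(source: str) -> Iterator[tokenize.TokenInfo]:
    """Yield stdlib tokens, skipping the ones a grammar never refers to."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NL marks a line break with no syntactic meaning (blank line,
        # comment-only line, or a break inside brackets).
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        yield tok


# A bracketed line break, a blank line, and a comment-only line.
source = "x = (1,\n     2)\n\n# note\ny = 3\n"
names = [tokenize.tok_name[tok.type] for tok in significant_tokens(source)]
print(names)
```

With a filter like this in front of the parser, the grammar only ever has to deal with NEWLINE.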

Best

Matthieu

On Thu, Oct 27, 2022, 01:59 Matthias Görgens 
wrote:

> Hi David,
>
> Could you share what you have so far, perhaps on GitHub or so? That way
> it's easier to diagnose your problems. I'm reasonably familiar with Rust.
>
> Perhaps also add a minimal crashing example?
>
> Cheers,
> Matthias.
>
> On Thu, 27 Oct 2022, 04:52 David J W,  wrote:
>
>> Pablo,
>> Nl and Newline are tokens but I am interested in NEWLINE's behavior
>> in the Python grammar, note the casing.
>>
>> For example in simple_stmts @
>> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>>
>> Is that NEWLINE some sort of built in rule to the grammar?   In my
>> project I am running into problems where the parser crashes any time there
>> is some double like NL & N or Newline & NL but I want to nail down
>> NEWLINE's behavior in CPython's PEG grammar.
>>
>> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand exactly what you are asking but NEWLINE is a
>>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer
>>> that has nothing to do with PEG. Normally PEG parsers also acts as
>>> tokenizers but the one in cpython does not.
>>>
>>> Also notice that CPython’s parser uses a version of the tokeniser
>>> written in C that doesn’t share code with the exposed version. You will
>>> find that the tokenizer module in the standard library actually behaves
>>> differently regarding what tokens are emitted in new lines and indentations.
>>>
>>> The only way to be sure is check the code unfortunately.
>>>
>>> Hope this helps.
>>>
>>> Regards from rainy London,
>>> Pablo Galindo Salgado
>>>
>>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>>> >
>>> > 
>>> > I am writing a Rust version of Python for fun and I am at the parser
>>> stage of development.
>>> >
>>> > I copied and modified a PEG grammar ruleset from another open source
>>> project and I've already noticed some problems (ex Newline vs NL) with how
>>> they transcribed things.
>>> >
>>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>>> the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
>>> sanity check if that is right before I figure out how to hack in a NEWLINE
>>> rule and update my grammar ruleset.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5SPCIOVE5TSZ2DRJT75NKEWQWAKQHKII/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Matthias Görgens
Hi David,

Could you share what you have so far, perhaps on GitHub or so? That way
it's easier to diagnose your problems. I'm reasonably familiar with Rust.

Perhaps also add a minimal crashing example?

Cheers,
Matthias.

On Thu, 27 Oct 2022, 04:52 David J W,  wrote:

> Pablo,
> Nl and Newline are tokens but I am interested in NEWLINE's behavior in
> the Python grammar, note the casing.
>
> For example in simple_stmts @
> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>
> Is that NEWLINE some sort of built in rule to the grammar?   In my project
> I am running into problems where the parser crashes any time there is some
> double like NL & N or Newline & NL but I want to nail down NEWLINE's
> behavior in CPython's PEG grammar.
>
> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Hi,
>>
>> I am not sure I understand exactly what you are asking but NEWLINE is a
>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer
>> that has nothing to do with PEG. Normally PEG parsers also acts as
>> tokenizers but the one in cpython does not.
>>
>> Also notice that CPython’s parser uses a version of the tokeniser written
>> in C that doesn’t share code with the exposed version. You will find that
>> the tokenizer module in the standard library actually behaves differently
>> regarding what tokens are emitted in new lines and indentations.
>>
>> The only way to be sure is check the code unfortunately.
>>
>> Hope this helps.
>>
>> Regards from rainy London,
>> Pablo Galindo Salgado
>>
>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>> >
>> > 
>> > I am writing a Rust version of Python for fun and I am at the parser
>> stage of development.
>> >
>> > I copied and modified a PEG grammar ruleset from another open source
>> project and I've already noticed some problems (ex Newline vs NL) with how
>> they transcribed things.
>> >
>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>> the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
>> sanity check if that is right before I figure out how to hack in a NEWLINE
>> rule and update my grammar ruleset.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZZDKWS62QG3BTNIT2NYRCLRI4VJ2HBF6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hummm… he is also mentioning NL and Newline tokens, and if I recall correctly those are tokens that only appear in the Python tokenizer and are emitted differently from the C one (and therefore they are not used in the grammar).

Pablo Galindo Salgado

On 26 Oct 2022, at 21:57, Guido van Rossum  wrote:

I wonder if David may be struggling with the rule that a newline is significant in the grammar unless it appears inside matching brackets/parentheses/braces? I think that's in the lexer. Similarly, multiple newlines are collapsed.

On Wed, Oct 26, 2022 at 1:19 PM Pablo Galindo Salgado  wrote:

Hi,

As I mentioned, NEWLINE is a token. All uppercase words in the grammar are tokens and therefore are produced by the lexer, not the parser. It is not a built-in rule. In particular, that token is produced here:

https://github.com/python/cpython/blob/6777e09166fc384ea0a4b50202c7b0bd7a23330c/Parser/tokenizer.c#L1773

On Wed, 26 Oct 2022 at 20:59, David J W  wrote:

Pablo,

Nl and Newline are tokens but I am interested in NEWLINE's behavior in the Python grammar, note the casing.

For example in simple_stmts @ https://github.com/python/cpython/blob/main/Grammar/python.gram#L107

Is that NEWLINE some sort of built-in rule of the grammar?  In my project I am running into problems where the parser crashes any time there is some double like NL & N or Newline & NL but I want to nail down NEWLINE's behavior in CPython's PEG grammar.

On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado  wrote:

Hi,

I am not sure I understand exactly what you are asking but NEWLINE is a token, not a parser rule. What decides when NEWLINE is emitted is the lexer that has nothing to do with PEG. Normally PEG parsers also acts as tokenizers but the one in cpython does not.

Also notice that CPython’s parser uses a version of the tokeniser written in C that doesn’t share code with the exposed version. You will find that the tokenizer module in the standard library actually behaves differently regarding what tokens are emitted in new lines and indentations.

The only way to be sure is check the code unfortunately.

Hope this helps.

Regards from rainy London,
Pablo Galindo Salgado

> On 26 Oct 2022, at 19:12, David J W  wrote:
> 
> 
> I am writing a Rust version of Python for fun and I am at the parser stage of development.
> 
> I copied and modified a PEG grammar ruleset from another open source project and I've already noticed some problems (ex Newline vs NL) with how they transcribed things.
> 
> I am suspecting that CPython's grammar NEWLINE is a builtin rule for the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to sanity check if that is right before I figure out how to hack in a NEWLINE rule and update my grammar ruleset.


--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KUXABSTZP33ZEXB74HS5262TGNFGBCP7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Guido van Rossum
I wonder if David may be struggling with the rule that a newline is
significant in the grammar unless it appears inside matching
brackets/parentheses/braces? I think that's in the lexer. Similarly,
multiple newlines are collapsed.
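
The bracket rule can be observed from Python with the stdlib tokenize module. This is only a sketch: line_break_tokens is a hypothetical helper, and it shows the stdlib tokenizer's view, which reports non-significant line breaks as NL instead of dropping them, so the C tokenizer may behave differently here.

```python
import io
import tokenize


def line_break_tokens(source: str) -> list[tuple[str, int]]:
    """Return (token name, line number) for every NEWLINE or NL token."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start[0])
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NEWLINE, tokenize.NL)
    ]


# Line 1 ends inside parentheses; lines 3 and 4 are blank.
source = "total = (1 +\n         2)\n\n\nprint(total)\n"
breaks = line_break_tokens(source)
print(breaks)
```

Only the ends of the two logical lines should show up as NEWLINE; the break inside the parentheses and the blank lines should show up as NL.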

On Wed, Oct 26, 2022 at 1:19 PM Pablo Galindo Salgado 
wrote:

> Hi,
>
> As I mentioned, NEWLINE is a token. All uppercase words in the grammar are
> tokens and therefore are produced by the lexer, not the parser. It is not a
> built-in rule. In particular, that token is produced here:
>
>
> https://github.com/python/cpython/blob/6777e09166fc384ea0a4b50202c7b0bd7a23330c/Parser/tokenizer.c#L1773
>
>
> On Wed, 26 Oct 2022 at 20:59, David J W  wrote:
>
>> Pablo,
>> Nl and Newline are tokens but I am interested in NEWLINE's behavior
>> in the Python grammar, note the casing.
>>
>> For example in simple_stmts @
>> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>>
>> Is that NEWLINE some sort of built in rule to the grammar?   In my
>> project I am running into problems where the parser crashes any time there
>> is some double like NL & N or Newline & NL but I want to nail down
>> NEWLINE's behavior in CPython's PEG grammar.
>>
>> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand exactly what you are asking but NEWLINE is a
>>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer
>>> that has nothing to do with PEG. Normally PEG parsers also acts as
>>> tokenizers but the one in cpython does not.
>>>
>>> Also notice that CPython’s parser uses a version of the tokeniser
>>> written in C that doesn’t share code with the exposed version. You will
>>> find that the tokenizer module in the standard library actually behaves
>>> differently regarding what tokens are emitted in new lines and indentations.
>>>
>>> The only way to be sure is check the code unfortunately.
>>>
>>> Hope this helps.
>>>
>>> Regards from rainy London,
>>> Pablo Galindo Salgado
>>>
>>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>>> >
>>> > 
>>> > I am writing a Rust version of Python for fun and I am at the parser
>>> stage of development.
>>> >
>>> > I copied and modified a PEG grammar ruleset from another open source
>>> project and I've already noticed some problems (ex Newline vs NL) with how
>>> they transcribed things.
>>> >
>>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>>> the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
>>> sanity check if that is right before I figure out how to hack in a NEWLINE
>>> rule and update my grammar ruleset.


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MD2THJ5BIBDSOB7HVFDPBUNCW76H5N3S/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread David J W
Pablo,
Nl and Newline are tokens but I am interested in NEWLINE's behavior in
the Python grammar, note the casing.

For example in simple_stmts @
https://github.com/python/cpython/blob/main/Grammar/python.gram#L107

Is that NEWLINE some sort of built-in rule of the grammar?   In my project
I am running into problems where the parser crashes any time there is some
double like NL & N or Newline & NL but I want to nail down NEWLINE's
behavior in CPython's PEG grammar.

On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado 
wrote:

> Hi,
>
> I am not sure I understand exactly what you are asking but NEWLINE is a
> token, not a parser rule. What decides when NEWLINE is emitted is the lexer
> that has nothing to do with PEG. Normally PEG parsers also acts as
> tokenizers but the one in cpython does not.
>
> Also notice that CPython’s parser uses a version of the tokeniser written
> in C that doesn’t share code with the exposed version. You will find that
> the tokenizer module in the standard library actually behaves differently
> regarding what tokens are emitted in new lines and indentations.
>
> The only way to be sure is check the code unfortunately.
>
> Hope this helps.
>
> Regards from rainy London,
> Pablo Galindo Salgado
>
> > On 26 Oct 2022, at 19:12, David J W  wrote:
> >
> > 
> > I am writing a Rust version of Python for fun and I am at the parser
> stage of development.
> >
> > I copied and modified a PEG grammar ruleset from another open source
> project and I've already noticed some problems (ex Newline vs NL) with how
> they transcribed things.
> >
> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for the
> parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
> sanity check if that is right before I figure out how to hack in a NEWLINE
> rule and update my grammar ruleset.
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LTDXZ4DS2GLICZRWYZ5PVLPBJHVGQPSS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hi,

As I mentioned, NEWLINE is a token. All uppercase words in the grammar are
tokens and therefore are produced by the lexer, not the parser. It is not a
built-in rule. In particular, that token is produced here:

https://github.com/python/cpython/blob/6777e09166fc384ea0a4b50202c7b0bd7a23330c/Parser/tokenizer.c#L1773
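
As a rough way to check a name from the Python side, one can compare it with the token names exposed by the stdlib token module. This is a sketch under assumptions: is_token_name is a hypothetical helper, and the token module also lists names such as NL and COMMENT that the grammar itself does not use.

```python
import token


def is_token_name(name: str) -> bool:
    """True if `name` is all uppercase and is a token name known to `token`."""
    return name.isupper() and name in token.tok_name.values()


# NEWLINE is a token; simple_stmts is a lowercase grammar rule.
for name in ("NEWLINE", "INDENT", "simple_stmts"):
    print(name, is_token_name(name))
```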


On Wed, 26 Oct 2022 at 20:59, David J W  wrote:

> Pablo,
> Nl and Newline are tokens but I am interested in NEWLINE's behavior in
> the Python grammar, note the casing.
>
> For example in simple_stmts @
> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>
> Is that NEWLINE some sort of built in rule to the grammar?   In my project
> I am running into problems where the parser crashes any time there is some
> double like NL & N or Newline & NL but I want to nail down NEWLINE's
> behavior in CPython's PEG grammar.
>
> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> Hi,
>>
>> I am not sure I understand exactly what you are asking but NEWLINE is a
>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer
>> that has nothing to do with PEG. Normally PEG parsers also acts as
>> tokenizers but the one in cpython does not.
>>
>> Also notice that CPython’s parser uses a version of the tokeniser written
>> in C that doesn’t share code with the exposed version. You will find that
>> the tokenizer module in the standard library actually behaves differently
>> regarding what tokens are emitted in new lines and indentations.
>>
>> The only way to be sure is check the code unfortunately.
>>
>> Hope this helps.
>>
>> Regards from rainy London,
>> Pablo Galindo Salgado
>>
>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>> >
>> > 
>> > I am writing a Rust version of Python for fun and I am at the parser
>> stage of development.
>> >
>> > I copied and modified a PEG grammar ruleset from another open source
>> project and I've already noticed some problems (ex Newline vs NL) with how
>> they transcribed things.
>> >
>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>> the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
>> sanity check if that is right before I figure out how to hack in a NEWLINE
>> rule and update my grammar ruleset.
>>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5ZV7BZOYHW3DELYIB4GKRWHUNTYW3V4K/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hi,

I am not sure I understand exactly what you are asking, but NEWLINE is a token,
not a parser rule. What decides when NEWLINE is emitted is the lexer, which has
nothing to do with PEG. Normally PEG parsers also act as tokenizers, but the
one in CPython does not.

Also notice that CPython’s parser uses a version of the tokeniser written in C
that doesn’t share code with the exposed version. You will find that the
tokenize module in the standard library actually behaves differently regarding
which tokens are emitted for newlines and indentation.
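
To see what the exposed tokenizer emits, here is a minimal sketch that uses only the public tokenize module. token_names is a hypothetical helper, and the output is the stdlib tokenizer's view, not necessarily what the C tokenizer hands to the parser.

```python
import io
import tokenize


def token_names(source: str) -> list[str]:
    """Return the names of the tokens the stdlib tokenizer emits for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Two statements separated by a blank line and a comment-only line.
source = "x = 1\n\n# a comment\ny = 2\n"
names = token_names(source)
print(names)
```

For this input NEWLINE should appear only at the end of the two statements, while the blank line and the comment line each end in NL.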

The only way to be sure is to check the code, unfortunately.

Hope this helps.

Regards from rainy London,
Pablo Galindo Salgado

> On 26 Oct 2022, at 19:12, David J W  wrote:
> 
> 
> I am writing a Rust version of Python for fun and I am at the parser stage of 
> development.
> 
> I copied and modified a PEG grammar ruleset from another open source project 
> and I've already noticed some problems (ex Newline vs NL) with how they 
> transcribed things.
> 
> I am suspecting that CPython's grammar NEWLINE is a builtin rule for the 
> parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to sanity 
> check if that is right before I figure out how to hack in a NEWLINE rule and 
> update my grammar ruleset.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YWDKMMKQJN5UY44ONDGF6VD24M7H7HYB/
Code of Conduct: http://python.org/psf/codeofconduct/