On Sunday, November 3, 2019 at 3:20:27 AM UTC-6, Edward K. Ream wrote:

> The new code should put an end to a long series of issues 
> <https://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=untokenize&submit=search&status=-1%2C1%2C2%2C3> 
> against the untokenize code in Python's tokenize 
> <https://github.com/python/cpython/blob/master/Lib/tokenize.py> library 
> module.

A few more words about the testing that I have done.  

The Python tests in test_tokenize.py are, quite rightly, careful about 
unicode. My test code takes similar care.
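As a minimal stdlib-only sketch of that care (this snippet is mine, not from 
Leo's code): tokenize consumes a *bytes* readline, so a unicode source must be 
encoded first, and the first token it emits records the detected encoding.

```python
import io
import tokenize

# Illustration only: tokenize works on a bytes readline, so the
# unicode source must be encoded (here, as UTF-8) before tokenizing.
source = "s = 'héllo'\n"
tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))

# The first token reports the encoding that tokenize detected.
assert tokens[0].type == tokenize.ENCODING
assert tokens[0].string == 'utf-8'
```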

leoTest.leo contains unit tests adapted from test_tokenize.py so that they 
run within Leo, including this new one:

# Test https://bugs.python.org/issue38663.
import leo.core.leoBeautify as leoBeautify
check_roundtrip = leoBeautify.check_roundtrip

check_roundtrip(
    "print \\\n"
    "    ('abc')\n",
    expect_failure=True,
)

Something similar should be added to test_tokenize.py.
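For anyone curious why that particular snippet is hard to round-trip, here is 
a small stdlib-only sketch (not part of Leo's code) that dumps its token 
stream. Note that the continuation backslash never appears as a token, so an 
untokenizer must re-derive it from token positions alone:

```python
import io
import tokenize

# The backslash-continued source from the unit test above.
source = "print \\\n    ('abc')\n"
names = []
for tok in tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline):
    names.append(tokenize.tok_name[tok.type])
    print(tokenize.tok_name[tok.type], repr(tok.string))

# The line continuation is consumed by the tokenizer: no token's
# string is the backslash itself.
assert 'NAME' in names and 'STRING' in names
```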

Here is the testing code from leoBeautify.py.  As you will see, the code 
runs *stricter* tests than those in test_tokenize.py:

import io
import tokenize
import unittest

def check_roundtrip(f, expect_failure=False):
    """
    Called from unit tests in unitTest.leo.
    
    Test python's tokenize.untokenize method and Leo's Untokenize class.
    """
    check_python_roundtrip(f, expect_failure)
    check_leo_roundtrip(f)
    
def check_leo_roundtrip(code, trace=False):
    """Check Leo's Untokenize class"""
    # pylint: disable=import-self
    import leo.core.leoBeautify as leoBeautify
    assert isinstance(code, str), repr(code)
    tokens = tokenize.tokenize(io.BytesIO(code.encode('utf-8')).readline)
    u = leoBeautify.Untokenize(code, trace=trace)
    results = u.untokenize(tokens)
    unittest.TestCase().assertEqual(code, results)
    
def check_python_roundtrip(f, expect_failure):
    """
    This is tokenize.TestRoundtrip.check_roundtrip, without the wretched fudges.
    """
    if isinstance(f, str):
        code = f.encode('utf-8')
    else:
        code = f.read()
        f.close()
    readline = iter(code.splitlines(keepends=True)).__next__
    tokens = list(tokenize.tokenize(readline))
    # Don't shadow the builtin `bytes`.
    result_bytes = tokenize.untokenize(tokens)
    readline2 = iter(result_bytes.splitlines(keepends=True)).__next__
    result_tokens = list(tokenize.tokenize(readline2))
    if expect_failure:
        unittest.TestCase().assertNotEqual(result_tokens, tokens)
    else:
        unittest.TestCase().assertEqual(result_tokens, tokens)
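The same round-trip idea can be exercised against the stdlib alone. Here is a 
hedged sketch of the token-level comparison that check_python_roundtrip 
performs (the name roundtrip_ok is mine, not from leoBeautify.py):

```python
import tokenize

def roundtrip_ok(code):
    """Return True if tokenize -> untokenize -> tokenize is lossless
    at the token level, mirroring check_python_roundtrip above."""
    readline = iter(code.encode('utf-8').splitlines(keepends=True)).__next__
    tokens = list(tokenize.tokenize(readline))
    # untokenize returns bytes because the stream starts with ENCODING.
    regenerated = tokenize.untokenize(tokens)
    readline2 = iter(regenerated.splitlines(keepends=True)).__next__
    return list(tokenize.tokenize(readline2)) == tokens

# Well-behaved code round-trips exactly.
assert roundtrip_ok("x = 1\n")
```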

*Summary*

I've done the heavy lifting on issue 38663 
<https://bugs.python.org/issue38663>. Python devs should handle the details 
of testing and packaging.

Leo's tokenizing code in leoBeautify.py can use the new code immediately, 
without waiting for Python to improve tokenize.untokenize.

Edward
