Hi All, I was going through some of the open issues related to 'tokenize' and ran across 'issue2180'. The reproduction case for this issue is along the lines of:
>>> tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)

but, with 'py3k' I get:

>>> tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 360, in tokenize
    encoding, consumed = detect_encoding(readline)
  File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 316, in detect_encoding
    if first.startswith(BOM_UTF8):
TypeError: Can't convert 'bytes' object to str implicitly

which, as seen in the traceback, fails because the 'detect_encoding' function in 'Lib/tokenize.py' checks whether 'first' (a 'str' object, the first line of the source being tokenized) starts with 'BOM_UTF8' (a 'bytes' object). It seems to me that strings should still be able to be tokenized, but maybe I am missing something. Is the implementation of 'detect_encoding' correct in how it attempts to determine an encoding, or should I open an issue for this?

--- Meador
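For reference, a minimal sketch of the two entry points as I understand them (using a placeholder source string, not the issue2180 reproduction case): tokenize.tokenize() runs detect_encoding() first and so expects a readline that yields bytes, while generate_tokens() accepts a str readline.

    import io
    import tokenize

    src = "x = 1\n"  # placeholder source, not the issue2180 case

    # tokenize.tokenize() calls detect_encoding(), so its readline
    # must yield bytes -- feed it a BytesIO, not a StringIO.
    for tok in tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline):
        print(tok)

    # generate_tokens() skips encoding detection and works on str.
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        print(tok)

So the question stands whether feeding a str readline to tokenize() should work at all, or whether detect_encoding() should reject it more gracefully.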