Hi Yaroslav,

A while ago, you reported a bug against python-debian's handling of comments in deb822.
In that bug, the following input was used to illustrate the problem:

> $> cat confuse.txt
> Goodone: value0
>
> ; xxx might be unicode: Ярик
>
> Entry: value

[...]

> or may be comments shouldn't be detached from paragraphs according to
> rfc822? (in any case the divergence between unicode/plain handling is
> sub-optimal)

The only place where comments are defined in deb822 files that I can find is in policy §5.1 and they are only defined for debian/control files and comments are started by a # at the start of the line.

The difference you observed due to changing the encoding is actually due to changing which parser was used inside python-debian -- when passed a real filehandle, iter_paragraphs currently tries to use apt's TagFile parser. When parsing the input with an encoding, a real filehandle is not provided to iter_paragraphs and so the internal parser is used instead. If use_apt_pkg=False is added to the call to iter_paragraphs or if the file is first slurped in with readlines(), then the internal parser is used. The difference here is not really unicode/plain handling but whether a filehandle was passed or not.

I'm coming to the conclusion that use_apt_pkg=False should be the default for a variety of reasons -- the remaining question is then whether the detached and syntactically invalid comment should cause the parser to bail out or somehow continue past it. Currently, the parser sees this as a null paragraph which (incorrectly?) triggers its end of iteration condition.

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
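
P.S. To make the "bail out or continue" question concrete, here is a toy sketch in plain Python. It is not python-debian's code and does not use apt's TagFile; the names FIELD, split_blocks, parse_block and toy_iter_paragraphs are made up for illustration, and the field regex is a simplification I am assuming is close enough to show the effect. It only demonstrates how a block with no valid fields (the detached "; ..." line) can either end iteration or be skipped.

```python
import re

# Toy field matcher: a key with no whitespace, then ":" and a value.
# This is an assumption for illustration, not python-debian's real regex.
FIELD = re.compile(r"^(?P<key>[^\s:]+):\s*(?P<value>.*)$")

# The input from the bug report.
SAMPLE = """\
Goodone: value0

; xxx might be unicode: Ярик

Entry: value
"""


def split_blocks(text):
    """Yield lists of non-blank lines, split on blank lines."""
    block = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def parse_block(lines):
    """Return a dict of the fields found; lines that do not match are ignored."""
    fields = {}
    for line in lines:
        match = FIELD.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


def toy_iter_paragraphs(text, stop_on_empty=True):
    """Yield paragraphs; an empty one either ends iteration or is skipped."""
    for block in split_blocks(text):
        paragraph = parse_block(block)
        if not paragraph:
            if stop_on_empty:
                return  # mimics "null paragraph ends iteration"
            continue    # alternative: carry on past the invalid block
        yield paragraph


stopped = list(toy_iter_paragraphs(SAMPLE, stop_on_empty=True))
skipped = list(toy_iter_paragraphs(SAMPLE, stop_on_empty=False))
print(len(stopped), len(skipped))
```

With stop_on_empty=True the "Entry" paragraph is never reached, which is the symptom from the bug; with stop_on_empty=False it is returned. A third option, raising an error on the invalid block, would be the strict reading of policy.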