Hi Yaroslav,

A while ago, you reported a bug against python-debian's handling of comments 
in deb822.

In that bug, the following input was used to illustrate the problem:

> $> cat confuse.txt 
> Goodone: value0
> 
>  ; xxx might be unicode: Ярик
> 
> Entry: value

[...]

> or may be comments shouldn't be detached from paragraphs according to
> rfc822? (in any case the divergence between unicode/plain handling is
> sub-optimal)

The only definition of comments in deb822 files that I can find is in policy
§5.1. There, comments are defined only for debian/control files, and a
comment line is one that starts with a # at the start of the line.

The difference you observed when changing the encoding is actually caused by
a change in which parser python-debian used. When passed a real filehandle,
iter_paragraphs currently tries to use apt's TagFile parser. When the input
is parsed with an encoding, iter_paragraphs is not given a real filehandle,
so the internal parser is used instead. The internal parser is also used if
use_apt_pkg=False is added to the call to iter_paragraphs, or if the file is
first slurped in with readlines(). So the difference here is not really
unicode/plain handling but whether a filehandle was passed or not.
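
To make the parser selection concrete, here is a minimal sketch, assuming
python-debian is installed and that iter_paragraphs accepts the use_apt_pkg
keyword as described above. The helper name parse_with_internal_parser and
the inline SAMPLE text are only for illustration. No output is shown, since
what happens after the first paragraph is exactly the open question.

```python
try:
    from debian import deb822  # provided by the python-debian package
except ImportError:
    deb822 = None  # lets the sketch load even where python-debian is absent

# Inline copy of confuse.txt from the bug report (illustrative).
SAMPLE = (
    "Goodone: value0\n"
    "\n"
    " ; xxx might be unicode: Ярик\n"
    "\n"
    "Entry: value\n"
)


def parse_with_internal_parser(lines):
    """Collect paragraphs without going through apt's TagFile parser."""
    return list(deb822.Deb822.iter_paragraphs(lines, use_apt_pkg=False))


# A list of lines stands in for the result of readlines() on confuse.txt,
# so no real filehandle is involved.
lines = SAMPLE.splitlines(keepends=True)
paragraphs = parse_with_internal_parser(lines) if deb822 is not None else []
for para in paragraphs:
    print(list(para.keys()))
```

Passing use_apt_pkg=False explicitly keeps the result independent of whether
the caller happens to hold a real filehandle.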

I'm coming to the conclusion that use_apt_pkg=False should be the default for 
a variety of reasons -- the remaining question is then whether the detached 
and syntactically invalid comment should cause the parser to bail out or 
somehow continue past it. Currently, the parser sees this as a null paragraph 
which (incorrectly?) triggers its end of iteration condition.
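
The choice between bailing out and continuing can be sketched with a toy
parser. This is not python-debian's code: iter_paragraphs_sketch, its
on_empty parameter and the helper functions are hypothetical names, and the
field syntax is heavily simplified. It only shows how an empty ("null")
paragraph can either end iteration, be skipped, or raise an error.

```python
import re

# Simplified field syntax: "Key: value" with no leading whitespace.
FIELD_RE = re.compile(r"^(?P<key>[^:\s]+)\s*:\s*(?P<data>.*?)\s*$")

SAMPLE = [
    "Goodone: value0\n",
    "\n",
    " ; xxx might be unicode: Ярик\n",
    "\n",
    "Entry: value\n",
]


def split_blocks(lines):
    """Yield lists of non-blank lines, split at blank lines."""
    block = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def parse_block(block):
    """Return a dict of fields; lines that fit no field are dropped."""
    fields = {}
    key = None
    for line in block:
        m = FIELD_RE.match(line)
        if m:
            key = m.group("key")
            fields[key] = m.group("data")
        elif key is not None and line[0].isspace():
            # continuation line of the previous field
            fields[key] += "\n" + line.strip()
        # anything else, such as the stray " ; ..." line, is ignored
    return fields


def iter_paragraphs_sketch(lines, on_empty="stop"):
    """Yield paragraphs; on_empty is 'stop', 'skip' or 'raise'."""
    for block in split_blocks(lines):
        para = parse_block(block)
        if not para:
            if on_empty == "stop":
                return  # a null paragraph ends iteration
            if on_empty == "raise":
                raise ValueError("block contains no fields: %r" % block)
            continue  # 'skip': carry on past the invalid block
        yield para


print(list(iter_paragraphs_sketch(SAMPLE)))                   # stops early
print(list(iter_paragraphs_sketch(SAMPLE, on_empty="skip")))  # continues
```

With on_empty="stop" only the Goodone paragraph is returned; with "skip" the
Entry paragraph is returned as well, and with "raise" the invalid block is
reported to the caller.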

cheers
Stuart


-- 
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7

