On 1/8/23 09:22, Eduard Bloch wrote:
> 
> Thanks. Regarding the second patch, I am not sure. Looks like my C++
> also has become rusty in the last months. It would be good to have an
> explicit description of the problem cases, since I don't know exactly
> what you mean with "streaming". I.e. it seems like you want the generic
> parser implementation to be changed to a specialized local one but how
> am I supposed to write a unit test for this?
> 
> To be checked in the next days, stay tuned.
> 
> Best regards,
> Eduard.

Sorry, I was in a bit of a hurry when I wrote that email.

The purpose of the second patch is to properly parse the list of packages
sources, e.g. [1].  Those files are concatenations of RFC822 blocks,
separated by blank lines.  The previous implementation did not work,
because it did not account for the fact that all of these separate blocks
are of importance to us.

In particular, ParseDebianRfc822Index, line 1995 of src/cacheman.cc:

    // we don't merge

justifies the next statement:

    pLastVal->clear();  

That serves to remove the last block's information (rather than merge it
with the last package).  The symptom of this error in parsing is that
all Sources blocks (except the last one, which is not clobbered by any
subsequent blocks) are not identified as referencing data in the cache.
Therefore most of the, e.g., .tar{,.gz,.xz,.bzip2} and .dsc files are
marked for expiration from the cache.

I have added a new implementation in the same spirit as the one used for
Packages parsing, (see src/cacheman.cc:1707).  In that branch of the
switch statement (EIDX_PACKAGES), each RFC822 block is parsed.  I used
the word "streaming" to indicate that the whole index does not need to
be loaded into memory before the first call to ret(info) begins returning
data to the rest of the service.

While good for the overall performance of apt-cacher-ng, this streaming
behavior is not the primary purpose of the patch (and I apologize for
overemphasizing that in the title).  Indeed, it is the correctness of
the result that is my main objective here.

As for writing a unit test, I would suggest grabbing some subset of [1],
and ensuring that all of the entries are properly accounted for. Without
this patch, such a unit test would fail.

Best,
Antonio

[1] http://ftp.us.debian.org/debian/dists/bookworm/main/source/Sources.gz

Attachment: OpenPGP_0xB01C53D5DED4A4EE.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to