On 1/8/23 09:22, Eduard Bloch wrote: > > Thanks. Regarding the second patch, I am not sure. Looks like my C++ > also has become rusty in the last months. It would be good to have an > explicit description of the problem cases, since I don't know exactly > what you mean with "streaming". I.e. it seems like you want the generic > parser implementation to be changed to a specialized local one but how > am I supposed to write a unit test for this? > > To be checked in the next days, stay tuned. > > Best regards, > Eduard.
Sorry, I was in a bit of a hurry when I wrote that email. The purpose of the second patch is to properly parse the list of packages sources, e.g. [1]. Those files are concatenations of RFC822 blocks, separated by blank lines. The previous implementation did not work, because it did not account for the fact that all of these separate blocks are of importance to us. In particular, ParseDebianRfc822Index, line 1995 of src/cacheman.cc: // we don't merge justifies the next statement: pLastVal->clear(); That serves to remove the last block's information (rather than merge it with the last package). The symptom of this error in parsing is that all Sources blocks (except the last one, which is not clobbered by any subsequent blocks) are not identified as referencing data in the cache. Therefore most of the, e.g., .tar{,.gz,.xz,.bzip2} and .dsc files are marked for expiration from the cache. I have added a new implementation in the same spirit as the one used for Packages parsing, (see src/cacheman.cc:1707). In that branch of the switch statement (EIDX_PACKAGES), each RFC822 block is parsed. I used the word "streaming" to indicate that the whole index does not need to be loaded into memory before the first call to ret(info) begins returning data to the rest of the service. While good for the overall performance of apt-cacher-ng, this streaming behavior is not the primary purpose of the patch (and I apologize for overemphasizing that in the title). Indeed, it is the correctness of the result that is my main objective here. As for writing a unit test, I would suggest grabbing some subset of [1], and ensuring that all of the entries are properly accounted for. Without this patch, such a unit test would fail. Best, Antonio [1] http://ftp.us.debian.org/debian/dists/bookworm/main/source/Sources.gz
OpenPGP_0xB01C53D5DED4A4EE.asc
Description: OpenPGP public key
OpenPGP_signature
Description: OpenPGP digital signature