Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-21 Thread Jeff Johnson
On Dec 21, 2010, at 2:26 AM, Anders F Björklund wrote: Jeff Johnson wrote: Should make it into a generic library eventually, once this prototyping is done... Amazing how many silly bitarrays and digests are out there, like using scripted byte arrays and for instance MD5, for Bloom filters.

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-20 Thread Jeff Johnson
On Dec 20, 2010, at 7:01 PM, Anders F Björklund wrote: Jeff Johnson wrote: Should make it into a generic library eventually, once this prototyping is done... Amazing how many silly bitarrays and digests are out there, like using scripted byte arrays and for instance MD5, for Bloom filters.

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Jeff Johnson
On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote: On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: So I guess there's something I'm not really fully grasping here... See code attached... Yes. You miss that you need to estimate the expected size of the population you wish to

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Jeff Johnson
The bisection likely isn't worth worrying about until there is need. But rpmbfUnion/rpmbfIntersect are useful operations on arrays of fixed-size Bloom filters no matter what. One last hint I forgot (re using rpmbfIntersect): Assuming that all of the Bloom filters are fixed size, then one
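
A minimal sketch of why union and intersection are cheap on fixed-size Bloom filters: when every filter has the same bit count and the same hash functions, both operations reduce to a word-wise OR or AND. The names bf_union and bf_intersect are hypothetical and are not the rpmbf API; the uint32_t word layout is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not the rpmbf API. Both filters must have the
 * same number of words and must have been built with the same hashes. */

/* a |= b: the result answers "maybe present" for anything in either set. */
static void bf_union(uint32_t *a, const uint32_t *b, size_t nwords)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < nwords; i++)
        a[i] |= b[i];
}

/* a &= b: returns 1 if no bit survives, 0 otherwise.
 * An all-zero result proves the two sets share no element; a non-zero
 * result only means they might. */
static int bf_intersect(uint32_t *a, const uint32_t *b, size_t nwords)
{
    uint32_t any = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < nwords; i++) {
        a[i] &= b[i];
        any |= a[i];
    }
    return any == 0;
}
```

The ANDed filter is only an approximation of the filter one would get by inserting the true intersection, so it can report more false positives than a filter built directly.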

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Per Øyvind Karlsen
2010/12/18 Jeff Johnson n3...@mac.com: On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote: On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: So I guess there's something I'm not really fully grasping here... See code attached... Yes. You miss that you need to estimate the expected

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-17 Thread Per Øyvind Karlsen
2010/12/15 Jeff Johnson n3...@mac.com: On Dec 14, 2010, at 9:51 PM, Jeff Johnson wrote: Download. uncompress. use for file dependencies. I will take wagers on how much smaller the encoding is as soon as you tell me what you choose for {n,p}. There's an obvious generalization here for

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-17 Thread Jeff Johnson
On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: So I guess there's something I'm not really fully grasping here... See code attached... Yes. You miss that you need to estimate the expected size of the population you wish to capture in a Bloom Filter: size_t n = 0; /*
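
A rough sketch of the sizing step mentioned above, using the standard Bloom filter estimate rather than the code from the message (which is cut off here). It assumes the target false-positive rate is given as p = 2^-k, in which case the optimal number of hashes is k and the bit count is m = n * k / ln 2. The names bloom_bits and bloom_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the rpmbf API.
 * n: expected number of elements to insert
 * k: number of hash functions, i.e. target false-positive rate p = 2^-k
 * Returns the bit count m = ceil(n * k / ln 2). */
static size_t bloom_bits(size_t n, unsigned k)
{
    const double inv_ln2 = 1.4426950408889634;   /* 1 / ln 2 */
    assert(n > 0 && k > 0);
    double bits = (double)n * (double)k * inv_ln2;
    size_t m = (size_t)bits;
    if ((double)m < bits)
        m++;                                     /* round up */
    return m;
}

/* Bytes needed to store m bits. */
static size_t bloom_bytes(size_t m)
{
    return (m + 7) / 8;
}
```

For example, bloom_bits(10000, 7) sizes a filter for about 10000 paths at a false-positive rate of 1/128, which comes to roughly 12 KiB.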

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-15 Thread Anders F Björklund
Jeff Johnson wrote: I was recently looking at making a manifest for FreeBSD, which consists of a simple files listing for *each package*. ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-8.1-release/All/*.tbz I was looking at the Slackware MANIFEST as a reference, which is just a

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote: On a related note though I've started giving parentdir symlink deps some more thoughts again though, skimming the surface on practical issues and drawbacks of such as ie. the size of files.xml.lzma in main/release currently being

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Per Øyvind Karlsen
2010/12/14 Jeff Johnson n3...@mac.com: On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote: On a related note though I've started giving parentdir symlink deps some more thoughts again though, skimming the surface on practical issues and drawbacks of such as ie. the size of

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: The issues of the size of files.xml* and synthesis.hdlist* have nothing whatsoever to do with parentdir/linkto dependencies. But for being able to resolve these dependencies, one still needs the metadata of files.xml, which

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Anders F Björklund
Jeff Johnson wrote: There are some very simple data reductions on hierarchical paths too. One of the best known is Run a dictionary: assign an integer weighted by # of occurrences to favor small integers for frequently encountered tokens between /.../ (all of usr and bin
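
One way to read the dictionary idea, as a sketch only: split each path on '/', count how often each token occurs, then sort by descending count so that the most frequent tokens get the smallest integer codes. All names and the fixed-size table limits below are hypothetical, and the linear lookup is chosen for brevity, not speed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 256   /* illustrative cap on distinct tokens */
#define MAX_TOKLEN 64    /* illustrative cap on token length */

struct tok  { char name[MAX_TOKLEN]; unsigned count; };
struct dict { struct tok t[MAX_TOKENS]; size_t n; };

/* Count every '/'-separated token of path in the dictionary. */
static void dict_add_path(struct dict *d, const char *path)
{
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')
            p++;
        size_t len = strcspn(p, "/");
        if (len == 0)
            break;
        assert(len < MAX_TOKLEN);
        size_t i;
        for (i = 0; i < d->n; i++)
            if (strlen(d->t[i].name) == len && memcmp(d->t[i].name, p, len) == 0)
                break;
        if (i == d->n) {                 /* first time this token is seen */
            assert(d->n < MAX_TOKENS);
            memcpy(d->t[i].name, p, len);
            d->t[i].name[len] = '\0';
            d->t[i].count = 0;
            d->n++;
        }
        d->t[i].count++;
        p += len;
    }
}

/* Most frequent first; ties broken by name so the order is deterministic. */
static int by_count_desc(const void *a, const void *b)
{
    const struct tok *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return strcmp(x->name, y->name);
}

/* After ranking, a token's array index is its integer code. */
static void dict_rank(struct dict *d)
{
    qsort(d->t, d->n, sizeof d->t[0], by_count_desc);
}

/* Returns the code for token, or -1 if it is not in the dictionary. */
static int dict_code(const struct dict *d, const char *token)
{
    for (size_t i = 0; i < d->n; i++)
        if (strcmp(d->t[i].name, token) == 0)
            return (int)i;
    return -1;
}
```

Fed /usr/bin/ls, /usr/bin/cat and /usr/lib/libc.so, the ranking gives usr the code 0 and bin the code 1, so the common prefixes cost the fewest bits under any variable-length integer encoding.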

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 6:46 PM, Anders F Björklund wrote: Jeff Johnson wrote: There are some very simple data reductions on hierarchical paths too. One of the best known is Run a dictionary: assign an integer weighted by # of occurrences to favor small integers for frequently

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
Google said http://techreports.lib.berkeley.edu/accessPages/CSD-83-148.html Finding Files Fast Authors: Woods, James A. Technical Report Identifier: CSD-83-148 January 15, 1983 Bingo. Off by a year, and the chloroxed neurons resisted confusion with the other James Wood hash

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Per Øyvind Karlsen
2010/12/14 Jeff Johnson n3...@mac.com: On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: The issues of the size of files.xml* and synthesis.hdlist* have nothing whatsoever to do with parentdir/linkto dependencies. But for being able to resolve these dependencies, one still needs the

Re: Metadata size constraints wrt. parentdir symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 9:23 PM, Per Øyvind Karlsen wrote: 2010/12/14 Jeff Johnson n3...@mac.com: On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: The issues of the size of files.xml* and synthesis.hdlist* have nothing whatsoever to do with parentdir/linkto dependencies. But for