-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Myers wrote:
> I designed a system where it took feedback from consenting users, sending the
> file lists back to my server, where I was going to do some data crunching. The
> data from just _my_ system was over 60 MB.

It sounds like you really only need to index each package a few times at
most. Sure, the raw data from a single user could be 60 MB, but there are
some ways to reduce that significantly:

1. Don't send in data for anything in the base system install.

2. As you populate your database, publish a list of indexed packages via a
   URL. Users would exclude any packages you've already indexed. If this
   were a GLEP you could probably put the file in the portage directory and
   everybody would get it via rsync.

3. Start by only indexing each package ONCE. Don't worry about every combo
   of arches, CFLAGS, USE, etc.

That means that most users wouldn't upload anything at all, and the rest
would only send their unique contributions.

If you get everything working without indexing by USE, you could start
adding that capability in. In the list from #2, publish the USE flags
indexed for each package, and individuals would only upload packages
compiled with something that wasn't on that list.

Sure, the final database could easily be 100 MB or so, but if you just put
it on a website you won't be sending the whole thing. Just put it in
MySQL/PostgreSQL and build a PHP front end (sorry, not a web dev personally,
but it isn't that hard to do from the little I've messed with it).

Sorry - I don't intend to make it sound like the whole thing can be done in
5 minutes, and I'm sure you've already poured hours into your effort.
However, I don't see any theoretical issues with it as long as the design is
right. The important thing is that users are only uploading diffs against
your master repository - and not doing a complete dump of their entire
system. Otherwise you will get buried in data!

I must admit that it is easy to just talk about ideas like this - I really
do want to commend you on the work you've undoubtedly already accomplished!
OSS projects require lots of hard work by many volunteers and it is all too
easy for people like me to just sit back and nitpick what could be done
better...

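To make the "upload only diffs" idea a bit more concrete, here is a rough Python sketch of the client-side filter. Everything in it is an assumption for illustration: the function name, the package names, and the shape of the published list (a mapping from package to the set of USE flags already indexed) are all made up, and it is not meant to match how your system or portage actually stores things.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical data shapes, not taken from portage or any existing tool.
Installed = Dict[str, FrozenSet[str]]   # package -> USE flags it was built with
PublishedIndex = Dict[str, Set[str]]    # package -> USE flags already indexed


def packages_to_upload(
    installed: Installed,
    base_system: Set[str],
    published: PublishedIndex,
    index_use_flags: bool = False,
) -> List[str]:
    """Return the packages whose file lists this user should still send."""
    todo = []
    for pkg, use_flags in sorted(installed.items()):
        if pkg in base_system:
            continue  # point 1: never send anything from the base install
        if pkg not in published:
            todo.append(pkg)  # points 2 and 3: nobody has indexed it yet
            continue
        # Later phase: send it only if it was built with a USE flag
        # that is not on the published list for this package.
        if index_use_flags and not use_flags <= published[pkg]:
            todo.append(pkg)
    return todo
```

With index_use_flags left off, a user whose packages are all on the published list uploads nothing, which is the case I'd expect for most people. Turning it on later only adds the packages built with flags nobody has contributed yet.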
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDXlvkg2bN8aFizRkRArU+AKCnEBdpoO2Acnwh3+FFR8CYj5CLtACcCboB
2QIb31yXVdW0EQST8PEUPeY=
=VF5P
-----END PGP SIGNATURE-----
