On Fri, Jul 29, 2011 at 03:50:44PM -0500, Dan McGee wrote:
> On Fri, Jul 29, 2011 at 3:32 PM, Lukas Fleischer
> <[email protected]> wrote:
> > On Thu, Jul 28, 2011 at 01:59:07PM -0500, Dan McGee wrote:
> >> This implements the following scheme:
> >>
> >> * /packages/cower/ --> /packages/co/cower/
> >> * /packages/j/ --> /packages/j/j/
> >> * /packages/zqy/ --> /packages/zq/y/
> >
> > I hope there's a typo in the last example, otherwise I must have
> > misunderstood something :)
> Yes, typo.

You might want to amend this when resubmitting the patch (details
follow) :)

>
> >>
> >> We take up to the first two characters of each package name as an
> >> intermediate subdirectory, and then the full package name lives
> >> underneath that.
> >>
> >> Why, you ask? Well because earlier today the AUR hit 32,000 entries in
> >> the unsupported/ directory, making new package uploads impossible. While
> >> some might argue we shouldn't have so many damn packages in the repos,
> >> we should be able to handle this case.
> >>
> >> Why two characters instead of one? Our two biggest two-char groups, 'pe'
> >> and 'py', both start with 'p', and have nearly 2000 packages each. Go
> >> Python and Perl.
> >
> > Time to move to ext4, eh? No, seriously: Something tells me that we
> > should neither be filesystem dependent nor depend on the current
> > distribution of package names. Using some better hash algorithm might
> > fix the second problem while reducing predictability for the end user.
> We wouldn't be the first to use a scheme like this, so it felt like
> the right choice. See for example:
> http://pypi.python.org/packages/source/D/Django/Django-1.3.tar.gz (one
> letter only)
> http://search.cpan.org/CPAN/authors/id/S/SR/SRI/Mojolicious-1.68.tar.gz
> (segmented by author, multiple levels)

Yeah, I've seen this before.

>
> Ext4 came up as an option yesterday, but it's best to prevent one from
> shooting themselves in the foot like this, and this seems like the
> more proper solution. I do share some thoughts that we shouldn't
> depend on a certain distribution of package names, but reducing
> predictability seemed like a big enough downfall that I didn't want to
> go that way, and moving to this scheme greatly increases the time
> before we'd hit any problems. If anyone has predictable but scalable
> solutions I'm more than open to hearing them. Of course, we do provide
> the URL in the JSON request for a reason.

Well, one predictable and scalable solution would be to split after
every character or after every two characters and create nested
directories (as we only allow a subset of all possible file names, that
would still result in fewer than ~32000 subdirectories per directory).
This would result in a very inscrutable directory structure, though.

As I mentioned before, I'm fine with using the 2-character prefix for
now. It feels like the best compromise. We can think about more
individualized solutions later.

>
> > Given that we will probably run into the same again soon (according to
> > current statistics, there are about two months left), I will apply this
> > temporary workaround and prepare a release soon, though.
> Yeah, we were able to free ~1000 "spots" or so, so we have breathing
> room, but not loads of time. I did this via the cleanup script if that
> wasn't obvious, realize I didn't really say that anywhere.

That was kind of obvious, yeah :) Especially since you submitted the
cleanup script patch as well.

>
> >> Still needed is a "move the existing data" script, as well as a set of
> >> rewrite rules for those wishing to preserve backward compatible URLs for
> >> any helper programs doing the wrong thing and relying on them.
> >
> > If we provide backward compatible URLs, why not keep them as default? I
> > doubt this will affect performance...
> It wouldn't- I was more concerned with keeping the AUR deployable as
> easily as possible, and not tied to a specific webserver and its
> hairy configuration of rewrite rules. Making them optional prevents us
> from having to muck with things at that level.

Ack.

>
> >> Signed-off-by: Dan McGee <[email protected]>
> >> ---
> >>  scripts/cleanup              |   24 ++++++++++++++++--------
> >>  web/html/pkgsubmit.php       |    2 +-
> >>  web/lib/aurjson.class.php    |    2 +-
> >>  web/template/pkg_details.php |    2 +-
> >>  4 files changed, 19 insertions(+), 11 deletions(-)
> >>
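
For reference, here is a rough Python sketch of the two layouts discussed
in this thread. It is not the code from the patch (the AUR itself is PHP).
The function names and the "/packages" base are made up for illustration,
and the third example uses what I assume the corrected form of the typo to
be, going by the "full package name lives underneath" rule.

```python
def prefix_path(name: str, base: str = "/packages") -> str:
    """Two-character prefix layout as described in the patch message.

    Up to the first two characters form an intermediate directory and
    the full package name lives underneath it. (Hypothetical helper.)
    """
    if not name:
        raise ValueError("package name must not be empty")
    return f"{base}/{name[:2]}/{name}/"


def nested_path(name: str, width: int = 2, base: str = "/packages") -> str:
    """Nested alternative: one directory level per `width` characters.

    Assumption: the last chunk is the leaf directory. The thread does
    not say how the leaf would look, so this is only one possible reading.
    """
    if not name:
        raise ValueError("package name must not be empty")
    if width < 1:
        raise ValueError("width must be at least 1")
    chunks = [name[i:i + width] for i in range(0, len(name), width)]
    return f"{base}/{'/'.join(chunks)}/"


if __name__ == "__main__":
    for pkg in ("cower", "j", "zqy"):
        print(pkg, prefix_path(pkg), nested_path(pkg))
```

Note that under this reading of the nested layout, a package named "co"
would share its directory with the parent of "cower", which is one more
reason the flat 2-character prefix is the easier compromise for now.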
