On Sun, 6 Jul 2003, Andrew Suffield wrote:

> On ext2, as an example, stat()ting or open()ing a directory of 10000
> files in the order returned by readdir() will be vastly quicker than
> in some other sequence (like, say, bytewise lexicographic) due to the
> way in which the filesystem looks up inodes. This has caused
> significant performance issues for bugs.debian.org in the past.
You're right. I missed the point of the story when I simply ran find under the sortdir wrapper, but now I understand the problem. However, I'm still unsure whether it's a good idea to keep the files unsorted, especially if we consider efficient syncing of packages.

On my home computer I've never heard my disk working during the package creation phase (even though we've been using sortdir for more than half a year and I've compiled hundreds of packages), but I do hear it when, for example, the source is decompressed. At the 'dpkg-deb --build' phase only the processor is the bottleneck.

This might vary under different circumstances, and I don't know what those are in Debian's case: for example, I have no information about what hardware your packages are built on, or whether other CPU-intensive or disk-intensive applications are running on those machines. I can easily imagine that using sortdir could drastically decrease performance if another disk-intensive process is running. However, in my experience there was no noticeable performance decrease when this was the only process accessing the disk...

But hey, let's stop for a minute :-) For most packages, building the package only hits the memory cache, doesn't it? The files it packs together have just been created, and there are not many packages whose uncompressed size is close to or bigger than the amount of RAM in today's machines... And for the large packages, the build itself might take thousands of times longer than reading the files in sorted order.

Does anyone know what RPM does? I know that listing the contents of a package always produces alphabetical order, but I don't know whether the file list is sorted on the fly or the files really appear alphabetically in the cpio archive.

So I guess we've now seen the pros and cons of sorting the files. (One thing is missing: we still don't know how efficiently rsync works if two rsyncable tar.gz files contain the same files in a different order.)
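To make the two orderings in this thread concrete, here is a minimal Python sketch. It is not the sortdir wrapper and not what dpkg-deb does internally; it only lists a tree either in the order the filesystem hands it back or in bytewise-sorted order, and then packs it with `tarfile` in that order. The helper names `list_tree` and `build_tar` are hypothetical. `os.walk` is built on `os.scandir`, which on POSIX reads entries with readdir(), so the unsorted path roughly follows the filesystem's own order (except that `os.walk` groups subdirectories and files separately).

```python
import os
import tarfile


def list_tree(root, sort=False):
    """Return paths under root (relative to it), parents before children.

    sort=False: keep the order os.walk reports (readdir-based on POSIX).
    sort=True:  bytewise lexicographic order within each directory,
                roughly what a sorting readdir wrapper would produce.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = dirnames + filenames
        if sort:
            # Sorting dirnames in place also makes os.walk descend in order.
            dirnames.sort(key=os.fsencode)
            names.sort(key=os.fsencode)
        rel = os.path.relpath(dirpath, root)
        for name in names:
            paths.append(os.path.normpath(os.path.join(rel, name)))
    return paths


def build_tar(root, out_path, sort=False):
    """Write a gzip-compressed tar whose members follow list_tree()'s order."""
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in list_tree(root, sort=sort):
            # recursive=False: add each directory entry alone; its contents
            # are added later, in the order chosen above.
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)
```

Comparing two archives built with `sort=False` on different machines (or filesystems) against two built with `sort=True` is one way to see why a stable member order could matter for rsyncable tarballs, although this sketch says nothing about how well rsync copes with reordered content.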
The decision is clearly not mine but the Debian developers'. However, if you ask me, I still vote for sorting the files :-))

bye,
Egmont