Various parts of BackupPC, including BackupPC_dump, BackupPC_trashClean and BackupPC_nightly, spend a lot of time traversing large trees of files.
As many people have observed, over time BackupPC's pooling results in directories with files that are widely dispersed across the disk. This makes disk seeks the performance bottleneck.

Currently BackupPC processes all the files in a directory by reading the directory and processing each file in the order returned by the directory read. Simon Strack from Monash U noted a couple of years ago that disk seeks can be reduced significantly by sorting the directory read results by inode. If the numeric inode is closely correlated with the disk position (i.e. block) then the files are processed in an order that reduces disk seeks.

Perl's built-in directory reading functions just return a file name. The perl module IO::Dirent additionally returns the inode and file type, which avoids a stat() on each file.

I'm interested in exploring whether IO::Dirent works with different operating systems and file systems and, if so, whether traversing those file systems by sorting inodes returned by IO::Dirent provides any benefit.

I am asking for some volunteers to do the following:

  - install IO::Dirent from CPAN.

  - unpack the attached tar file in a directory.

  - make sure IO::Dirent works (i.e. returns correct type and inode information) on the file system you will test by running the inodeVerify script:

        su backuppc
        mkdir TOPDIR/temp
        cd TOPDIR/temp
        inodeVerify

    It should print "IO::Dirent is ok". You can remove the temp directory.

  - run the inodeTest benchmark on a large directory tree (e.g. /data/BackupPC/cpool or /data/BackupPC/cpool/0 or /data/BackupPC/cpool/[0-7]). You need a large enough tree to render caching unimportant, e.g. to do the entire pool:

        su backuppc
        inodeTest TOPDIR/cpool

    or one of these (1/16 of the pool, 1/4 of the pool or 1/2 of the pool respectively):

        inodeTest TOPDIR/cpool/0
        inodeTest TOPDIR/cpool/[0-3]
        inodeTest TOPDIR/cpool/[0-7]

The benchmark traverses the tree and stats each file, first without inode sorting, and then with inode sorting.
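To make the idea concrete, here is a minimal sketch of the two traversal orders. It is written in Python rather than Perl and is not the code in the attached tar file: os.scandir stands in for IO::Dirent (on POSIX systems it also reports the inode and file type from the directory read itself), and the function names walk and timed are made up for this illustration.

```python
import os
import time


def walk(root, sort_by_inode):
    """Stat every entry under root; return (inode, path) pairs in visit order."""
    visited = []
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as it:
            entries = list(it)
        if sort_by_inode:
            # The inode comes from the directory read, so sorting costs
            # no extra stat() per file on POSIX systems.
            entries.sort(key=lambda e: e.inode())
        for entry in entries:
            os.lstat(entry.path)  # the per-file work being measured
            visited.append((entry.inode(), entry.path))
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
    return visited


def timed(root, sort_by_inode):
    """Return (number of entries visited, elapsed seconds) for one pass."""
    start = time.perf_counter()
    count = len(walk(root, sort_by_inode))
    return count, time.perf_counter() - start
```

For example, calling timed("TOPDIR/cpool/0", False) and then timed("TOPDIR/cpool/0", True) would give one unsorted and one sorted pass over 1/16 of the pool, with TOPDIR replaced by your own top-level directory. Whether the sorted pass is actually faster depends on how well inode numbers track disk position on your file system, which is exactly what is being asked here.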
The pair of tests is repeated 3 times, and the first pair is ignored to reduce the measurement error due to caching, which tends to benefit the second and subsequent runs. If the run times of the last 4-5 runs are much shorter than the first then caching is dominating and you need to re-run with a larger tree. The ratio of elapsed time taken for the two non-sorted runs to the two sorted runs is printed.

You should make sure the load from other usage on the file system is low, or at least relatively constant, during the test. Otherwise the results won't be meaningful.

I'd like to get the following info from you: the output from the two scripts, the OS, the file system type and the RAID or LVM setup. Please email the info to me off list and I will summarize.

Craig
inode.tgz
Description: GNU Unix tar archive
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/