Hello,

I seem to be experiencing a problem with collisions on some data that is being backed up from a server. The problem didn't arise until a user started populating a directory with thousands of bitmap files. All of these files now appear to collide in the pool, which slows the backups to a crawl. Here's the information regarding the pool:

General Server Information

  * The servers PID is 3626, on host tapehost1, version 3.2.0beta0, started at 6/12 14:42.
  * This status was generated at 7/1 09:54.
  * The configuration was last loaded at 6/12 14:52.
  * PCs will be next queued at 7/1 10:00.
  * Other info:
      o 2 pending backup requests from last scheduled wakeup,
      o 0 pending user backup requests,
      o 10 pending command requests,
      o Uncompressed pool:
          + Pool is 687.39GB comprising 1154808 files and 1279 directories (as of 6/30 07:09),
          + Pool hashing gives 5013 repeated files with longest chain 4527,
          + Nightly cleanup removed 228 files of size 0.22GB (around 6/30 07:09),
      o Compressed pool:
          + Pool is 438.99GB comprising 1658761 files and 2184 directories (as of 6/30 15:23),
          + Pool hashing gives 1611 repeated files with longest chain 776,
          + Nightly cleanup removed 34 files of size 0.00GB (around 6/30 15:23),
      o Pool file system was recently at 61% (7/1 09:51), today's max is 61% (7/1 01:00) and yesterday's max was 61%.

Notice that the longest chain in the uncompressed pool is 4527. If I drill down to the location where the collisions happen I have:

    # cd /ldisk/3ware0/backups/pool/e/f/4/
    # ls -lah ef48707c04eed19414d0d42da047ea3f_0 ef48707c04eed19414d0d42da047ea3f_4526
    -rw-r----- 2 backuppc backuppc 15M 2009-06-12 11:37 ef48707c04eed19414d0d42da047ea3f_0
    -rw-r----- 2 backuppc backuppc 15M 2009-06-26 12:30 ef48707c04eed19414d0d42da047ea3f_4526

All the files in between are the same size as well. It appears that the BackupPC_dump instance for this server takes forever comparing these files. It seems to walk the whole chain again for each incoming file, and this can take 2-3 days for these 4527 files. Here's an strace of what the process is doing:

    # ps aux | grep BackupPC_dump
    backuppc 717 14.0 2.6 260352 215796 ? D Jun28 636:51 /usr/bin/perl /usr/local/backuppc/bin/BackupPC_dump services

    # strace -p 717
    stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
    open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", O_RDONLY) = 6
    ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(6, 0, SEEK_CUR) = 0
    fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
    fcntl(6, F_SETFD, FD_CLOEXEC) = 0
    lseek(7, 0, SEEK_SET) = 0
    read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
    read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
    read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
    close(6) = 0
    stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", {st_mode=S_
    open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", O_RDONLY) =
    ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl f
    lseek(6, 0, SEEK_CUR) = 0
    fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
    fcntl(6, F_SETFD, FD_CLOEXEC) = 0
    lseek(7, 0, SEEK_SET) = 0
    read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
    read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
    read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
    close(6) = 0

Here's an iostat dump for the filesystem the data is being backed up to:

    % iostat -m 2 /dev/sdb
    Linux 2.6.27.13-smp (tapehost1)  07/01/2009

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.94    0.06    2.00   11.81    0.00   85.44

    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    sdb            1557.00        90.95         0.05        181          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.87    0.19    2.18   11.19    0.00   85.20

    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    sdb            1750.75       102.06         0.05        205          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.12    0.00    2.19   11.62    0.00   85.31

    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    sdb            1600.50        92.55         0.06        185          0

As you can see, the device the filesystem sits on is being pushed hard, most likely to its limits, at roughly 90-100 MB/s of sustained reads. So my questions are:

  1. Is there a way to change the hashing algorithm to prevent these massive collisions?
  2. If not, are there any other ways to speed up this process so I can get these backups finished in a more timely fashion?

A full backup of this system used to finish in 4-5 hours; now an incremental takes 3+ days.

--
James Esslinger -- slin...@arlut.utexas.edu
System Administrator -- Office: 512.835.3257
SISL/ARL:UT -- Helpdesk: 512.490.4490
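
P.S. In case it helps anyone check their own pool for the same symptom, here is a rough sketch of how the suffixed files per digest could be counted. It assumes chain members share a 32-hex-digit base name and differ only in a trailing _N index, as in the ls output above. The helper name chain_lengths is made up for this example and is not part of BackupPC.

```python
import os
import re
import sys
from collections import Counter

# Assumption: chain members are named <32 hex digits>_<index>,
# matching the file names shown in the ls listing.
CHAIN_NAME = re.compile(r"^([0-9a-f]{32})_(\d+)$")


def chain_lengths(pool_dir):
    """Count suffixed files per base digest under pool_dir (hypothetical helper)."""
    counts = Counter()
    for _root, _dirs, files in os.walk(pool_dir):
        for name in files:
            match = CHAIN_NAME.match(name)
            if match:
                # Group by the digest part; the _N suffix is ignored.
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the five digests with the most suffixed files.
    for digest, n in chain_lengths(sys.argv[1]).most_common(5):
        print(digest, n)
```

Run against a pool directory (for example the /ldisk/3ware0/backups/pool path above), it only reads directory listings, so it should not add much load compared with the file comparisons themselves.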