Giannis Economou wrote at about 19:12:56 +0200 on Thursday, March 16, 2023:
 > In my v4 pool, are collisions still generating _0, _1, _2, etc.
 > filenames in the pool/?
According to the code in Lib.pm, it appears that unlike v3 there is no
underscore -- it's just an (unsigned) long appended to the 16-byte digest.

 > (as in the example the docs mention:
 > __TOPDIR__/pool/1/2/3/123456789abcdef0
 > __TOPDIR__/pool/1/2/3/123456789abcdef0_0
 > __TOPDIR__/pool/1/2/3/123456789abcdef0_1
 > )

That is for v3, as indicated by the 3-layer pool.

 > I am using compression (I only have a cpool/ dir) and I am asking
 > because on both servers running:
 >     find cpool/ -name "*_0" -print
 >     find cpool/ -name "*_*" -print
 > brings zero results.

Try:

    find /var/lib/backuppc/cpool/ -type f -regextype grep \
        ! -regex ".*/[0-9a-f]\{32\}" ! -name "LOCK" ! -name "poolCnt"

 > Thank you.
 >
 > On 16/3/2023 6:30 p.m., backu...@kosowsky.org wrote:
 > > Rob Sheldon wrote at about 08:31:17 -0700 on Thursday, March 16, 2023:
 > > > On Thu, Mar 16, 2023, at 7:43 AM, backu...@kosowsky.org wrote:
 > > > >
 > > > > Rob Sheldon wrote at about 23:54:51 -0700 on Wednesday, March 15, 2023:
 > > > > > There is no reason to be concerned. This is normal.
 > > > >
 > > > > It *should* be extremely, once-in-a-blue-moon rare to randomly
 > > > > have an md5sum collision -- as in 1.47*10^-29.
 > > >
 > > > Why are you assuming this is "randomly" happening? Any time an
 > > > identical file exists in more than one place on the client
 > > > filesystem, there will be a collision. This is common in lots of
 > > > cases. Desktop environments frequently have duplicated files
 > > > scattered around. I used BackupPC for website backups; my chain
 > > > length was approximately equal to the number of WordPress sites I
 > > > was hosting.
 > >
 > > You are simply not understanding how file de-duplication and pool
 > > chains work in v4.
 > >
 > > Identical files contribute only a single chain instance -- no matter
 > > how many clients you are backing up and no matter how many backups
 > > you save of each client. This is what de-duplication does.
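The find expression suggested in the reply can be sanity-checked against a synthetic v4-style pool. This is a hypothetical illustration: the digest below is invented, and the collision suffix follows the description in the thread (a number appended directly to the 32-hex-character digest, with no underscore), not verified v4 internals.

```shell
# Hypothetical v4-style pool layout: the first file with a given digest
# keeps the bare 32-hex-char name; a colliding file (different content,
# same digest) would get a number appended with no underscore. The
# digest here is made up for illustration.
pool=$(mktemp -d)
digest="123456789abcdef0123456789abcdef0"
touch "$pool/$digest"      # first instance: bare digest name
touch "$pool/${digest}0"   # hypothetical collision entry: digest + "0"

# The suggested regex prints only entries that are NOT a bare digest,
# i.e. only the chain (collision) files:
find "$pool" -type f -regextype grep ! -regex ".*/[0-9a-f]\{32\}"
```

Because `-regex` matches the whole path, the bare-digest file is excluded and only the suffixed entry is printed -- so zero output on a real cpool means zero chains.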
 > > The fact that they appear on different clients and/or in different
 > > parts of the filesystem is reflected in the attrib files in the pc
 > > subdirectories for each client. This is where the metadata is stored.
 > >
 > > Chain lengths have to do with pool storage of the file contents
 > > (ignoring metadata). Lengths greater than 1 only occur if you have
 > > md5sum hash collisions -- i.e., two files (no matter on what client
 > > or where in the filesystem) with non-identical contents but the same
 > > md5sum hash.
 > >
 > > Such collisions are statistically exceedingly unlikely to occur on
 > > normal data where you haven't worked hard to create them.
 > >
 > > For example, on my backup server:
 > >     Pool is 841.52+0.00GiB comprising 7395292+0 files and 16512+1
 > >     directories (as of 2023-03-16 01:11),
 > >     Pool hashing gives 0+0 repeated files with longest chain 0+0,
 > >
 > > I strongly suggest you read the documentation on BackupPC before
 > > making wildly erroneous assumptions about chains. You can also look
 > > at the code in BackupPC_refCountUpdate, which defines how $fileCntRep
 > > and $fileCntRepMax are calculated.
 > >
 > > Also, if what you said were true, the OP would have multiple chains --
 > > presumably one for each distinct file that is "scattered around".
 > >
 > > If you are using v4.x and have pool hashing with such collisions, it
 > > would be great to see them. I suspect you are either using v3 or you
 > > are using v4 with a legacy v3 pool.
 > >
 > > > You would have to work hard to artificially create such collisions.
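The "exceedingly unlikely" claim can be put in numbers with a back-of-the-envelope birthday bound: for n files with random 128-bit digests, P(any two collide) is approximately n(n-1)/(2*2^128). This is the standard approximation, not anything from BackupPC itself; the pool size plugged in is the ~7.4M-file figure quoted above.

```shell
# Birthday-bound estimate of an MD5 (128-bit) collision among n files:
# P ~ n*(n-1) / (2 * 2^128). awk is used for the floating-point math;
# n is the 7,395,292-file pool size from the stats quoted above.
n=7395292
awk -v n="$n" 'BEGIN { printf "probability ~ %.3e\n", n * (n - 1) / (2 * 2^128) }'
```

For a pool of this size the result is on the order of 10^-26 -- effectively zero, which is why a longest chain of 0 is the expected state of a healthy v4 pool.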
 > > > $ echo 'hello world' > ~/file_a
 > > > $ cp ~/file_a ~/file_b
 > > > $ [ "$(cat ~/file_a | md5sum)" = "$(cat ~/file_b | md5sum)" ] && echo "MATCH"
 > > >
 > > > Rob Sheldon
 > > > Contract software developer, devops, security, technical lead

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/