Giannis Economou wrote at about 19:12:56 +0200 on Thursday, March 16, 2023:
 > In my v4 pool are collisions still generating _0, _1, _2 etc filenames 
 > in the pool/ ?

According to the code in Lib.pm, it appears that, unlike v3, v4 does
not use an underscore suffix -- an (unsigned) long is simply appended
to the end of the 16-byte digest.

 > 
 > (as in the example from the docs mentions:
 >          __TOPDIR__/pool/1/2/3/123456789abcdef0
 >          __TOPDIR__/pool/1/2/3/123456789abcdef0_0
 >          __TOPDIR__/pool/1/2/3/123456789abcdef0_1
 > )

That example is for v3, as indicated by the 3-level pool path (1/2/3).

 > 
 > I am using compression (I only have cpool/ dir) and I am asking because 
 > on both servers running:
 >          find cpool/ -name "*_0" -print
 >          find cpool/ -name "*_*" -print
 > 
 > brings zero results.

Try:

    find /var/lib/backuppc/cpool/ -type f -regextype grep \
        ! -regex ".*/[0-9a-f]\{32\}" ! -name "LOCK" ! -name "poolCnt"
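
If you want to run the same check against more than one pool directory,
it can be wrapped in a small shell function. This is only a sketch: the
function name list_pool_collisions and the default path are my own
assumptions, not anything shipped with BackupPC, and it needs GNU find
for -regextype. The match rule is the same as in the command above: any
regular file whose name is not exactly 32 hex digits, other than LOCK
and poolCnt.

```shell
#!/bin/sh
# Hypothetical helper (not part of BackupPC): list pool files whose
# names are not a bare 32-hex-digit MD5 digest, i.e. candidates for
# collision-chain entries in a v4 pool. Requires GNU find.
list_pool_collisions() {
    pool_dir="${1:-/var/lib/backuppc/cpool}"   # assumed default location
    find "$pool_dir" -type f -regextype grep \
        ! -regex '.*/[0-9a-f]\{32\}' \
        ! -name 'LOCK' ! -name 'poolCnt'
}
```

Calling list_pool_collisions /path/to/cpool prints one path per line;
no output means every pool file has a plain 32-hex-digit name.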

 > 
 > 
 > Thank you.
 > 
 > 
 > On 16/3/2023 6:30 PM, backu...@kosowsky.org wrote:
 > > Rob Sheldon wrote at about 08:31:17 -0700 on Thursday, March 16, 2023:
 > >   > On Thu, Mar 16, 2023, at 7:43 AM, backu...@kosowsky.org wrote:
 > >   > >
 > >   > > Rob Sheldon wrote at about 23:54:51 -0700 on Wednesday, March 15, 2023:
 > >   > > > There is no reason to be concerned. This is normal.
 > >   > >
 > >   > > It *should* be extremely, once-in-a-blue-moon, rare to randomly have an
 > >   > > md5sum collision -- as in 1.47*10^-29
 > >   >
 > >   > Why are you assuming this is "randomly" happening? Any time an
 > >   > identical file exists in more than one place on the client filesystem,
 > >   > there will be a collision. This is common in lots of cases. Desktop
 > >   > environments frequently have duplicated files scattered around. I used
 > >   > BackupPC for website backups; my chain length was approximately equal to
 > >   > the number of WordPress sites I was hosting.
 > >
 > > You are simply not understanding how file de-duplication and pool
 > > chains work in v4.
 > >
 > > Identical files contribute only a single chain instance -- no matter
 > > how many clients you are backing up and no matter how many backups you
 > > save of each client. This is what de-duplication does.
 > >
 > > The fact that they appear on different clients and/or in different
 > > parts of the filesystem is reflected in the attrib files in the pc
 > > subdirectories for each client. This is where the metadata is stored.
 > >
 > > Chain lengths have to do with pool storage of the file contents
 > > (ignoring metadata). Lengths greater than 1 only occur if you have
 > > md5sum hash collisions -- i.e., two files (no matter on what client or
 > > where in the filesystem) with non-identical contents but the same
 > > md5sum hash.
 > >
 > > Such collisions are statistically exceedingly unlikely to occur on
 > > normal data where you haven't worked hard to create such collisions.
 > >
 > > For example, on my backup server:
 > >    Pool is 841.52+0.00GiB comprising 7395292+0 files and 16512+1 directories (as of 2023-03-16 01:11),
 > >    Pool hashing gives 0+0 repeated files with longest chain 0+0,
 > >
 > > I strongly suggest you read the documentation on BackupPC before
 > > making wildly erroneous assumptions about chains. You can also look at
 > > the code in BackupPC_refCountUpdate which defines how $fileCntRep and
 > > $fileCntRepMax are calculated.
 > >
 > > Also, if what you said were true, the OP would have multiple chains -
 > > presumably one for each distinct file that is "scattered around"
 > >
 > > If you are using v4.x and have pool hashing with such collisions, it
 > > would be great to see them. I suspect you are either using v3 or you
 > > are using v4 with a legacy v3 pool
 > >
 > >   > > You would have to work hard to artificially create such collisions.
 > >   >
 > >   > $ echo 'hello world' > ~/file_a
 > >   > $ cp ~/file_a ~/file_b
 > >   > $ [ "$(cat ~/file_a | md5sum)" = "$(cat ~/file_b | md5sum)" ] && echo "MATCH"
 > >   >
 > >   > Rob Sheldon
 > >   > Contract software developer, devops, security, technical lead
 > >   >
 > >   >
 > >   > _______________________________________________
 > >   > BackupPC-users mailing list
 > >   > BackupPC-users@lists.sourceforge.net
 > >   > List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
 > >   > Wiki:    https://github.com/backuppc/backuppc/wiki
 > >   > Project: https://backuppc.github.io/backuppc/
