In my v4 pool, do collisions still generate _0, _1, _2, etc. filenames in pool/ ?

(as in this example from the docs:
        __TOPDIR__/pool/1/2/3/123456789abcdef0
        __TOPDIR__/pool/1/2/3/123456789abcdef0_0
        __TOPDIR__/pool/1/2/3/123456789abcdef0_1
)

I am using compression (I only have a cpool/ directory), and I am asking because, on both servers, running:
        find cpool/ -name "*_0" -print
        find cpool/ -name "*_*" -print

returns zero results.
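For what it's worth, a sketch of the same check with a stricter pattern (only files ending in an underscore plus digits, which is what v4 chain entries look like). The demo/ directory below is a stand-in I made up for a real __TOPDIR__/cpool/, just so the commands are self-contained:

```shell
# Build a fake pool layout mimicking the docs example, then count
# chain entries. In a real v4 pool, the _0/_1 files exist only when
# two different contents share the same digest.
mkdir -p demo/cpool/1/2/3
touch demo/cpool/1/2/3/123456789abcdef0      # bare digest: no collision
touch demo/cpool/1/2/3/123456789abcdef0_0    # chain entries, present only
touch demo/cpool/1/2/3/123456789abcdef0_1    # on a real digest collision
find demo/cpool/ -type f -name '*_[0-9]*' | wc -l   # -> 2
```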


Thank you.


On 16/3/2023 6:30 μ.μ., backu...@kosowsky.org wrote:
Rob Sheldon wrote at about 08:31:17 -0700 on Thursday, March 16, 2023:
  > On Thu, Mar 16, 2023, at 7:43 AM, backu...@kosowsky.org wrote:
  > >
  > > Rob Sheldon wrote at about 23:54:51 -0700 on Wednesday, March 15, 2023:
  > > > There is no reason to be concerned. This is normal.
  > >
  > > It *should* be extremely, once-in-a-blue-moon, rare to randomly have an
  > > md5sum collision -- as in 1.47*10^-29
  >
  > Why are you assuming this is "randomly" happening? Any time an identical
  > file exists in more than one place on the client filesystem, there will be
  > a collision. This is common in lots of cases. Desktop environments
  > frequently have duplicated files scattered around. I used BackupPC for
  > website backups; my chain length was approximately equal to the number of
  > WordPress sites I was hosting.

You are simply not understanding how file de-duplication and pool
chains work in v4.

Identical files contribute only a single chain instance -- no matter
how many clients you are backing up and no matter how many backups you
save of each client. This is what de-duplication does.
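A quick hedged illustration of that point (assumes coreutils md5sum is available): copying a file yields the same digest for identical contents, which is a de-duplication hit, not a collision chain.

```shell
# Identical contents hash to the same MD5 digest, so the pool stores
# the data once; no _0/_1 chain entry is created for duplicates.
printf 'hello world\n' > file_a
cp file_a file_b
a=$(md5sum < file_a | cut -d' ' -f1)
b=$(md5sum < file_b | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "same digest -> one pool entry, no chain"
```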

The fact that they appear on different clients and/or in different
parts of the filesystem is reflected in the attrib files in the pc
subdirectories for each client. This is where the metadata is stored.

Chain lengths have to do with pool storage of the file contents
(ignoring metadata). Lengths greater than 1 only occur if you have
md5sum hash collisions -- i.e., two files (no matter on what client or
where in the filesystem) with non-identical contents but the same
md5sum hash.

Such collisions are statistically exceedingly unlikely to occur on
normal data where you haven't worked hard to create such collisions.

For example, on my backup server:
        Pool is 841.52+0.00GiB comprising 7395292+0 files and 16512+1 directories (as of 2023-03-16 01:11),
        Pool hashing gives 0+0 repeated files with longest chain 0+0,
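For scale, a back-of-the-envelope birthday bound for a pool of roughly that size (~7.4 million distinct files, 128-bit MD5) can be computed with awk; this is my own rough estimate, not a figure from the BackupPC docs:

```shell
# Birthday bound: probability of at least one random MD5 collision
# among n distinct files is approximately n*(n-1)/2 / 2^128.
awk 'BEGIN { n = 7395292; printf "%.2e\n", n * (n - 1) / 2 / 2^128 }'
# -> on the order of 8e-26, i.e. vanishingly small for real-world pools
```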

I strongly suggest you read the documentation on BackupPC before
making wildly erroneous assumptions about chains. You can also look at
the code in BackupPC_refCountUpdate which defines how $fileCntRep and
$fileCntRepMax are calculated.

Also, if what you said were true, the OP would have multiple chains --
presumably one for each distinct file that is "scattered around".

If you are using v4.x and have pool hashing with such collisions, it
would be great to see them. I suspect you are either using v3 or you
are using v4 with a legacy v3 pool.

  > > You would have to work hard to artificially create such collisions.
  >
  > $ echo 'hello world' > ~/file_a
  > $ cp ~/file_a ~/file_b
  > $ [ "$(cat ~/file_a | md5sum)" = "$(cat ~/file_b | md5sum)" ] && echo "MATCH"
  >
  > Rob Sheldon
  > Contract software developer, devops, security, technical lead
  >
  >
  > _______________________________________________
  > BackupPC-users mailing list
  > BackupPC-users@lists.sourceforge.net
  > List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
  > Wiki:    https://github.com/backuppc/backuppc/wiki
  > Project: https://backuppc.github.io/backuppc/



