On 29.03.2021 22:14, Claudius Heine wrote:
> Hi Andrei,
>
> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>> On 29.03.2021 16:16, Claudius Heine wrote:
>>> Hi,
>>>
>>> I am currently investigating the possibility to use `btrfs-stream` files
>>> (generated by `btrfs send`) for deploying a image based update to
>>> systems (probably embedded ones).
>>>
>>> One of the issues I encountered here is that btrfs-send does not use any
>>> diff algorithm on files that have changed from one snapshot to the next.
>>>
>>
>> btrfs send works on block level. It sends blocks that differ between two
>> snapshots.
>
> Are you sure?
>
Yes.

> I did a test with a 32MiB random file. I created one snapshot, then
> changed (not deleted or added) one byte in that file and then created a
> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
> would be only block based, then I would have expected that it would just
> contain the changed block, not the whole file. And if I use a smaller
> file on the same file system, then the `btrfs-stream` is smaller as well.
>
> I looked into those `btrfs-stream` files using [1] and also [2] as well
> as the code. While I haven't understood everything there yet, it
> currently looks to me like it is file based.
>

btrfs send does not produce a pure block-based image, because that would
require two absolutely identical filesystems. It needs to replicate the
filesystem structure, so of course it needs to know which files are
created or deleted. But for each file it only sends the parts that have
changed since the previous snapshot. This only works if both snapshots
refer to the *same* file.

As was already mentioned, you need to understand how your files are
changed. In particular, standard software update tools do not rewrite
files in place; they create new files with new content. From the btrfs
perspective these are completely different files: two files with the
same name in two snapshots do not share a single byte. When you compute
the delta between two snapshots, you get instructions to delete the old
file and create a new file with new content (which will be renamed to
the name of the deleted old file). By necessity this also sends the full
new content.

So yes, btrfs replication is block based; similarity is determined by
how much physical data is shared between two files. What you expect is
file-based replication, where file names determine whether two files
should be considered the same and changes are computed between two files
with the same name.

>>
>>> One way to implement this would be to add some sort of 'patch' command
>>> to the `btrfs-stream` format.
>>>
>>
>> This would require reading complete content of both snapshots instead if
>> just computing block diff using metadata. Unless I misunderstand what
>> you mean.
> I think I should only need access to the old snapshot as well as the
> `btrfs-stream` file. But I currently don't have a complete PoC of this
> ready.
>
> regards,
> Claudius
>
> [1] https://github.com/sysnux/btrfs-snapshots-diff
> [2] https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive
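
P.S. To make the "same file" versus "same name" distinction above
concrete, here is a minimal Python sketch. It is only an illustration
under stated assumptions: the helper names (`modify_in_place`,
`replace_by_rename`, `demo`) are made up for this example, it does not
call btrfs at all, and it only compares inode numbers as a stand-in for
"is this still the same file". It says nothing about the size of an
actual send stream.

```python
import os
import tempfile


def modify_in_place(path, offset, new_byte):
    """Overwrite one byte of an existing file; the file keeps its inode."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_byte)


def replace_by_rename(path, offset, new_byte):
    """Write a full new copy and rename it over the old name.

    The name stays the same, but it now refers to a different file
    (new inode, freshly written data).
    """
    with open(path, "rb") as f:
        data = bytearray(f.read())
    data[offset:offset + 1] = new_byte
    # Create the temporary file next to the target so the rename stays
    # on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo():
    """Return the inode numbers seen at the start and after each update."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "blob.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(1024 * 1024))
        ino_start = os.stat(path).st_ino

        modify_in_place(path, 10, b"\x00")
        ino_in_place = os.stat(path).st_ino

        replace_by_rename(path, 20, b"\x00")
        ino_renamed = os.stat(path).st_ino
        return ino_start, ino_in_place, ino_renamed


if __name__ == "__main__":
    start, in_place, renamed = demo()
    print("in-place write keeps the inode:", start == in_place)
    print("rename-replace keeps the inode:", start == renamed)
```

The first update is the case where both snapshots would still refer to
the same file; the second is the delete-and-create case described
above, where the old and new file only share a name.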