On 15. 6. 2026 15:31, Jordan Peck via dev wrote:
[...]
The required probing is too expensive to run in every scope where it
would be needed.

For example, probing on Windows calls GetVolumeInformation().  Probing
on other OSes has to create a file to check whether copy-on-write is
supported.

The patch limits this probing only for larger scopes, such as attaching it
to an update editor instance, but smaller-scoped operations like workqueue
installs or file reverts are left without probing.  If not all code paths
use the copy-on-write path, the size reduction may not be sustainable or
predictable in the long run.
I only added the probing so that checkouts can still benefit from
"streamy checkouts" when reflink support is absent. The probing isn't
that slow, but it's also not really needed for the smaller operations
that still use the workqueue installs,


How do you figure that it's not really needed? Working copies are typically used for a long time, with changes coming in from updates and commits, not just checkouts. Pristine files are also not automagically deleted when they're not referenced, you need 'svn cleanup --vacuum-pristines' for that. (Except when pristines are explicitly omitted, then they're created only for the scope of the operation that needs them, IIRC.)


[...]
Thoughts on creating a branch for this? It may be easier to work through
actual working code, as well as test interaction with other working copy
changes.

Downside is that the code may just bitrot on a branch that's not
actively maintained.
I don't think it needs its own branch, it would be easy to add a
global toggle via a pre-processor define since it's already gated
around the reflink support probing.


I'm well aware that we can #ifdef our way around any new code, but this change as it stands now is much too big and intrusive for that. I would not support committing it directly to trunk in its current state, even if disabled by default. This isn't a comment about your patch specifically, I'd say the same about any other large, intrusive change, no matter the source. We do develop some significant new features on trunk (e.g., FSX, svnxx, svnbrowse), but they're well isolated from the rest of the code – and also never finished, eh. What can go directly to trunk and what should be developed on a branch is a bit of a judgement call. I think my opinion on the matter is consistent.

-- Brane


Thanks,
Jordan



On Wed, 10 Jun 2026 at 15:41, Jordan Peck<[email protected]> wrote:
Hi all,

I've spent time cutting the patch down to the minimal changes needed to get the 
feature working while maintaining a sensible integration (not hacking it in). 
The majority of the patch size now comes from the new io_copy_temp.c file which 
holds all the native platform functions for probing/setting up file reflinks.

I've compiled and run the tests on Windows, Linux and MacOS this time.

Hopefully this is a reviewable size now.
Let me know what you think and any questions.

Thanks,
Jordan


On Fri, 5 Jun 2026 at 21:02, Jordan Peck<[email protected]> wrote:
Regarding the disk space savings, I haven't tested against pristineless, but 
the savings are also close to 50% on a repo with lots of large binary files:
The large SVN repo we use at work is a 1.6TB fresh checkout on disk, with this 
change that drops to 865GB (on Windows+ReFS). A huge saving due to the large 
number of art assets in our repository!
I ran some tests with and without the patch, the ~10% slowdown is from losing 
the streamy checkout benefits and instead having to do extra system calls.

1.16:      1588.22 GB  169 mins
1.16-CoW:  864.76 GB   202 mins

  volatile svn_atomic_t svn_wc__test_writer_copy_source_count = 0;
I agree this extern debug atomic for at est is awkward, I didn't really see any 
way around it if we want a test that checks the reflink path is actually being 
taken. Adding platform native methods for checking if a file is reflinked for 
testing reasons is not practical, given it's also not a simple to work out via 
the filesystem anyway. However, we could discuss reworking or removing the test 
to avoid this weird variable. I'm not sure about the warning you are seeing, I 
don't think I saw that when I built for windows/linux.

This means that, for example, a Subversion working copy, where most
files havesvn:eol-style=native, would see no improvement, correct?
No they would see no benefit, which is why this saves slightly less space than 
pristineless. But in our large repo 95% of the disk usage comes from large 
binary blob files which can benefit. I realise this isn't the case for 
everyone, but if you have a repo with only code files withsvn:eol-style=native 
it's probably not large enough to be a disk space concern in the first place.

How does this interact with the pristines-on-demand feature? How is this
tested?
There isn't a test for this but the "pristines-on-demand" takes precedence over 
reflinking, they shouldn't interfere with each other.

If clients that do not support CoW use the same working copy as a client
that does, how do they interact?
Yes! That's the best part, there's no working copy format change. The 
filesystem handles the reflinked files transparently once they've been set up. 
If you did a checkout on a CoW supporting drive (ReFS) and got the disk usage 
benefits of CoW you would still be able to copy your svn checkout folder to 
another drive with no CoW support (NTFS), it would transparently expand to its 
full size on the new drive, but otherwise would be fully functional.

Does this happen on every invocation? Concretely: if I'm on Windows with
NTFS, I'll get two file open attempts instead of one, every time?
There are details on this in the op but no, there is a 1 time check at the start of a 
checkout/update operation to see if the current filesystem supports CoW. If not all files 
will use the current "streamy-checkout" path avoiding any extra system calls.

Not that this is indicative of patch quality in general. But I'd have
preferred to see some discussion on dev@ before being presented with
160k of patch. It's basically unreviewable, I don't even know where to
start.
I was originally working on this feature for myself and work colleagues. It was initially 
a hacky, Windows-only implementation. Then I added linux, polished it a bit and ended up 
fleshing it out a lot more. It got to the point were I figured I might as well bring it 
all together into a patch and share it here to get people's thoughts on it. I do realise 
it's a huge patch, and in the "Notes" section of the op I laid out how it could 
be split into several parts. But I didn't want to do the work to split it before gauging 
general interest in the feature.

On Fri, 5 Jun 2026 at 19:50, Sean McBride<[email protected]> wrote:
On 4 Jun 2026, at 7:31, Jordan Peck via dev wrote:

This patch saves disk space on supported filesystems for byte-identical file 
installation by utilising filesystem clone APIs.
FWIW, just the other week, I've used this tool:

https://github.com/ttkb-oss/dedup

on my Mac, and several others in our office, to deduplicate files in our svn 
(1.14) working copies. It detected identical files not only in our /branches vs 
/trunk but also among the pristine copies, and it deduplicated them using APFS' 
cloning feature.

It didn't cause svn to freak out, so that bodes well.

Sean

Reply via email to