On Mon, Dec 30, 2024 at 08:06:37PM +0100, Daniel Sahlberg wrote:
> Den lör 28 dec. 2024 kl 21:35 skrev Daniel Sahlberg <
> daniel.l.sahlb...@gmail.com>:
> 
> > I also made a "review" in Github but it seems that wasn't forwarded to the
> > list so I'll copy-paste it below:
> >
> > Thank you for the patch!
> >
> > I understand the idea of the patch but I'm not sure I think this
> > implementation is a good idea.
> >
> > As noted in the code (lines 169-171) the format is the same as svn status.
> > Changing the output format will break any scripts relying on parsing the
> > text output. Of course, that could be acceptable, but the bar should be
> > quite high in my opinion.
> >
> > The extra column will be output both in svn diff [working copy path] and
> > svn diff [url]. When operating on a wc path, the extra data will be the
> > same (plus leading and trailing single quote). When operating on a URL, the
> > extra data will be the path component of the url, without url encoding.
> >
> > The new format would be very hard to parse:
> >
> > Spllitting on space will not work if the path contains a space (it will be
> > output verbatim at LEAST in the second component).
> > Splitting on single quote will not work if the path contains a single
> > qoute.
> >
> > All in all, I'm -0 to accept this patch: I don't agree but I wouldn't
> > stand in the way of consensus.
> >
> >
> >
> > Cheers,
> >
> > Daniel
> >
> >
> The user came back om Github with the following message:
> [[[
> Hey @dsahlberg-apache-org <https://github.com/dsahlberg-apache-org> ! Thank
> you so much for reviewing this PR! What you said is exactly what I was
> concerned about. Such a change would indeed render scripts that rely on
> command outputs unusable. To be honest, I'm not a seasoned SVN user, and
> this issue came up during conversations with my friends about their
> operations challenges. So, I submitted this commit with a "let's give it a
> try" mindset.
> I’d like to hear your thoughts on how we might improve this to partially
> retain the output format of the old version (i.e., properly displaying
> filenames that need escaping) while also supporting the current URI-style
> output. How could we achieve a balance between the two? I understand that
> removing the --summarize option allows filenames to display correctly, but
> it would be more helpful if filenames could be included in a single line.
> Thank you very much!
> ]]]
> 
> To make it easier to grasp, I'm attaching a few samples
> 
> I have a repo test, with a subdirectory dir1 and a file named path 'path'
> (the word path twice, the second time in single quotes).
> 
> This is the current output
> [[[
> $ svn diff -c68 --summarize https://svn.example.com/svn/test
> A       https://svn.example.com/svn/test/dir1/path%20'path'
> ]]]
> 
> The PR would output
> [[[
> $ svn diff -c68 --summarize https://svn.example.com/svn/test
> A       https://svn.example.com/svn/test/dir1/path%20'path' 'path 'path''
> ]]]
> I don't like this at all, the only way (that I can think of) to figure out
> where the "filename" part starts would be to count the number of single
> quotes and split before the ((n-2)/2+1):nd.
> 
> Using svn_path_join_internal instead of svn_path_url_add_component would
> output:
> [[[
> $ svn diff -c68 --summarize https://svn.example.com/svn/test
> A       https://svn.example.com/svn/test/dir1/path 'path'
> ]]]
> I think this last example would be acceptable. Most scripts would PROBABLY
> already cut from the 9:th character to the end of the line and that would
> still work. Using the URL as an argument to, for example, wget would
> require it to be quoted, but so does the existing output (at least to wget).
> 
> I don't want to make this change myself, but rather looking for some +1's
> (or -1's) from other committers.

This change seems to be unnecessary to me. The stated motivation for it,
i.e. translating URLs to local paths in order to decide which local files
were modified, ignores risks which occur beyond the escaping of characters.

If a mapping from repo URL to a local path is needed for scripting purposes
without any further processing, then the file or directory can be checked
out before diff --summarize is run. When provided a working copy path then
diff --summarize displays on-disk paths in local style, spaces and all.
So why not just use a working copy if local paths are required?

When provided a URL, then diff --summarize displays a URL.
There should already be tools which decode URI-encoded data for use in
a script if that is needed. However, there are problems beyond characters
which require ecaping in a URL. Scripts should take precautions when
translating URLs to a local path in an automated way. The repository root
needs to be infered somehow, e.g. the repository-relative path for the URL
https://svn.apache.org/repos/asf/subversion/trunk/libsvn_client
is subversion/trunk/libsvn_client, and scripts would need to know this.

Worse, paths need to be normalized before use because a malicious server
or proxy could return a URL containing paths components such as .. and
then overwrite files outside the intended working area.

The SVN client covers such problems during checkout. Checking out a working
copy before operating on local paths is much safer than working from URLs.

Reply via email to