pegrev_parse_tests.py

Branko Čibej Sat, 10 Nov 2018 22:48:01 -0800

On 11.11.2018 07:29, Daniel Shahaf wrote:
> Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:
>> On 10.11.2018 01:31, Daniel Shahaf wrote:
>>> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
>>>> On 09.11.2018 12:54, [email protected] wrote:
>>>>> Author: brane
>>>>> Date: Fri Nov  9 11:54:21 2018
>>>>> New Revision: 1846237
>>>>>
>>>>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
>>>>> Log:
>>>>> More tests for peg revision parsing.
>>>> [...]
>>>>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
>>>>> +def remove_subdir_7a_no_escape_peg(sbox):
>>>>> +  "remove missing 'E/@' without pegrev escape"
>>>>> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
>>>> Yes indeed ... our target parsing is so incredibly reliable that
>>>> attempting to delete a non-existent 'E/@' will happily delete 'E' with
>>>> all its contents. I think this may be too much of a good thing.
>>> The documented escaping rule for CLI consumers is to append an "@" after
>>> every filename.  By this rule, a CLI consumer that _wants_ to remove E/
>>> would run "svn rm E/@".  That command should parse as { target = 'E/',
>>> peg = '' } regardless of whether the directory "E" contains a child
>>> named "@".
>>>
>>> I don't understand what change you're proposing.
>> Consider this:
>>
>> $ svn rm E@
>> D         E
>> $ svn rm @
>> svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
>> $ svn rm E/@
>> D         E
>>
>>
>> would it not be more consistent that trying to remove 'E/@' would raise
>> the same error as trying to remove '@'?
> The argument parsing is documented to work as follows: remove the last
> "@" and everything after it — that's the peg rev.  What remains is the
> target dirent or URL.  Thus, ultimately the reason for the different
> errors is that "E/" is a valid OS-level path specification but "" is not
> (in the sense that «stat("E/")» works and «stat("")» errors).
>
> In other words, your examples are simply usage errors, but it so happens
> that one of them is a syntax error and one of them is not.  (It would
> have been good if both usage errors had been syntax errors, but we're
> 12 years too late to get that right to begin with.  C'est la vie.)


Life can also be fixed to suck less.

> The correct syntaxes for removing "./@" and "E/@", of course, are
> «svn rm E/@@» and «svn rm @@» respectively.  
>
> I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:

I didn't say it should. I said it should error out just like 'svn rm @'
does.


> 1. Because that would be backwards incompatible.

Obviouisly.

> 2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
> is parsed in exactly the same way regardless of the values of
> ${subcommand} and ${target} and regardless of the tree contents.

I do think that this rule was not sufficiently thought out.

> IIRC «svn up @» used to be the same as «svn up ./», so I think we
> shouldn't make it mean "update the file './@'" because that would be
> a silent incompatibility for some users.

Again, I didn't say it should. I said it should be an error because
'./@' is ambiguous.

>   I don't remember what the last
> minor line with the non-error meaning was, though.
>
>> Because otherwise the behaviour of the client depends on whether you're
>> removing things from the current directory or not.
> To be more precise, the question is whether one spells the command
> «svn rm @» or «svn rm ./@».

or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature
canonicalization is a bad thing.


>> This becomes especially error-prone in the case when
>> E/@ exists:
>>
>> $ svn rm E/@
>> D         E
>> D         E/@
>>
>>
>> I'm pretty sure we should not remove a parent directory if the child
>> might have been the intended target.
>>
> I see that there's an argument that we should require some option here,
> à la --force-log, but frankly, I don't think files literally called "@"
> or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
> about.

Perhaps not, but then there's SVN-4530, so I'd say they're common enough.


> Whatever we do, we must have a documented syntax that means "recursively
> delete E/ and everything thereunder" and works regardless of what
> filenames E/ contains.

Yes, it's called 'svn rm E'.


>> These usages were not ambiguous and possibly error-prone until we
>> invented the peg-revision syntax. At the time we failed to strictly
>> define the semantics , so maybe it's time we did that now. For example,
>> given the path
>>
>>     foo@bar/baz
>>
>> the parser will treat 'bar/baz' as the peg-revision specifier, even
>> though it's obvious that's not what the user had in mind.
>> If we changed the parser to only look for peg revisions on the basename of
>> the path, then at least the only restriction we'll have is that directory
>> separators can't be part of the peg revision specifier — which should be an
>> acceptable restriction, since currently we only support known keywords,
>> revision numbers or dates, none of which contain forward- or backslashes.
>>
>> Obviously that would only be a parser change, peg revisions would still
>> apply to whole paths for command semantics.
>>
> So you're proposing changing the parsing rule from "the last @" to
> "the last @ that isn't followed by any os.path.sep"?  That's forward
> compatible if we constrain ourselves never to use slash or backslash in
> a pegrev specifier, and backwards compatible in that every non-erroneous
> usage will continue to work, so it's something we can consider.
>
> How would that affect scheme://user@hostname/ URLs?

I haven't tested that yet.

> What about typos such as «foo@HEA/D»?

This won't work currently, but you're right, it _could_ work with the
proposed change. So my assumption that we'd only add more errors is wrong.

>> I'm aware that doing this would change the meaning of some forms of
>> commands, but for now I'm fairly sure we'd only produce new errors, not
>> silently behave differently. From the current set of tests, it seems to
>> follow that this inconsistency only arises with targets whose basename
>> starts with '@' or '.@' (and probably '..@'), the latter because '.'
>> (and '..') are removed by canonicalization.
> I don't follow what you mean by "this"; do you refer to forbidding
> slashes in pegrevs or to something else?  (The former wouldn't seem to
> address all the concerns you described.)
>
> Cheers,
>
> Daniel
>
> P.S.  Other languages also have examples of usage errors that aren't
> syntax errors.  For example, in C the tokens «+», «-», and «*» can be
> either unary or binary operators, so «double hypotenuse(double a, double b)
> { return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
> is not.

I really don't see how this comparison is valid. C operator syntax and
precedence is precisely defined. Our usage of @ in paths may be
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

-- Brane

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Reply via email to