On 11.11.2018 07:29, Daniel Shahaf wrote: > Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100: >> On 10.11.2018 01:31, Daniel Shahaf wrote: >>> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100: >>>> On 09.11.2018 12:54, br...@apache.org wrote: >>>>> Author: brane >>>>> Date: Fri Nov 9 11:54:21 2018 >>>>> New Revision: 1846237 >>>>> >>>>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev >>>>> Log: >>>>> More tests for peg revision parsing. >>>> [...] >>>>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@") >>>>> +def remove_subdir_7a_no_escape_peg(sbox): >>>>> + "remove missing 'E/@' without pegrev escape" >>>>> + do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'") >>>> Yes indeed ... our target parsing is so incredibly reliable that >>>> attempting to delete a non-existent 'E/@' will happily delete 'E' with >>>> all its contents. I think this may be too much of a good thing. >>> The documented escaping rule for CLI consumers is to append an "@" after >>> every filename. By this rule, a CLI consumer that _wants_ to remove E/ >>> would run "svn rm E/@". That command should parse as { target = 'E/', >>> peg = '' } regardless of whether the directory "E" contains a child >>> named "@". >>> >>> I don't understand what change you're proposing. >> Consider this: >> >> $ svn rm E@ >> D E >> $ svn rm @ >> svn: E125001: '@' is just a peg revision. Maybe try '@@' instead? >> $ svn rm E/@ >> D E >> >> >> would it not be more consistent that trying to remove 'E/@' would raise >> the same error as trying to remove '@'? > The argument parsing is documented to work as follows: remove the last > "@" and everything after it — that's the peg rev. What remains is the > target dirent or URL. Thus, ultimately the reason for the different > errors is that "E/" is a valid OS-level path specification but "" is not > (in the sense that «stat("E/")» works and «stat("")» errors). > > In other words, your examples are simply usage errors, but it so happens > that one of them is a syntax error and one of them is not. (It would > have been good if both usage errors had been syntax errors, but we're > 12 years too late to get that right to begin with. C'est la vie.)
Life can also be fixed to suck less. > The correct syntaxes for removing "./@" and "E/@", of course, are > «svn rm E/@@» and «svn rm @@» respectively. > > I think that «svn rm E/@» should _not_ remove "E/@", for two reasons: I didn't say it should. I said it should error out just like 'svn rm @' does. > 1. Because that would be backwards incompatible. Obviouisly. > 2. In order to preserve the property that «svn ${subcommand} -- ${target}@» > is parsed in exactly the same way regardless of the values of > ${subcommand} and ${target} and regardless of the tree contents. I do think that this rule was not sufficiently thought out. > IIRC «svn up @» used to be the same as «svn up ./», so I think we > shouldn't make it mean "update the file './@'" because that would be > a silent incompatibility for some users. Again, I didn't say it should. I said it should be an error because './@' is ambiguous. > I don't remember what the last > minor line with the non-error meaning was, though. > >> Because otherwise the behaviour of the client depends on whether you're >> removing things from the current directory or not. > To be more precise, the question is whether one spells the command > «svn rm @» or «svn rm ./@». or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature canonicalization is a bad thing. >> This becomes especially error-prone in the case when >> E/@ exists: >> >> $ svn rm E/@ >> D E >> D E/@ >> >> >> I'm pretty sure we should not remove a parent directory if the child >> might have been the intended target. >> > I see that there's an argument that we should require some option here, > à la --force-log, but frankly, I don't think files literally called "@" > or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry > about. Perhaps not, but then there's SVN-4530, so I'd say they're common enough. > Whatever we do, we must have a documented syntax that means "recursively > delete E/ and everything thereunder" and works regardless of what > filenames E/ contains. Yes, it's called 'svn rm E'. >> These usages were not ambiguous and possibly error-prone until we >> invented the peg-revision syntax. At the time we failed to strictly >> define the semantics , so maybe it's time we did that now. For example, >> given the path >> >> foo@bar/baz >> >> the parser will treat 'bar/baz' as the peg-revision specifier, even >> though it's obvious that's not what the user had in mind. >> If we changed the parser to only look for peg revisions on the basename of >> the path, then at least the only restriction we'll have is that directory >> separators can't be part of the peg revision specifier — which should be an >> acceptable restriction, since currently we only support known keywords, >> revision numbers or dates, none of which contain forward- or backslashes. >> >> Obviously that would only be a parser change, peg revisions would still >> apply to whole paths for command semantics. >> > So you're proposing changing the parsing rule from "the last @" to > "the last @ that isn't followed by any os.path.sep"? That's forward > compatible if we constrain ourselves never to use slash or backslash in > a pegrev specifier, and backwards compatible in that every non-erroneous > usage will continue to work, so it's something we can consider. > > How would that affect scheme://user@hostname/ URLs? I haven't tested that yet. > What about typos such as «foo@HEA/D»? This won't work currently, but you're right, it _could_ work with the proposed change. So my assumption that we'd only add more errors is wrong. >> I'm aware that doing this would change the meaning of some forms of >> commands, but for now I'm fairly sure we'd only produce new errors, not >> silently behave differently. From the current set of tests, it seems to >> follow that this inconsistency only arises with targets whose basename >> starts with '@' or '.@' (and probably '..@'), the latter because '.' >> (and '..') are removed by canonicalization. > I don't follow what you mean by "this"; do you refer to forbidding > slashes in pegrevs or to something else? (The former wouldn't seem to > address all the concerns you described.) > > Cheers, > > Daniel > > P.S. Other languages also have examples of usage errors that aren't > syntax errors. For example, in C the tokens «+», «-», and «*» can be > either unary or binary operators, so «double hypotenuse(double a, double b) > { return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)» > is not. I really don't see how this comparison is valid. C operator syntax and precedence is precisely defined. Our usage of @ in paths may be well-defined too, but its interaction with aggressive canonicalization creates unintentional side effects in the implementation. -- Brane