On Apr 14, 2017, at 5:09 PM, Ross Berteig <r...@cheshireeng.com> wrote:

> I've checked it in on the glob-docs branch until it has been read by at least 
> one more pair of eyes.

How about two pair?  (Because foureyes.  Ahahah.)

Go put your Nomex underwear on; I’m a brutal copy editor.

Complaints, comments, and considerations, mostly in order of the current 
presentation:

1. It doesn’t tell you that globs and regexes are not the same thing.  I see 
this confusion occasionally, so I think it’s worth a warning.  We are, after 
all, targeting this at least partly at people who don’t already know what 
“glob” means.

(Example: https://unix.stackexchange.com/q/279661)
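
For what it’s worth, a side-by-side along these lines might make the point. It is only a sketch: it uses a POSIX shell’s own `case` globbing and `grep -E` rather than Fossil’s matcher, and the helper names `glob_match` and `regex_match` are made up for illustration.

```shell
# Hypothetical helpers: contrast shell-style glob matching with regex matching.
glob_match() {   # usage: glob_match PATTERN NAME
    case "$2" in
        $1) echo yes ;;   # unquoted $1 is treated as a glob pattern
        *)  echo no ;;
    esac
}

regex_match() {  # usage: regex_match REGEX NAME
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo yes
    else
        echo no
    fi
}

glob_match  'a.c' 'abc'   # no:  in a glob, "." is just a literal dot
regex_match 'a.c' 'abc'   # yes: in a regex, "." matches any character
glob_match  'a*'  'xyz'   # no:  the glob needs a leading "a"
regex_match 'a*'  'xyz'   # yes: "zero or more a" matches anywhere, even nothing
```

The same pattern text gives opposite answers in both pairs, which is exactly the trap the warning should name.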

2. I’d move that first parenthetical to a second sentence.  It’s hard to read 
as-is.  Consider: “Glob patterns are also accepted as options to certain 
commands as well as query parameters to certain pages.”

3. GLOB is all-caps in Fossil help output because it’s a variable parameter, 
but it should only be written that way in documentation when referring to 
syntax examples in Fossil command output or the corresponding docs on 
fossil-scm.org.  GLOB is not an acronym.  The correct term is “a glob pattern,” 
or more idiomatically, “a glob”:

    https://en.wikipedia.org/wiki/Glob_(programming)

Link that Wikipedia article somewhere near the top of your article, too.

4. Para 2, sentence 1: make it two sentences.  The second half doesn’t follow 
from the first.  It’s an independent statement.

5. Para 2, sentence 2: nix the comma; the second part is not a complete 
sentence.

6. Nix para 4: we already know that most documentation exists to avoid the need 
to RTFS.  It needn’t be stated here. :)

7. Move “any” definition to a sentence after the table: “Any other character 
matches that character exactly.” or similar.

8. “…additional features:”  (Colon, not period.)

9. Ranges: How does that work with Unicode?  That is, does [a-d] match ä in any 
collating order supported by Fossil?  Does it depend on whether Fossil is 
linked to libicu?  Answers to both should be given here.  Let’s not be 
needlessly Anglocentric.

(Qualifier given because I don’t mean to suggest that you must translate this 
whole document to other human languages.)

10. Matching hyphen: can you also put it first in a bracket expression, after 
an optional ^, as specified by POSIX?

    http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

If so, document it, and if not, file a ticket, because it’s a bug.

(Another good link to put into the document.)

11. [^]] example: I’d prefer a different character here to avoid confusion.  
[^a] would be fine.

12. “...must match the entire name to be considered a match.”  Emphasize this 
somehow.  It’s a commonly-misunderstood aspect of Fossil’s glob matching rules.
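
A tiny example next to that sentence would help. The sketch below leans on a POSIX shell’s `case` statement, which happens to share the whole-string behavior; it is not Fossil’s matcher, and `whole_match` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only when the pattern covers the whole name.
whole_match() {  # usage: whole_match PATTERN NAME
    case "$2" in
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

whole_match 'README'  'README.txt'   # no match: the pattern covers only a prefix
whole_match 'README*' 'README.txt'   # match
whole_match '*.txt'   'README.txt'   # match
```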

13. “may have one GLOB per line.” -> “may be given as one glob per line.”

14. First para of “File names to match” section needs a rewrite: “The canonical 
name of a file has all directory separators changed to `/`, redundant slashes 
are removed, all `.` path components are removed, and all `..` path components 
are resolved. (There are additional details we won’t go into here.)”

In that rewrite, I changed the treatment of slashes in part because I doubt 
“../foo” is left unchanged.  By your rules, the .. component would have to have 
the leading slash to be affected.  Double-check with the source, but I’m pretty 
sure I’m right.

My rewrite also adds the redundant slash removal bit, as is proper according to 
POSIX, but may not be the way Fossil works; if it doesn’t work that way, that 
needs to be documented instead, since it will confuse those of us who know 
POSIX requires this.  That is, /bin/ls and /bin////ls are the same thing on a 
POSIX box.  Whether Fossil follows suit or not, it needs to be documented.

(There’s an obscure POSIX rule that says two leading slashes must be left 
untouched, but I wouldn’t expect Fossil to obey this, and I certainly wouldn’t 
expect most readers to know about the rule and therefore expect the exception.)
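
To pin down what I mean by that rewrite, here is a sketch of the rules as I worded them. It reflects only my proposed wording, not Fossil’s source: the function name `canon` is hypothetical, and the choice to silently drop a `..` that has nothing left to remove is an assumption that needs checking.

```shell
# Hypothetical sketch of the proposed canonical-name rules (not Fossil's code).
canon() {  # usage: canon PATH
    p=$(printf '%s' "$1" | tr '\\' '/')   # backslashes become forward slashes
    case "$p" in /*) root=/ ;; *) root= ;; esac
    out=
    saved_ifs=$IFS
    IFS=/
    set -f                                # no pathname expansion while splitting
    for part in $p; do
        case "$part" in
            ''|.) ;;                      # redundant slash or "." component: drop
            ..)                           # ".." removes the previous component
                case "$out" in
                    */*) out=${out%/*} ;;
                    *)   out= ;;          # assumption: nothing left, so drop it
                esac ;;
            *)  out=${out:+$out/}$part ;;
        esac
    done
    set +f
    IFS=$saved_ifs
    printf '%s\n' "$root$out"
}

canon 'src\.\foo//bar/../main.c'   # prints src/foo/main.c
canon '/bin////ls'                 # prints /bin/ls
```

Whatever Fossil really does with `../foo` and with repeated slashes, a handful of input/output pairs like these in the document would settle it for the reader.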

15. There is no item #15.

16. “This has some consequences.”  That seems to want to introduce a list, not 
stand alone as a complete paragraph.  Make the following paragraph a bulleted 
list, and change . to :.

17. “Recall that…”  I don’t think it’s clear from the earlier paragraph that \ 
becomes / even on Windows.  It could be read as simply a bit of Unix-centrism, 
with some Windows-using readers disregarding it, thinking, “Yeah, yeah, I know 
what you really mean here.”  That second-guesser would be wrong in this case, 
so say instead, “Fossil glob patterns always use forward slashes as path 
separators, even on Windows.”

18. The “Where are they used” and “Platform quirks” headers should be bigger.  
I think you have your # character counts mixed up in the Markdown source.  (Or 
---- where you mean ====, if you do your headings that way; I didn’t bother 
looking.)

19. /timeline -> `/timeline`

20. “It also can use” -> “It can also use”

21. “GLOB, LIKE, or REGEXP”  I don’t think you want to talk about the SQLite 
operator/function GLOB here, as it’s confusing with respect to the simple 
treatment of globs in the rest of this document.  Unless you’re going to give 
examples of GLOB-the-SQLite-function here, drop it from this list.  (I think 
you can safely ignore that detail.)

22. Either give brief LIKE and REGEXP examples here in this document, add new 
documents for each and link to them, or drop mention of these details.  As it 
stands, I’m left wondering why this doesn’t work:

    https://www.fossil-scm.org/index.html/timeline?chng=%25MakeLists.txt&ms=LIKE

All three of my options prevent that confusion.

23. Does EXACT also work?  This suggests that it should:

    https://sqlite.org/lang_expr.html

(Another link opportunity.  You can tell that I pepper my Markdown docs with 
links, can’t you?)

If not, explain why not, lest someone else make the same leap.  (And maybe file 
a feature request.  It could be useful to bypass certain circles in the Quoting 
Inferno.)

24. “These settings are all lists of GLOBs.”  Split the para here with the list 
between the two parts; end the first para with a colon.

25. “…or file in the repository’s…” -> “…or put a file in the repository’s”

26. If you’re going to cover `.fossil-settings` here at all, make it clear that 
this must be at the top level of the checkout directory.  Adding 
`some/path/.fossil-settings/ignore-glob` to the repository won’t let you avoid 
prefixing globs with “some/path/”.

This is a perfectly reasonable thing to try, particularly for Subversion and 
Git transplants, which allow .svnignore and .gitignore files anywhere in the 
tree, with matches based at the file’s location.  (This Subversion transplant 
did it early in his Fossil career, and was annoyed when it didn’t work.)
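
To make that concrete, this is roughly the layout that does work. It is a sketch: the scratch directory stands in for the top of a checkout, and the two patterns are invented for illustration.

```shell
# Scratch directory standing in for the top level of a checkout (assumption).
checkout=$(mktemp -d)
cd "$checkout" || exit 1

# The versioned setting lives at the top level, one glob per line, and
# each glob still spells out the full path from the checkout root.
mkdir -p .fossil-settings
printf '%s\n' 'some/path/*.o' 'some/path/*.tmp' > .fossil-settings/ignore-glob

cat .fossil-settings/ignore-glob
```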

27. Add a section or new document on transitioning from .fooignore files.

28. The “Commands that refer to globs” section should make clear that it is 
talking about things like the --clean and --ignore options to `fossil add`, and 
that it is not talking about the file lists some of these commands take.  It 
should refer the reader to my next item’s rewrite, which covers that.

29. The Platform Quirks section needs a total rewrite.  Sorry, but it left me 
confused, and I know what’s going on.

How about this:

————————————————————

# Platform Quirks

Fossil glob patterns are based on the glob pattern feature of POSIX shells. 
Fossil glob patterns also have a quoting mechanism, discussed above. Because 
other parts of your operating system may interpret glob patterns and quotes 
separately from Fossil, it is often difficult to give glob patterns correctly 
to Fossil on the command line. Quotes and special characters in glob patterns 
are likely to be interpreted when given as part of a `fossil` command, causing 
unexpected behavior.

These problems do not affect [versioned settings 
files](/doc/trunk/www/settings.wiki) or Admin &rarr; Settings in Fossil UI. 
Consequently, it is better to set long-term `*-glob` settings via these methods 
than to use `fossil settings` commands.

That advice doesn’t help you when you are giving one-off glob patterns in 
`fossil` commands. The remainder of this section gives remedies and workarounds 
for these problems.


## POSIX Systems

If you are using Fossil on a system with a POSIX-compatible shell &mdash; 
Linux, macOS, the BSDs, Unix, Cygwin, WSL, etc. &mdash; the shell may expand the 
glob patterns before passing the result to the `fossil` executable.

Sometimes this is exactly what you want.  Consider this command for example:

    $ fossil add RE*

If you give that command in a directory containing `README.txt` and 
`RELEASE-NOTES.txt`, the shell will expand the command to:

    $ fossil add README.txt RELEASE-NOTES.txt

…which is compatible with the `fossil add` command’s argument list, which 
allows multiple files. Fossil doesn’t see the glob pattern at all, but since 
the command does what you almost certainly wanted anyway, it’s fine.

Now consider what happens instead if you say:

    $ fossil add --ignore RE* src/*.c

This *doesn’t* do what you want because the shell will expand both `RE*` and 
`src/*.c`, causing one of the two files matching the `RE*` glob pattern to be 
ignored and the other to be added to the repository. You need to say this in 
that case:

    $ fossil add --ignore 'RE*' src/*.c

The single quotes force a POSIX shell to pass the `RE*` glob pattern through to 
Fossil untouched, which will do its own glob pattern matching. There are other 
methods of quoting a glob pattern or escaping its special characters; see your 
shell’s manual.
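
You can see the difference without involving Fossil at all. The sketch below assumes a POSIX shell with `mktemp` available; `show_args` is a hypothetical stand-in that simply prints each argument it receives, one per line.

```shell
# Scratch directory holding the two files from the example above.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch README.txt RELEASE-NOTES.txt

# Hypothetical stand-in for `fossil`: prints each argument in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_args add RE*     # unquoted: the shell expands the pattern first
show_args add 'RE*'   # quoted: the pattern arrives as a single argument
```

The first call prints three lines (`add` plus the two file names), while the second prints two (`add` and the literal `RE*`).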

POSIX shells also interpret the same quotation marks Fossil uses to handle 
things like spaces in file names, as discussed above. For example, if you 
needed to add all files matching `RE*` to the repository except for a file 
called `REALLY SECRET STUFF.txt`, you could use nested quotes:

    $ fossil add --ignore "'REALLY SECRET STUFF.txt'" RE*

You could instead escape a second set of double quotation marks:

    $ fossil add --ignore "\"REALLY SECRET STUFF.txt\"" RE*

It bears repeating that the two glob patterns here are not interpreted the same 
way when running this command from a *subdirectory* of the top checkout 
directory as when running it at the top of the checkout tree. If these files 
were in a subdirectory of the checkout tree called `doc` and that was your 
current working directory, the command would have to be:

    $ fossil add --ignore "'doc/REALLY SECRET STUFF.txt'" RE*

instead. The Fossil glob pattern still needs the `doc/` prefix because Fossil 
always interprets glob patterns from the base of the checkout directory, not 
from the current working directory as POSIX shells do.


## Windows

Neither standard Windows command shell &mdash; `cmd.exe` or PowerShell &mdash; 
expands glob patterns the way POSIX shells do. Windows command shells rely on 
the command itself to do the glob pattern expansion. The way this works depends 
on several factors:

*   the version of Windows you’re using
*   which OS upgrades have been applied to it
*   the compiler that built your Fossil executable
*   whether you’re running the command interactively
*   whether the command is built against a runtime system that does this at all
*   whether the Fossil command is being run from a file named `*.BAT` or one 
named `*.CMD`
*   the phase of the moon and whether this is an odd-numbered Thursday.  (No, 
not really, but the other caveats are all true.  Yay, Windows!)

These factors also affect how a program like `fossil.exe` interprets quotation 
marks on its command line.

The fifth item above doesn’t apply to `fossil.exe` when built with typical tool 
chains, but we’ll see an example below where the exception applies in a way 
that affects how Fossil interprets the glob pattern.

The most common problem is figuring out how to get a glob pattern passed on the 
command line into `fossil.exe` without it being expanded by the C runtime 
library that your particular Fossil executable is linked to, which tries to act 
like the POSIX systems described above. Windows is not strongly governed by 
POSIX, so it has not historically hewed closely to its strictures.

(This section does not cover the [Microsoft POSIX 
subsystem](https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem), Windows’ 
obsolete [Services for Unix 
3.*x*](https://en.wikipedia.org/wiki/Windows_Services_for_UNIX) feature, or the 
[Windows Subsystem for 
Linux](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). (The last 
is sometimes incorrectly called “Bash on Windows” or “Ubuntu on Windows.”) See 
the POSIX Systems section above for those cases.)

For example, consider how you would set `crlf-glob` to `*`. The na&iuml;ve 
approach will not work:

    c:\...> fossil setting crlf-glob *

The C runtime library will expand that to the list of all files in the current 
directory, which will probably cause a Fossil error because Fossil expects 
either “`global`” or nothing after the command line parameter giving the setting’s 
new value. If you happened to run this in a directory with two files, one of 
which was called `global`, it might appear to work but do the wrong thing, 
depending on whether the `global` file name was expanded first or second.

Let’s try again:

    c:\...> fossil setting crlf-glob '*'

That may or may not work, depending on the factors listed above. On one system 
where this was tested, it failed. The command shell saw that no file in the 
current directory matched the glob pattern `'*'`, so it passed those three 
characters unchanged to `fossil.exe`, which stored them as-is. When Fossil then 
went to apply that glob pattern to file names, it saw that the pattern was 
quoted, so it didn’t interpret `*` as meaning “any series of characters.” The 
quotes made Fossil skip the “looks like a text file” rules only for a file 
called exactly `'*'` rather than what we wanted, which was to skip those rule 
checks for all files at the top of the checkout directory.

An approach that *will* work reliably is:

    c:\...> echo * | fossil setting crlf-glob --args -

This works because the built-in command `echo` does not expand its arguments, 
and the global Fossil option `--args` makes it read further command arguments 
from `-`, meaning Fossil’s standard input, which is connected to the output of 
`echo` by the pipe.

Another correct approach is:

    c:\...> fossil setting crlf-glob *,

This works because the trailing comma prevents the command shell from matching 
any files, unless you happen to have files named with a trailing comma in the 
current directory. If the pattern matches no files, it is passed into Fossil’s 
`main()` function as-is by the C runtime system. Since Fossil uses commas to 
separate multiple glob patterns, this means “all files at the root of the 
Fossil checkout directory and nothing else.”


————————————————————

Feel free to eliminate my snark in the final bullet item. :)

Also double-check my Windows section rewrites. I tried some of it here, but my 
Windows-fu is weaker than my POSIX-fu.

A signed contributor agreement form is in the mail.
