On 4/14/2017 9:15 PM, Warren Young wrote:
On Apr 14, 2017, at 5:09 PM, Ross Berteig <r...@cheshireeng.com> wrote:
I've checked it in on the glob-docs branch until it has been read by at least
one more pair of eyes.
How about two pair? (Because foureyes. Ahahah.)
Go put your Nomex underwear on; I’m a brutal copy editor.
Exactly why I didn't just dump it on trunk, I knew it needed a copy
editor to read it.
Complaints, comments, and considerations, mostly in order of the current
presentation:
1. It doesn’t tell you that globs and regexes are not the same thing. I see
this confusion occasionally, so I think it’s worth a warning. We are, after
all, targeting this at least partly at people who don’t already know what
“glob” means.
(Example: https://unix.stackexchange.com/q/279661)
True, it probably should. I'll work that into the second paragraph.
2. I’d move that first parenthetical to a second sentence. It’s hard to read
as-is. Consider: “Glob patterns are also accepted as options to certain
commands as well as query parameters to certain pages.”
3. GLOB is all-caps in Fossil help output because it’s a variable parameter,
but it should only be written that way in documentation when referring to
syntax examples in Fossil command output or the corresponding docs on
fossil-scm.org. GLOB is not an acronym. The correct term is “a glob pattern,”
or more idiomatically, “a glob”:
https://en.wikipedia.org/wiki/Glob_(programming)
Link that Wikipedia article somewhere near the top of your article, too.
It is also the SQL operator that implements the pattern. But point
taken. I'll preserve the all caps usage if I quote the command help, and
otherwise use the word "glob" as short for "glob pattern" most places.
4. Para 2, sentence 1: make it two sentences. The second half doesn’t follow
from the first. It’s an independent statement.
5. Para 2, sentence 2: nix the comma; the second part is not a complete
sentence.
6. Nix para 4: we already know that most documentation exists to avoid
the need to RTFS. It needn’t be stated here. :)
That whole rest of the section is really just my whining about the state
of the current docs. I rewrote it and shuffled the presentation order a
little.
7. Move “any” definition to a sentence after the table: “Any other character
matches that character exactly.” or similar.
8. “…additional features:” (Colon, not period.)
9. Ranges: How does that work with Unicode? That is, does [a-d] match ä in any
collating order supported by Fossil? Does it depend on whether Fossil is
linked to libics? Answers to both should be given here. Let’s not be
needlessly Anglocentric.
(Qualifier given because I don’t mean to suggest that you must translate this
whole document to other human languages.)
That is a damn good question.
I had to chase into the SQLite source code again to find out, and at
that I remain unsure how locale and platform quirks relate. The SQLite
source is pretty clear about comparing Unicode code points as they
appear in the pattern and subject strings without any normalization or
locale-specific collation rules.
There is no dependency on libics, and since the Image Cytometry Standard
has little impact on us, I'm not that surprised. ;-) ICU on the other
hand might be more interesting. But I don't see any dependency on ICU4C,
or any other Unicode support library.
I don't think this is Windows specific (or worse, Windows version
specific, code page dependent, or something even more strange). Windows
has a weird love-hate relationship with Unicode. They were an early
adopter, but made some regrettable choices early on that are still
haunting us.
From what I can see in the source code, ranges are comparisons to raw
code points, and so [a-d] will not match ä, but [à-ë] will match ä.
At a normal Windows command prompt with the default code page in effect,
test-glob confirms my guesses. Note that the range can be unexpectedly wide:
C:...>fossil test-glob [a-ë] a A ä Ä ~
SQL expression: (x GLOB '[a-ë]')
pattern[0] = [[a-ë]]
1 a
0 A
1 ä
1 Ä
1 ~
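For anyone who wants to poke at this without a Fossil build handy, here is a minimal sketch that asks SQLite's GLOB operator the same questions through Python's bundled `sqlite3` module. It assumes the SQLite that ships with Python behaves like the copy inside Fossil, which I have not verified. The `glob_match` helper is my own name, not anything from Fossil or SQLite.

```python
import sqlite3

# In-memory database: we only need the expression evaluator.
_db = sqlite3.connect(":memory:")


def glob_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: evaluate `text GLOB pattern` in SQLite."""
    row = _db.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()
    return bool(row[0])


# Same subjects as the test-glob run above; escapes keep the code points explicit.
A_UMLAUT_LOWER = "\u00e4"  # ä
A_UMLAUT_UPPER = "\u00c4"  # Ä
A_GRAVE = "\u00e0"         # à
E_DIAERESIS = "\u00eb"     # ë

wide_range = f"[a-{E_DIAERESIS}]"
for ch in ["a", "A", A_UMLAUT_LOWER, A_UMLAUT_UPPER, "~"]:
    print(int(glob_match(wide_range, ch)), ch, f"U+{ord(ch):04X}")

# Ranges compare raw code points, so no collation folds ä into a-d.
print(glob_match("[a-d]", A_UMLAUT_LOWER))
print(glob_match(f"[{A_GRAVE}-{E_DIAERESIS}]", A_UMLAUT_LOWER))
```

Printing the code point next to each result makes the "unexpectedly wide" range easier to see: everything from U+0061 through U+00EB is inside it, including `~` and Ä, while `A` at U+0041 is not.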
10. Matching hyphen: can you also put it first in a bracket expression, after
an optional ^, as specified by POSIX?
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
If so, document it, and if not, file a ticket, because it’s a bug.
(Another good link to put into the document.)
That one works as specified (extra ^ needed because Windows):
C:...>fossil test-glob [^^-] - a ^^
SQL expression: (x GLOB '[^-]')
pattern[0] = [[^-]]
0 -
1 a
1 ^
I wonder what other odd cases are in the SQLite test suite for the GLOB
operator...
I also wonder if the SQLite intent was to follow POSIX or just a general
sense of "Unix is like this". For instance, 'a/b/c' GLOB '*' is TRUE in
SQLite, but at a shell prompt the `*` would only match the `a` part.
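The slash difference is easy to see side by side. A rough sketch, again using Python's bundled SQLite as a stand-in for Fossil's, with Python's `glob` module standing in for shell expansion. Both stand-ins are assumptions on my part, and the helper names are made up for illustration.

```python
import glob
import os
import sqlite3
import tempfile

_db = sqlite3.connect(":memory:")


def sqlite_glob(pattern: str, text: str) -> bool:
    """Hypothetical helper: evaluate `text GLOB pattern` in SQLite."""
    return bool(_db.execute("SELECT ? GLOB ?", (text, pattern)).fetchone()[0])


def shell_style_glob(pattern: str, root: str) -> list:
    """Hypothetical helper: expand `pattern` inside `root` the way a shell would."""
    hits = glob.glob(os.path.join(glob.escape(root), pattern))
    return sorted(os.path.relpath(hit, root) for hit in hits)


# Build a/b/c in a scratch directory so the shell-style expansion has real files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    open(os.path.join(root, "a", "b", "c"), "w").close()
    shell_matches = shell_style_glob("*", root)

# SQLite: `*` happily crosses directory separators.
print(sqlite_glob("*", "a/b/c"))
# Shell-style expansion: `*` stops at the first path component.
print(shell_matches)
```

So a glob like `*.bak` in a Fossil setting reaches into every directory, whereas a shell would only look in the current one.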
11. [^]] example: I’d prefer a different character here to avoid confusion.
[^a] would be fine.
I think the point of that and the following example was to be
confusing... those are more "advanced" examples. I added a couple of
simpler examples of exclusion.
12. “...must match the entire name to be considered a match.” Emphasize this
somehow. It’s a commonly-misunderstood aspect of Fossil’s glob matching rules.
Rewriting that aside to be more prominent. I suspect it really deserves
to be said more than once, and more than one way. So I've emphasized in
the syntax that a glob matches only if it consumes the entire target
text, and I'll say that again in this context with emphasis on the file
name as target.
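A tiny sketch of the whole-name rule, in case it helps pick wording. It uses Python's bundled SQLite GLOB as a stand-in for Fossil's matcher (an assumption), and `matches_whole_name` is a name I invented for the example.

```python
import sqlite3

_db = sqlite3.connect(":memory:")


def matches_whole_name(pattern: str, name: str) -> bool:
    """Hypothetical helper: True only if `pattern` consumes all of `name`."""
    return bool(_db.execute("SELECT ? GLOB ?", (name, pattern)).fetchone()[0])


# A prefix is not a match: the pattern must account for every character.
print(matches_whole_name("README", "README.txt"))
# Adding a wildcard lets the pattern consume the rest of the name.
print(matches_whole_name("README*", "README.txt"))
# The directory part counts as part of the name too.
print(matches_whole_name("README*", "doc/README.txt"))
print(matches_whole_name("*README*", "doc/README.txt"))
```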
13. "may have one GLOB per line.” -> “may be given as one glob per line.”
14. First para of "File names to match” section needs a rewrite: “The canonical
name of a file has all directory separators changed to `/`, redundant slashes are
removed, all `.` path components are removed, and all `..` path components are
resolved. (There are additional details we won’t go into here.)”
In that rewrite, I changed the treatment of slashes in part because I doubt
“../foo” is left unchanged. By your rules, the .. component would have to have
the leading slash to be affected. Double-check with the source, but I’m pretty
sure I’m right.
My rewrite also adds the redundant slash removal bit, as is proper according to
POSIX, but may not be the way Fossil works; if it doesn’t work that way, that
needs to be documented instead, since it will confuse those of us who know
POSIX requires this. That is, /bin/ls and /bin////ls are the same thing on a
POSIX box. Whether Fossil follows suit or not, it needs to be documented.
(There’s an obscure POSIX rule that says two leading slashes must be left
untouched, but I wouldn’t expect Fossil to obey this, and I certainly wouldn’t
expect most readers to know about the rule and therefore expect the exception.)
Windows users expect \\SERVER\Share\some\file.txt to make sense. It
won't always make sense to fossil, but the test-simplify-name command
shows that fossil does indeed handle it:
C:...>fossil test-simplify-name \\SERVER\Share\some\file.txt
[\\SERVER\Share\some\file.txt] -> [//SERVER/Share/some/file.txt]
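To make the proposed wording concrete, here is a sketch of those rules using Python's `posixpath.normpath`. This is POSIX-style normalization, not Fossil's actual code, and `canonical_name` is a hypothetical helper. Fossil may well differ on edge cases such as a leading `..`, so `fossil test-simplify-name` remains the authority.

```python
import posixpath


def canonical_name(name: str) -> str:
    """Hypothetical helper approximating the canonicalization rules quoted above.

    Backslashes become forward slashes, then POSIX rules collapse redundant
    slashes, drop `.` components, and resolve `..` where possible.
    """
    return posixpath.normpath(name.replace("\\", "/"))


examples = [
    r"\\SERVER\Share\some\file.txt",  # UNC path, as in the test-simplify-name run
    "/bin////ls",                     # redundant slashes
    "src/./lib/../main.c",            # `.` and `..` components
    "../foo",                         # leading `..` has nothing to resolve against
]
for example in examples:
    print(f"[{example}] -> [{canonical_name(example)}]")
```

Note that `normpath` keeps exactly two leading slashes, which is the obscure POSIX rule mentioned above and happens to agree with what Fossil printed for the UNC path.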
15. There is no item #15.
16. "This has some consequences.” That seems to want to introduce a list, not
stand alone as a complete paragraph. Make the following paragraph a bulleted list,
and change . to :.
17. “Recall that…” I don’t think it’s clear from the earlier paragraph that \
becomes / even on Windows. It could be read as simply a bit of Unix-centrism,
with some Windows-using readers disregarding it, thinking, “Yeah, yeah, I know
what you really mean here.” That second-guesser would be wrong in this case,
so say instead, “Fossil glob patterns always use forward slashes as path
separators, even on Windows.”
I rewrote the whole section, with your comments in mind. I used your
wording of the simplification, which seems clearer than what I wrote and
closer to what I intended. The gory details are a very deep well.
18. The “Where are they used” and “Platform quirks” headers should be bigger.
I think you have your # character counts mixed up in the Markdown source. (Or
---- where you mean ====, if you do your headings that way; I didn’t bother
looking.)
They are the same header level as Syntax and File Names to Match.
19. /timeline -> `/timeline`
20. “It also can use” -> "It can also use”
21. “GLOB, LIKE, or REGEXP” I don’t think you want to talk about the SQLite
operator/function GLOB here, as it’s confusing with respect to the simple
treatment of globs in the rest of this document. Unless you’re going to give
examples of GLOB-the-SQLite-function here, drop it from this list. (I think
you can safely ignore that detail.)
22. Either give brief LIKE and REGEXP examples here in this document, add new
documents for each and link to them, or drop mention of these details. As it
stands, I’m left wondering why this doesn’t work:
https://www.fossil-scm.org/index.html/timeline?chng=%25MakeLists.txt&ms=LIKE
In that specific case because the ms=LIKE would only affect a tag name,
not any other name matching. Specifically a tag name supplied to the t=
parameter (and r= because that is a shortcut for rel&t=). Anecdotally,
a=, b=, and c= do not care about ms=.
https://www.fossil-scm.org/index.html/timeline?t=%25ross%25&ms=LIKE
https://www.fossil-scm.org/index.html/timeline?r=version-2.%3F&ms=GLOB
I suspect the right answer here is to drop all mention of any matching other
than ms=GLOB, leaving the meanings of EXACT, LIKE, GLOB, and REGEXP in that
context to some other documentation entirely. I'm not sure I've ever seen the
ms= parameter used in the wild, making that easier for me to let go of today.
This entire document is actually about GLOB the SQLite operator. Well, and the
canonical file names compared by it. So the bit that could be said better is
just that it is a tag name that is being compared, not a file name. And that is
a deep well too, because of the raw tag name vs. the symbolic tag name
implementation detail.
All three of my options prevent that confusion.
23. Does EXACT also work? This suggests that it should:
https://sqlite.org/lang_expr.html
(Another link opportunity. You can tell that I pepper my Markdown docs with
links, can’t you?)
If not, explain why not, lest someone else make the same leap. (And maybe file
a feature request. It could be useful to bypass certain circles in the Quoting
Inferno.)
I think EXACT is the default value of the ms= parameter. It is listed in
the /timeline help text. Try fossil help /timeline.
24. "These settings are all lists of GLOBs.” Split the para here with the list
between the two parts; end the first para with a colon.
25. “…or file in the repository’s…” -> “…or put a file in the repository’s”
26. If you’re going to cover `.fossil-settings` here at all, make it clear that
this must be at the top level of the checkout directory. Adding
`some/path/.fossil-settings/ignore-glob` to the repository won’t let you avoid
prefixing globs with “some/path/“.
This is a perfectly reasonable thing to try, particularly for Subversion and
Git transplants, which allow .svnignore and .gitignore files anywhere in the
tree, with matches based at the file’s location. (This Subversion transplant
did it early in his Fossil career, and was annoyed when it didn’t work.)
I understand why Richard has been resistant to special files scattered
hither and yon in the tree. But it is a difference that often catches
new users.
On the other hand, it is a lot more powerful to just say *.bak in one
file instead of needing it in every folder.
27. Add a section or new document on transitioning from .fooignore files.
Good point. At minimum a sentence right here would anchor it even if it
deserves a full section treating it. Especially if I can refer to the
Gitignore project since those folks have done such a good job of
collecting lists of all the bloody IDE detritus.
My sense at the moment is that a separate document that takes either
Git, SVN, or CVS as the example would be the right answer for this since
it is so specific to a single setting.
28. The “Commands that refer to globs” section should make clear that it is
talking about things like the --clean and --ignore options to `fossil add`, and
that it is not talking about the file lists some of these commands take. It
should refer the reader to my next item’s rewrite, which covers that.
I decided that was a critical detail, and put it at the top of the
document in the first section. I also added some language here making it
clear that the reference is to globs as command line option values and
not the rest of the command line.
Which of course raises the question of how the hell I quote this glob to
protect it from my operating system boogie man.
Hence the platform quirks section came to be.
29. The Platform Quirks section needs a total rewrite. Sorry, but it left me
confused, and I know what’s going on.
Not a surprise. I was interrupted at least once while dumping that onto
the editor...
First glance at your rewrite looks good. I'll look closer "soon".
How about this:
————————————————————
# Platform Quirks
Fossil glob patterns are based on the glob pattern feature of POSIX shells.
Fossil glob patterns also have a quoting mechanism, discussed above. Because
other parts of your operating system may interpret glob patterns and quotes
separately from Fossil, it is often difficult to give glob patterns correctly
to Fossil on the command line. Quotes and special characters in glob patterns
are likely to be interpreted when given as part of a `fossil` command, causing
unexpected behavior.
These problems do not affect [versioned settings
files](/doc/trunk/www/settings.wiki) or Admin → Settings in Fossil UI.
Consequently, it is better to set long-term `*-glob` settings via these methods
than to use `fossil settings` commands.
That advice doesn’t help you when you are giving one-off glob patterns in
`fossil` commands. The remainder of this section gives remedies and workarounds
for these problems.
## POSIX Systems
If you are using Fossil on a system with a POSIX-compatible shell — Linux,
macOS, the BSDs, Unix, Cygwin, WSL etc. — the shell may expand the glob
patterns before passing the result to the `fossil` executable.
Sometimes this is exactly what you want. Consider this command for example:
$ fossil add RE*
If you give that command in a directory containing `README.txt` and
`RELEASE-NOTES.txt`, the shell will expand the command to:
$ fossil add README.txt RELEASE-NOTES.txt
…which is compatible with the `fossil add` command’s argument list, which
allows multiple files. Fossil doesn’t see the glob pattern at all, but since
the command does what you almost certainly wanted anyway, it’s fine.
Now consider what happens instead if you say:
$ fossil add --ignore RE* src/*.c
This *doesn’t* do what you want because the shell will expand both `RE*` and
`src/*.c`, causing one of the two files matching the `RE*` glob pattern to be
ignored and the other to be added to the repository. You need to say this in
that case:
$ fossil add --ignore 'RE*' src/*.c
The single quotes force a POSIX shell to pass the `RE*` glob pattern through to
Fossil untouched, which will do its own glob pattern matching. There are other
methods of quoting a glob pattern or escaping its special characters; see your
shell’s manual.
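The difference between the two commands can be sketched in a few lines of Python. This is a toy model of shell expansion, not a real shell: `expand_like_shell` is a hypothetical helper, it handles only file names in one directory, and it takes the quoting decision as an explicit flag rather than parsing quotes.

```python
import fnmatch


def expand_like_shell(words, cwd_files):
    """Toy model: replace each unquoted word with the files it matches.

    `words` is a list of (text, quoted) pairs. A quoted word, or one that
    matches nothing, is passed through unchanged, as a POSIX shell would do.
    """
    argv = []
    for text, quoted in words:
        hits = []
        if not quoted:
            hits = sorted(f for f in cwd_files if fnmatch.fnmatchcase(f, text))
        argv.extend(hits or [text])
    return argv


files = ["README.txt", "RELEASE-NOTES.txt"]
prefix = [("fossil", False), ("add", False), ("--ignore", False)]

# Unquoted: Fossil never sees the pattern, and --ignore swallows README.txt.
unquoted_argv = expand_like_shell(prefix + [("RE*", False)], files)
print(unquoted_argv)

# Quoted: the pattern reaches Fossil intact.
quoted_argv = expand_like_shell(prefix + [("RE*", True)], files)
print(quoted_argv)
```

In the unquoted case the argument after `--ignore` is `README.txt`, leaving `RELEASE-NOTES.txt` as a file to add, which is the surprising result described above.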
POSIX shells also interpret the same quotation marks Fossil uses to handle
things like spaces in file names, as discussed above. For example, if you
needed to add all files matching `RE*` to the repository except for a file
called `REALLY SECRET STUFF.txt`, you could use nested quotes:
$ fossil add --ignore "'REALLY SECRET STUFF.txt'" RE*
You could instead escape a second set of double quotation marks:
$ fossil add --ignore "\"REALLY SECRET STUFF.txt\"" RE*
It bears repeating that the two glob patterns here are not interpreted the same
way when running this command from a *subdirectory* of the top checkout
directory as when running it at the top of the checkout tree. If these files
were in a subdirectory of the checkout tree called `doc` and that was your
current working directory, the command would have to be:
$ fossil add --ignore "'doc/REALLY SECRET STUFF.txt'" RE*
instead. The Fossil glob pattern still needs the `doc/` prefix because Fossil
always interprets glob patterns from the base of the checkout directory, not
from the current working directory as POSIX shells do.
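To see what Fossil actually receives from the nested-quote commands above, Python's `shlex.split` can stand in for a POSIX shell's quote removal. It is only an approximation: `shlex` follows POSIX quoting rules but does not expand glob patterns, so `RE*` stays as-is here.

```python
import shlex

# The two quoting styles from the examples above, as typed at the prompt.
nested = r'''fossil add --ignore "'REALLY SECRET STUFF.txt'" RE*'''
escaped = r'''fossil add --ignore "\"REALLY SECRET STUFF.txt\"" RE*'''

nested_argv = shlex.split(nested)
escaped_argv = shlex.split(escaped)

# The outer quotes are consumed by the shell; the inner ones survive for Fossil.
print(nested_argv[3])   # 'REALLY SECRET STUFF.txt'
print(escaped_argv[3])  # "REALLY SECRET STUFF.txt"
```

Either way the `--ignore` value arrives as a single argument that still carries one layer of quotes, which is what lets Fossil treat the embedded spaces as part of one glob pattern.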
## Windows
Neither standard Windows command shell — `cmd.exe` or PowerShell —
expands glob patterns the way POSIX shells do. Windows command shells rely on the
command itself to do the glob pattern expansion. The way this works depends on several
factors:
* the version of Windows you’re using
* which OS upgrades have been applied to it
* the compiler that built your Fossil executable
* whether you’re running the command interactively
* whether the command is built against a runtime system that does this at all
* whether the Fossil command is being run from a file named `*.BAT` vs being
named `*.CMD`
* the phase of the moon and whether this is an odd-numbered Thursday. (No,
not really, but the other caveats are all true. Yay, Windows!)
These factors also affect how a program like `fossil.exe` interprets quotation
marks on its command line.
The fifth item above doesn’t apply to `fossil.exe` when built with typical tool
chains, but we’ll see an example below where the exception applies in a way
that affects how Fossil interprets the glob pattern.
The most common problem is figuring out how to get a glob pattern passed on the
command line into `fossil.exe` without it being expanded by the C runtime
library that your particular Fossil executable is linked to, which tries to act
like the POSIX systems described above. Windows is not strongly governed by
POSIX, so it has not historically hewed closely to its strictures.
(This section does not cover the [Microsoft POSIX
subsystem](https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem), Windows’
obsolete [Services for Unix
3.*x*](https://en.wikipedia.org/wiki/Windows_Services_for_UNIX) feature, or the
[Windows Subsystem for
Linux](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). (The latter
is sometimes incorrectly called “Bash on Windows” or “Ubuntu on Windows.”) See
the POSIX Systems section above for those cases.)
For example, consider how you would set `crlf-glob` to `*`. The naïve
approach will not work:
c:\...> fossil setting crlf-glob *
The C runtime library will expand that to the list of all files in the current
directory, which will probably cause a Fossil error because Fossil expects
either “`global`” or nothing after the command line parameter giving the setting’s
new value. If you happened to run this in a directory with two files, one of
which was called `global`, it might appear to work but do the wrong thing,
depending on whether the `global` file name was expanded first or second.
Let’s try again:
c:\...> fossil setting crlf-glob '*'
That may or may not work, depending on the factors listed above. On one system
where this was tested, it failed because the command shell sees that no file in
the current directory matches the glob pattern `'*'`, so the command shell
passed those three characters unchanged to `fossil.exe`, which stored them
as-is. Then when Fossil went to apply that glob pattern to file names, it saw
that the glob pattern is quoted, so it didn’t interpret `*` as meaning “any
series of characters;” the quotes made Fossil skip the “looks like a text file”
rules only for a file called exactly `'*'` rather than what we wanted, which
was to skip those rule checks for all files at the top of the checkout
directory.
An approach that *will* work reliably is:
c:\...> echo * | fossil setting crlf-glob --args -
This works because the built-in command `echo` does not expand its arguments,
and the global Fossil option `--args` makes it read further command arguments
from `-`, meaning Fossil’s standard input, which is connected to the output of
`echo` by the pipe.
Another correct approach is:
c:\...> fossil setting crlf-glob *,
This works because the trailing comma prevents the command shell from matching
any files, unless you happen to have files named with a trailing comma in the
current directory. If the pattern matches no files, it is passed into Fossil’s
`main()` function as-is by the C runtime system. Since Fossil uses commas to
separate multiple glob patterns, this means “all files at the root of the
Fossil checkout directory and nothing else.”
————————————————————
Feel free to eliminate my snark in the final bullet item. :)
Also double-check my Windows section rewrites. I tried some of it here, but my
Windows-fu is weaker than my POSIX-fu.
A signed contributor agreement form is in the mail.
I think I'll pause here for tonight, and let you see the hash I made to
the top half of the document. I'll read your re-write of the platform
quirks more carefully after dinner. But perhaps, given the scale of that
edit, you should get credit in the change log for writing it. Richard
shouldn't have any qualms about giving you checkin privileges....
--
Ross Berteig r...@cheshireeng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
+1 626 303 1602