"Austin T. Clements" <aclements at csail.mit.edu> writes:

> Quoting Moritz Ulrich <moritz at tarn-vedra.de>:
>> Hello,
>>
>> I recently adopted notmuch as my primary way to read mail, so thank you
>> for this great tool!
>>
>> Unfortunately, I ran into a problem on the Emacs side of the project
>> when it is used in a non-ASCII environment:
>>
>> Having a tag named 'uni-köln', the tag:-completion doesn't work.
>>
>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
>> above string:
>>
>> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
>>
>> This is caused by `string-match' with the following regexp erroneously
>> matching my tag:
>>
>> (string-match "[^!#-'*-~]" "uni-köln") => 5
>> (string-match "[^!#-'*-~]" "uni-koln") => nil
>>
>> I'm not exactly sure how to tackle this - the regexp was crafted to match
>> (, ), " if I understand it correctly. A simple way would be just adding
>> more characters as a sort-of whitelist. A nicer solution would be
>> converting it from [^...] to [...] to explicitly mark characters that
>> need to be escaped.
>
> notmuch-escape-boolean-term used to use a blacklist, but we switched
> to a whitelist because Xapian's own parser has changed over the years
> in its handling of non-ASCII characters and invalidated our blacklist.
> Ultimately it seemed much safer to go with a whitelist. Quoting
> "uni-köln" isn't erroneous, it's just conservative.
>
> Could you explain in more detail what's broken? I tried adding the
> tag uni-köln to a message in Emacs, then hitting "s" to start a search
> then "tag:<TAB>" and that tag (surrounded by quotes) was one of the
> completion options. Upon completing to that tag, the search worked
> fine.
>
> Are you objecting to the unnecessary (but legal) quotes in the
> completion? We might be able to include Unicode word characters in
> the quoting whitelist, though that seems like a spot fix (probably a
> fairly broad one, so maybe that's fine) and might be tricky because of
> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
> Just Work, but we'd have to be careful of the active syntax table).
> Or tab completion could recognize that, say, tag:uni doesn't require
> quoting, but still expand it to tag:"uni-köln".
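
For illustration, the whitelist check discussed above can be tried in isolation. This is only a sketch: `my/first-unsafe-char' and both constants are hypothetical names, not part of notmuch. It assumes the tag in question is uni-köln, and the second regexp is an assumed form of the [[:alpha:]] idea, not something notmuch ships.

```elisp
;; Illustrative sketch only.  `my/first-unsafe-char' is a hypothetical helper,
;; not a notmuch function.  The first regexp is the one quoted in the thread;
;; the second is an assumed [[:alpha:]] variant of it.
(defconst my/ascii-whitelist-regexp "[^!#-'*-~]"
  "Matches any character outside the ASCII whitelist quoted in the thread.")

(defconst my/alpha-whitelist-regexp "[^[:alpha:]!#-'*-~]"
  "Assumed variant that also lets alphabetic characters through.")

(defun my/first-unsafe-char (regexp term)
  "Return the index of the first character in TERM matched by REGEXP, or nil."
  (save-match-data
    ;; Bind `case-fold-search' so the result does not depend on user settings.
    (let ((case-fold-search nil))
      (string-match regexp term))))

(my/first-unsafe-char my/ascii-whitelist-regexp "uni-köln") ; 5, as reported above
(my/first-unsafe-char my/ascii-whitelist-regexp "uni-koln") ; nil, as reported above
(my/first-unsafe-char my/alpha-whitelist-regexp "uni-köln")
```

On a recent Emacs the last call would be expected to find no unsafe character, since ö counts as alphabetic there. Whether Xapian's parser accepts such a term unquoted is a separate question, and it is the reason the whitelist is conservative in the first place.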
Thanks for explaining the reason for the whitelist approach. Knowing
this is quite helpful.

I can't really explain why, but I just didn't notice tag:"uni-köln" in
the tag completion. I think my expectation of finding it as
tag:uni-köln must have blinded me.

While it isn't erroneous, it's highly unintuitive to quote tags like
this. I can understand that a much more permissive whitelist could
cause other problems which are harder to track down, so maybe it's
possible to make the behavior configurable (e.g. by using a `defvar'
for the regex).

-- 
Moritz Ulrich
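
A minimal sketch of what the `defvar' suggestion might look like follows. Neither the variable nor the function below exists in notmuch; both names are hypothetical. The quoting step (wrap in double quotes and double any embedded quote) is an assumption about the escaping convention, consistent with the "\"uni-köln\"" result shown in the thread.

```elisp
;; Hypothetical sketch: neither name below is part of notmuch.
(defvar my/notmuch-boolean-term-unsafe-regexp "[^!#-'*-~]"
  "Regexp matching any character that forces a boolean term to be quoted.
The default is the conservative ASCII whitelist quoted in the thread.")

(defun my/escape-boolean-term (term)
  "Return TERM, quoted unless every character passes the whitelist."
  (save-match-data
    (if (or (equal term "")
            (let ((case-fold-search nil))
              (string-match my/notmuch-boolean-term-unsafe-regexp term)))
        ;; Assumed convention: double embedded quotes, then wrap in quotes.
        (concat "\"" (replace-regexp-in-string "\"" "\"\"" term t t) "\"")
      term)))

;; Default behavior: the non-ASCII tag is quoted, plain ASCII is not.
(my/escape-boolean-term "uni-köln")
(my/escape-boolean-term "uni-koln")

;; A user who accepts the risk could widen the whitelist locally.
(let ((my/notmuch-boolean-term-unsafe-regexp "[^[:alpha:]!#-'*-~]"))
  (my/escape-boolean-term "uni-köln"))
```

Keeping the conservative regexp as the default would preserve the current behavior for everyone who does not opt in, which seems to fit the safety concern raised earlier in the thread.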