On Mon, 09 Jan 2017 23:33:17 -0500, Matt Harbison <mharbiso...@gmail.com> wrote:

On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David <pierre-yves.da...@ens-lyon.org> wrote:



On 01/08/2017 09:34 PM, Matt Harbison wrote:
On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
<pierre-yves.da...@ens-lyon.org> wrote:

(ha, I wrote my previous reply in a train and it got sent when I
connected again (and received that one). I'm going to try to adress
the new content in this email and sometime repeat some of my other
reply content for clarity)

On 01/08/2017 04:23 AM, Matt Harbison wrote:
On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara <y...@tcha.org>
wrote:

On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:
> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
<pierre-yves.da...@ens-lyon.org> wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbi...@yahoo.com>
>> # Date 1483550016 18000
>> #      Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>

Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility. You
can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I

Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals). But you seem to
be suggesting adding 'icaseliteral:'.

I'm not 100% sure of what Yuya actually has in mind but here is my
understanding of the situation and how we could move forward.

Currently:
----------

   desc(X) → X is customly matched as a case insensitive litteral,

   We have a "generic" pattern definition syntax used by various other
reveset (implemented in "stringmatcher")

     foo(X)
       → X is matched as a case sensitive litteral
     foo('literal:X')
       → X is matched as a case sensitive literal (same as the above)
     food('re:X')
       → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
---------

extend the string matcher to

   foo('literal:X')
     → X is matched as a case sensitive literal

See the comment in the new patch I sent about 'user()' already
lowercasing 'literal:' and 're:'.  I'd consider it a bug, but it's been
in since mid 2012.  Attempting to channel Matt, I'm guessing we are
stuck with that since it is so old, but wanted to see what others think.

1) Yep, we are stuck with whatever existing behavior we have for existing predicate because of BC. (but we can augment it)

2) Congratulation you seems to have unearthed an area where we have many predicated with close but slightly different behavior. At that point I'll ask you an inventory of what we currently have so that we can devise a sound and as consistent as possible way forward.

   Can you provide us with a table that at least keep track of:

* predicate
* default behavior
* support 'rich' stringmatcher ?
* are 'literal:' case sensitive ?
* are 're:' case sensitive (and supported at all) ?

TL;DR: desc, grep, keyword, and author/user are the oddballs. grep and keyword are the regex and literal halves respectively, of the same search.

After stripping out non string predicates, we basically end up with 4 groups:

- util.stringmatcher based:

     Predicate:            Case Sensitive?
     "author"                   N
     "user"                     N
     "bookmark"                 Y
     "branch"                   Y
     "extra"                    Y
     "named"                    Y
     "subrepo"  [1]             Y
     "tag"                      Y

These all support 'literal:' and 're:'. Case sensitivity applies the same to both prefixes, and raw pattern.

[1] Not documented to support 'literal:' or 're:' prefixes.


- Local method implementation based:

     Predicate:            Case Sensitive?
     "desc"                     N
     "grep"                     Y
     "keyword"                  N

None of these support prefixes. The grep param is a regex without 're:', so it doesn't make a lot of sense to support stringmatcher here- what stringmatcher thinks is literal is really regex. If we internally bolt on 're:', it still can't support literal matches.


- match.py based (not relevant, but for completeness):

     "adds", "contains", "file", "filelog", "follow",
     "followlines", "modifies", "removes"

- "bisect" (I can't see how 're:' support here would be meaningful.)


From there we'll be able to see if a pattern emerge and pick the best way to move forward.

The following thoughts come to mind:

A) author/user has been thoroughly corrupted with the lower casing. Maybe come up with a 3rd one that follows modern rules, and deprecate/hide these? Sad, because these seem natural. OTOH, having raw pattern and prefixed patterns behave the same everywhere is elegant and simple. Not sure what to call it. 'committer' comes to mind, but that has git implications. That was suggested when I proposed recording who performed a graft in the 'extra' dict. Then it was pointed out this tracking was related to the chain of custody proposal by Greg, so I let it drop [1]. My enthusiasm was dampened some when I saw templates also have {author} and {user}, though the latter is a filter, not a reference to the field.

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-April/068692.html

B) We should maybe fold grep and keyword into a new predicate ('search'?) that follows modern formatting, and deprecate/hide these two. Simple, and drops the visible revset count by 1. I'm not sure if 'grep' was borrowed from git, or if it's just inspired by the unix command.

I had second thoughts about this. I think if we just add documentation to 'keyword' that says "If you want to search these fields by regex or case sensitively, use 'grep'". Then the user can see how to get everything stringmatcher will provide, and it makes it clear to developers that 'keyword' doesn't need string stringmatcher support. I'll submit a patch during the freeze.

That leaves only 'author' and 'desc' as not providing the full functionality of the others.


C) If we do A + B, that means 'desc' is the only oddball left. I don't like the idea that case sensitivity for a raw pattern and a 'literal:' prefixed pattern would differ. They are both literals in my mind, and it would be the one remaining exception. The 're:' prefix could follow regular rules.

D) I guess we could hide/replace 'desc' with something new as well. ('message'?) But I'm wondering about the BC rules. I think generally, we don't change behavior without the user doing some action to opt in. But the 're:' prefix was added after many of these predicates existed for years. Obviously, that broke anybody's search that started with 're:', just by them upgrading Mercurial. If making them change the query to prefix 'literal:' was acceptable, why would changing the case sensitivity not be, assuming we provide 'icase-literal:'? Is it just a general feeling that searches for 're:' are rare? I understand the principle and motivation, but sometimes it's hard to figure out how they are applied.


   foo('icase-literal:X')
     → X is matched as a case insensitive literal
   food('re:X')
     → X is matched as a regular expression (case sensitive)

Then, desc move to use string matcher (default to "icase-literal").

We do not need a 'icase-re:' spec, because one can easily achieve it
using 're:(?i)foo'

Ah! I missed the part in the docs where flags could be set in the string
with (?<flag>). I thought you needed to compile with re.FLAG.  When he
said string literal, my mind went right to the 'literal:' prefix.
Agreed, no need for 'icase-re:'.

Someone getting slightly confused with regular expression? Impossible! ;-)

[…]
I'm about to submit a patch to add the current 're:' support to 'desc'
in the meantime, to hopefully move this along.

Great!

 I'd also be curious if
you have thoughts on how to conditionally limit this predicate to the
first line, without limiting future functionality.

So having digged the regexp part a bit more, it seems like one could
just use 're:^.*issue1337' to match "issue1337" on the first line ('.'
does not match new-line by default).

Thanks for looking at that.  It's way less horrible than I thought it
would be.  I'm curious what Sean thinks, since he mentioned {firstline}
being put in as a substitute for a complex regex.  I'd be fine with
skipping the firstline=True param if this case is mentioned in the help
for desc().

I've missed that '{firstline}' proposal from Sean, can you point me at it? (or summarize it ?)

Not a proposal, so much as historical knowledge about the template?

https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-January/092099.html

Thanks a lot for looking into this!

Cheers,
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to