[O] bug#23917: Please consider making Bug #23917 a blocker for 25.1 (was Re: org-capture: Capture template ‘g’: Match data clobbered by buffer modification hooks)

Eli Zaretskii Tue, 19 Jul 2016 08:41:26 -0700

> From: Stefan Monnier <[email protected]>
> Cc: [email protected],  [email protected],  [email protected],  
> [email protected],  [email protected]
> Date: Tue, 19 Jul 2016 00:48:19 -0400
> 
> > The more general problem is when there's at least one more
> > sub-expression, whose start and/or end are after the new EOB.
> > Those sub-expression's data will be completely bogus after the
> > adjustment,
> 
> If they were after the EOB, they were already bogus to start with.


I think we are mis-communicating.  I mean the following scenario:

Before call to replace_range in replace-match:

   |---------------------------|---|------|----|
   s1                         e1   s2    e2   EOB

(s1, e1, etc. are the start and end of the corresponding
sub-expressions.)

After the call to replace_range in replace-match:

   |---------|---|------|----|
   s1       e1   s2    e2   EOB

IOW, the 1st sub-expression got replaced with a much shorter text,
which made EOB be smaller than the original beginning and end of the
2nd sub-expression.  There's nothing bogus with this, is there?  The
user will expect to get match-data adjusted as shown in the second
diagram, and that's what she will really get -- unless there are
buffer-modification hooks that use save-match-data.  In the latter
case, what the user will get instead is this:

   |---------|---|------|----|
   s1                       EOB
                            e1
                            s2
                            e2

and that is even before the adjustment code kicks in and makes
"adjustments" with an incorrect adjustment value, which is computed as

  newpoint = search_regs.start[sub] + SCHARS (newtext);
  [...]
  ptrdiff_t change = newpoint - search_regs.end[sub];

and so will use the new EOB as search_regs.end[sub], instead of the
correct original value of e1 from the first diagram above.

IOW, the call to save-match-data in a buffer-modification hook
_disrupts_ the normal operation of replace-match in this case, by
indirectly sabotaging the adjustment of match data after the
replacement.

Am I missing something?

> And in any case, that's what has been happening for ever and has
> proved safe enough.

So you are saying that if a bug has been happening "for ever", it
doesn't have to be fixed?  (I disagree about "safe enough": the amount
of bug reports in our data base that are not reproducible and about
whose reasons we have no clear idea is non-negligible, so we don't
really know whether this particular issue caused trouble in the past
or not.)

> >> So I think a safe fix is to try and relax the check we added to
> >> replace-match so it doesn't get all worked up when something ≥ EOB gets
> >> changed to something else that's also ≥ EOB.
> > And lose the other sub-expressions in a more general case?  Really?
> 
> I'm not sure what you mean by "losing sub-expressions".

See above: the match data for any sub-expressions beyond the one that
shrunk too much is now bogus.  Thus "losing".

> >> Or maybe instead of signaling an error, we could simply skip the "Adjust
> >> search data for this change".
> > That would still sweep the problem under the carpet, leaving the match
> > data bogus, so I don't like doing that.
> 
> Maybe I'm not 100% satisfied with the behavior either, but I don't think
> it's a significant problem and I don't think it'd cause the crash we saw
> in bug#23869.

We don't only solve bugs that cause crashes, do we?  When I debugged
the crash in bug#23869, I found and tried to solve the root cause of
it, not just the symptom that caused the assertion violation.  I still
think we should strive to solve the root cause.

As for the significance of the problem, I hope you will reconsider
this after reading the above description of the scenario I have in
mind.  (Or tell me where I am wrong.)

> > The crash in bug#23869 was due to this:
> >
> >   newpoint = search_regs.start[sub] + SCHARS (newtext);
> >   [...]
> >   /* Now move point "officially" to the start of the inserted replacement.  
> > */
> >   move_if_not_intangible (newpoint);  <<<<<<<<<<<<<<<<<<<<<<<
> >
> > because due to clobbering, newpoint became -1.
> 
> Ah, I see.
> 
> Then maybe another fix is to compute newpoint before we call
> replace_range, so it uses search_regs.start[sub] before the
> *-change-functions can mess it up.  IOW:
> 
>     @@ -2726,9 +2726,9 @@ since only regular expressions have distinguished 
> subexpressions.  */)
>        unsigned  num_regs = search_regs.num_regs;
>     
>        /* Replace the old text with the new in the cleanest possible way.  */
>     +  newpoint = search_regs.start[sub] + SCHARS (newtext);
>        replace_range (search_regs.start[sub], search_regs.end[sub],
>                  newtext, 1, 0, 1);
>     -  newpoint = search_regs.start[sub] + SCHARS (newtext);
>      
>        if (case_action == all_caps)
>          Fupcase_region (make_number (search_regs.start[sub]),
> 
> Would that be sufficient to avoid the crash?  Why not?

To avoid the crash in that particular use case, yes (but it can be
avoided even easier, by validating the value of newpoint before the
call to move_if_not_intangible).  But what about the root cause?  It
is a clear bug, and could even cause crashes, if one of the match-data
end-point positions becomes negative as result of the bogus
adjustment, and someone then uses those positions for something.
What's worse, these bugs happen in programs that are completely valid
on the Lisp level.

[O] bug#23917: Please consider making Bug #23917 a blocker for 25.1 (was Re: org-capture: Capture template ‘g’: Match data clobbered by buffer modification hooks)

Reply via email to