> From: Stefan Monnier <monn...@iro.umontreal.ca> > Cc: rpl...@gmail.com, 23...@debbugs.gnu.org, alex.ben...@linaro.org, > jwieg...@gmail.com, nljlistb...@gmail.com > Date: Tue, 19 Jul 2016 00:48:19 -0400 > > > The more general problem is when there's at least one more > > sub-expression, whose start and/or end are after the new EOB. > > Those sub-expression's data will be completely bogus after the > > adjustment, > > If they were after the EOB, they were already bogus to start with.
I think we are mis-communicating. I mean the following scenario: Before call to replace_range in replace-match: |---------------------------|---|------|----| s1 e1 s2 e2 EOB (s1, e1, etc. are the start and end of the corresponding sub-expressions.) After the call to replace_range in replace-match: |---------|---|------|----| s1 e1 s2 e2 EOB IOW, the 1st sub-expression got replaced with a much shorter text, which made EOB be smaller than the original beginning and end of the 2nd sub-expression. There's nothing bogus with this, is there? The user will expect to get match-data adjusted as shown in the second diagram, and that's what she will really get -- unless there are buffer-modification hooks that use save-match-data. In the latter case, what the user will get instead is this: |---------|---|------|----| s1 EOB e1 s2 e2 and that is even before the adjustment code kicks in and makes "adjustments" with an incorrect adjustment value, which is computed as newpoint = search_regs.start[sub] + SCHARS (newtext); [...] ptrdiff_t change = newpoint - search_regs.end[sub]; and so will use the new EOB as search_regs.end[sub], instead of the correct original value of e1 from the first diagram above. IOW, the call to save-match-data in a buffer-modification hook _disrupts_ the normal operation of replace-match in this case, by indirectly sabotaging the adjustment of match data after the replacement. Am I missing something? > And in any case, that's what has been happening for ever and has > proved safe enough. So you are saying that if a bug has been happening "for ever", it doesn't have to be fixed? (I disagree about "safe enough": the amount of bug reports in our data base that are not reproducible and about whose reasons we have no clear idea is non-negligible, so we don't really know whether this particular issue caused trouble in the past or not.) > >> So I think a safe fix is to try and relax the check we added to > >> replace-match so it doesn't get all worked up when something ≥ EOB gets > >> changed to something else that's also ≥ EOB. > > And lose the other sub-expressions in a more general case? Really? > > I'm not sure what you mean by "losing sub-expressions". See above: the match data for any sub-expressions beyond the one that shrunk too much is now bogus. Thus "losing". > >> Or maybe instead of signaling an error, we could simply skip the "Adjust > >> search data for this change". > > That would still sweep the problem under the carpet, leaving the match > > data bogus, so I don't like doing that. > > Maybe I'm not 100% satisfied with the behavior either, but I don't think > it's a significant problem and I don't think it'd cause the crash we saw > in bug#23869. We don't only solve bugs that cause crashes, do we? When I debugged the crash in bug#23869, I found and tried to solve the root cause of it, not just the symptom that caused the assertion violation. I still think we should strive to solve the root cause. As for the significance of the problem, I hope you will reconsider this after reading the above description of the scenario I have in mind. (Or tell me where I am wrong.) > > The crash in bug#23869 was due to this: > > > > newpoint = search_regs.start[sub] + SCHARS (newtext); > > [...] > > /* Now move point "officially" to the start of the inserted replacement. > > */ > > move_if_not_intangible (newpoint); <<<<<<<<<<<<<<<<<<<<<<< > > > > because due to clobbering, newpoint became -1. > > Ah, I see. > > Then maybe another fix is to compute newpoint before we call > replace_range, so it uses search_regs.start[sub] before the > *-change-functions can mess it up. IOW: > > @@ -2726,9 +2726,9 @@ since only regular expressions have distinguished > subexpressions. */) > unsigned num_regs = search_regs.num_regs; > > /* Replace the old text with the new in the cleanest possible way. */ > + newpoint = search_regs.start[sub] + SCHARS (newtext); > replace_range (search_regs.start[sub], search_regs.end[sub], > newtext, 1, 0, 1); > - newpoint = search_regs.start[sub] + SCHARS (newtext); > > if (case_action == all_caps) > Fupcase_region (make_number (search_regs.start[sub]), > > Would that be sufficient to avoid the crash? Why not? To avoid the crash in that particular use case, yes (but it can be avoided even easier, by validating the value of newpoint before the call to move_if_not_intangible). But what about the root cause? It is a clear bug, and could even cause crashes, if one of the match-data end-point positions becomes negative as result of the bogus adjustment, and someone then uses those positions for something. What's worse, these bugs happen in programs that are completely valid on the Lisp level.