Subramanya Sastry has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/180563

Change subject: Tweak <nowiki/> removal heuristic a bit
......................................................................

Tweak <nowiki/> removal heuristic a bit

* Roundtrip testing turns up a lot of wikitext like this:

  '<nowiki/>''foo'' and ''[[bar]]''

  Our current conservative heuristic won't strip the <nowiki/> in
  that scenario, because it refuses to match quote segments that
  contain link characters. So, add another hacky heuristic for now:
  also accept a simple [[link]] inside a quote segment. What we
  really need is a line-based heuristic that can examine the
  wikitext chunks that were emitted and distinguish between output
  chunks.

  That is coming later as part of what Scott is working on.

  For now, this should help us minimize regressions.
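
To see what the regex tweak changes, the sketch below runs the old and
new testRE patterns (copied from the diff in this change) against the
two segments that the example line splits into at <nowiki/>. The names
oldRE, newRE, and pieces are illustrative only, and the rest of the
serializer's decision logic is not reproduced here.

```javascript
// Patterns copied from the diff; variable names are illustrative.
var oldRE = /^[^']+$|^[^']*(('''''[^\[\{<']+'''''|'''[^\[\{<']+'''|''[^\[\{<']+''|')([^']+|$))+('|$)$/;
var newRE = /^[^']+$|^[^']*(('''''(\[\[\w+\]\]|[^\[\{<']+)'''''|'''(\[\[\w+\]\]|[^\[\{<']+)'''|''(\[\[\w+\]\]|[^\[\{<']+)''|')([^']+|$))+('|$)$/;

// The example line '<nowiki/>''foo'' and ''[[bar]]'' split at <nowiki/>.
var pieces = ["'", "''foo'' and ''[[bar]]''"];

var oldResults = pieces.map(function(p) { return oldRE.test(p); });
var newResults = pieces.map(function(p) { return newRE.test(p); });

console.log(oldResults); // [ true, false ]
console.log(newResults); // [ true, true ]
```

Only the new pattern accepts the second segment, since ''[[bar]]'' now
counts as a balanced quote segment. The added alternative is
\[\[\w+\]\], so piped links or link targets containing spaces are
still rejected.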

Change-Id: I2759e76d56703254d3907ac447644457bc007b4b
---
M lib/mediawiki.WikitextSerializer.js
1 file changed, 5 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/services/parsoid refs/changes/63/180563/1

diff --git a/lib/mediawiki.WikitextSerializer.js b/lib/mediawiki.WikitextSerializer.js
index 79ba9bd..ea45bf7 100644
--- a/lib/mediawiki.WikitextSerializer.js
+++ b/lib/mediawiki.WikitextSerializer.js
@@ -1220,16 +1220,19 @@
        // Within the matched quote-segments, be conservative and don't match higher-priority
        // parser characters like [{< -- used for links and templates. This should prevent
        // inadvertent matching up across links/templates/tags.
-       var testRE = /^[^']+$|^[^']*(('''''[^\[\{<']+'''''|'''[^\[\{<']+'''|''[^\[\{<']+''|')([^']+|$))+('|$)$/;
+       var testRE = /^[^']+$|^[^']*(('''''(\[\[\w+\]\]|[^\[\{<']+)'''''|'''(\[\[\w+\]\]|[^\[\{<']+)'''|''(\[\[\w+\]\]|[^\[\{<']+)''|')([^']+|$))+('|$)$/;
 
        return wt.split(/\n|$/).map(function(line) {
+               if (!/<nowiki\/>/.test(line)) {
+                       return line;
+               }
+
                // * Strip out nowiki-protected strings since we are only interested in
                //   quote sequences that correspond to <i>/<b> tags.
                // * Find segments separated by <nowiki/>s.
                // * If all the segments contain balanced i/b tags, and the <nowiki/>
                //   separated a quote and an i/b tag, we can remove all the <nowiki/>s
                var pieces = line.replace(/<nowiki>.*?<\/nowiki>/g, '').split(/<nowiki\/>/);
-
                var n = pieces.length;
                for (var i = 0; i < n; i++) {
                        if (!testRE.test(pieces[i]) ||
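
The per-line preprocessing described in the comments above can be
sketched on its own as follows. This is a minimal illustration, not
the serializer's code: the function name nowikiSegments is
hypothetical, returning null for untouched lines is an assumption made
for the sketch, and the checks applied to each segment afterwards are
left out because the diff is cut off at that point.

```javascript
// Hypothetical helper; the regexes are the ones that appear in the diff.
function nowikiSegments(line) {
    // Guard added by this patch: lines without <nowiki/> are left alone.
    if (!/<nowiki\/>/.test(line)) {
        return null;
    }
    // Drop <nowiki>...</nowiki> spans, then split at each <nowiki/>.
    return line.replace(/<nowiki>.*?<\/nowiki>/g, '').split(/<nowiki\/>/);
}

var untouched = nowikiSegments("''foo'' bar");
var segments = nowikiSegments("'<nowiki/>''a'' <nowiki>''</nowiki>b");

console.log(untouched); // null
console.log(segments);  // [ "'", "''a'' b" ]
```

Each resulting segment is what testRE is then applied to in the loop.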

-- 
To view, visit https://gerrit.wikimedia.org/r/180563
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2759e76d56703254d3907ac447644457bc007b4b
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/services/parsoid
Gerrit-Branch: master
Gerrit-Owner: Subramanya Sastry <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
