GWicke has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/74527


Change subject: Bug 51457: Avoid some table attribute parsing backtracking
......................................................................

Bug 51457: Avoid some table attribute parsing backtracking

Parsing table cell attributes in the row syntax involves a lot of
backtracking. This patch avoids some of it by first applying a cheap,
conservative lookahead check for plausible attribute names, before the
expensive attribute_preprocessor_text_line production is attempted.
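For illustration only, the lookahead added to generic_attribute_name in the
diff below can be approximated in plain JavaScript as a sticky regular
expression that is tested before the costly rule runs. This is a minimal
sketch: looksLikeAttributeName, expensiveAttributeName and parseAttributeName
are hypothetical helpers, not Parsoid functions, and the regex is a hand
translation of the PEG predicate.

```javascript
// Sketch only: hand translation of the lookahead added to generic_attribute_name,
//   &([a-zA-Z] / [{!+] [^\n>/=]+ [=>] / '<' ('noinclude' / 'onlyinclude' / 'includeonly'))
// The sticky flag (y) anchors the match at lastIndex, much like a PEG
// predicate evaluated at the current position without consuming input.
const ATTR_NAME_START =
  /[a-zA-Z]|[{!+][^\n>\/=]+[=>]|<(?:noinclude|onlyinclude|includeonly)/y;

// Hypothetical helper: the cheap pre-check.
function looksLikeAttributeName(input, pos) {
  ATTR_NAME_START.lastIndex = pos;
  return ATTR_NAME_START.test(input);
}

// Hypothetical stand-in for the expensive attribute_preprocessor_text_line
// production. It simply takes the text up to the next '=' and counts its calls.
let expensiveCalls = 0;
function expensiveAttributeName(input, pos) {
  expensiveCalls++;
  const end = input.indexOf('=', pos);
  return end === -1 ? null : input.slice(pos, end);
}

// Guarded rule: fail fast unless the cheap check passes.
function parseAttributeName(input, pos) {
  if (!looksLikeAttributeName(input, pos)) {
    return null;
  }
  return expensiveAttributeName(input, pos);
}

console.log(parseAttributeName('class="wikitable"', 0)); // class
console.log(parseAttributeName("'''foo''' || '''bar'''", 0)); // null
console.log(expensiveCalls); // 1
```

Because the predicate only peeks at the input, a position that cannot start
an attribute name is rejected before the costly production (and any
backtracking inside it) is attempted. The filter works in one direction only:
passing the check does not guarantee that the full production succeeds.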

This reduces the parse time for pathological pages such as
http://el.wikipedia.org/wiki/%CE%A0%CE%BF%CF%81%CE%B5%CE%AF%CE%B1_%CF%84%CF%89%CE%BD_%CE%BA%CF%85%CF%80%CF%81%CE%B9%CE%B1%CE%BA%CF%8E%CE%BD_%CE%BF%CE%BC%CE%AC%CE%B4%CF%89%CE%BD_%CF%83%CF%84%CE%B1_%CE%BA%CF%8D%CF%80%CE%B5%CE%BB%CE%BB%CE%B1_%CE%95%CF%85%CF%81%CF%8E%CF%80%CE%B7%CF%82
from hours to a few minutes.

This is likely not yet a complete fix for bug 51457, but it at least
avoids blocking workers for hours.

TODO:
- Investigate whether there is a bug in memoization keys based on syntax
  flags
- Further speed up attribute parsing, in particular for heavily quoted
  values like | '''foo''' || '''bar''' || ...
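Background for the first TODO item: in a packrat-style memo table, a cached
result that depends on syntax flags (for example the 'equal' stop pushed in
generic_attribute_name) is only safe to reuse if the cache key encodes those
flags. A minimal sketch of that idea, assuming a simple Map-based cache;
MemoCache, keyFor and readText are hypothetical and do not describe how the
Parsoid tokenizer actually builds its memoization keys.

```javascript
// Hypothetical packrat-style memo table keyed on (rule, position, flags).
class MemoCache {
  constructor() {
    this.table = new Map();
  }

  // Build a key that includes the currently active syntax flags.
  // Flags are sorted so that the key does not depend on push order.
  static keyFor(rule, pos, activeFlags) {
    const flags = [...activeFlags].sort().join(',');
    return `${rule}@${pos}[${flags}]`;
  }

  // Return the cached result, or compute and store it.
  lookup(rule, pos, activeFlags, compute) {
    const key = MemoCache.keyFor(rule, pos, activeFlags);
    if (this.table.has(key)) {
      return this.table.get(key);
    }
    const result = compute();
    this.table.set(key, result);
    return result;
  }
}

// Hypothetical toy rule: read text up to '|'. With the 'equal' flag active,
// '=' also terminates the text (loosely modelled on the 'equal' stop).
function readText(input, pos, activeFlags) {
  let end = pos;
  while (end < input.length && input[end] !== '|' &&
         !(activeFlags.has('equal') && input[end] === '=')) {
    end++;
  }
  return input.slice(pos, end);
}

const cache = new MemoCache();
const input = 'a=b|c';
const withEqual = new Set(['equal']);
const withoutEqual = new Set();
console.log(cache.lookup('text', 0, withEqual, () => readText(input, 0, withEqual))); // a
console.log(cache.lookup('text', 0, withoutEqual, () => readText(input, 0, withoutEqual))); // a=b
```

If keyFor ignored the flags, the second lookup would return the cached "a"
instead of "a=b", which is the kind of key collision the TODO asks to rule out.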

Change-Id: I093a9d908097835c3797b746b41d22e16f9251af
---
M js/lib/pegTokenizer.pegjs.txt
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Parsoid refs/changes/27/74527/1

diff --git a/js/lib/pegTokenizer.pegjs.txt b/js/lib/pegTokenizer.pegjs.txt
index f30bce6..96e8ee8 100644
--- a/js/lib/pegTokenizer.pegjs.txt
+++ b/js/lib/pegTokenizer.pegjs.txt
@@ -1724,7 +1724,10 @@
 
 generic_attribute_name
   = & { return stops.push( 'equal', true ); }
-  name:attribute_preprocessor_text_line
+    // quick sanity check before expensive attribute_preprocessor_text_line
+    // production
+    &([a-zA-Z] / [{!+] [^\n>/=]+ [=>]/ '<' ('noinclude' / 'onlyinclude' / 'includeonly')) 
+    name:attribute_preprocessor_text_line
     {
         stops.pop( 'equal' );
         //console.warn( 'generic attribute name: ' + pp( name ) );
@@ -1948,7 +1951,9 @@
 table_row_tag
  = //& { console.warn("table row enter @" + input.substr(pos, 30)); return true; }
     p:pipe dashes:"-"+
-    a:generic_attribute*
+    a:(generic_attribute / 
+            // Ignore pipes in tr attributes
+            space* c:'|' { return new KV(c, '') })*
     tagEndPos:({return pos;})
     // handle tables with missing table cells after a row
     td:implicit_table_data_tag?
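
The second hunk lets the table_row_tag attribute loop consume a stray pipe
(optionally preceded by spaces) as a placeholder KV('|', '') instead of ending
the attribute list there. A minimal sketch of that ordered choice inside a
repetition, assuming a drastically simplified name="value" attribute syntax;
KV here is a stand-in with assumed fields k and v, and parseRowAttributes is
hypothetical, not Parsoid code.

```javascript
// Hypothetical stand-in for the KV class used in the grammar action.
class KV {
  constructor(k, v) {
    this.k = k;
    this.v = v;
  }
}

// Very simplified attribute: name="value" with optional leading spaces/tabs.
const SIMPLE_ATTR = /[ \t]*([a-zA-Z][\w-]*)="([^"]*)"/y;
// Stray pipe with optional leading spaces/tabs (space* '|' in the grammar).
const STRAY_PIPE = /[ \t]*\|/y;

// Mirrors a:(generic_attribute / space* c:'|' { ... })* : try an attribute
// first, then fall back to consuming a stray pipe as KV('|', ''), and stop
// when neither alternative matches.
function parseRowAttributes(text, pos = 0) {
  const attrs = [];
  for (;;) {
    SIMPLE_ATTR.lastIndex = pos;
    let m = SIMPLE_ATTR.exec(text);
    if (m) {
      attrs.push(new KV(m[1], m[2]));
      pos = SIMPLE_ATTR.lastIndex;
      continue;
    }
    STRAY_PIPE.lastIndex = pos;
    m = STRAY_PIPE.exec(text);
    if (m) {
      attrs.push(new KV('|', ''));
      pos = STRAY_PIPE.lastIndex;
      continue;
    }
    return { attrs, end: pos };
  }
}

const row = parseRowAttributes(' class="a" | style="b"');
console.log(row.attrs.map((kv) => kv.k)); // [ 'class', '|', 'style' ]
console.log(row.end); // 22
```

Without the pipe alternative, the repetition in this sketch would stop at
offset 10, just before the pipe, and style="b" would never be reached.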

-- 
To view, visit https://gerrit.wikimedia.org/r/74527

Gerrit-MessageType: newchange
Gerrit-Change-Id: I093a9d908097835c3797b746b41d22e16f9251af
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/Parsoid
Gerrit-Branch: master
Gerrit-Owner: GWicke

_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits