GWicke has uploaded a new change for review. https://gerrit.wikimedia.org/r/74527
Change subject: Bug 51457: Avoid some table attribute parsing backtracking
......................................................................

Bug 51457: Avoid some table attribute parsing backtracking

Parsing of table cell attributes in the row syntax involves a lot of
backtracking. This patch avoids some of that backtracking by applying a
conservative heuristic for valid attribute names first.

This drops the parse time on pathological pages like
http://el.wikipedia.org/wiki/%CE%A0%CE%BF%CF%81%CE%B5%CE%AF%CE%B1_%CF%84%CF%89%CE%BD_%CE%BA%CF%85%CF%80%CF%81%CE%B9%CE%B1%CE%BA%CF%8E%CE%BD_%CE%BF%CE%BC%CE%AC%CE%B4%CF%89%CE%BD_%CF%83%CF%84%CE%B1_%CE%BA%CF%8D%CF%80%CE%B5%CE%BB%CE%BB%CE%B1_%CE%95%CF%85%CF%81%CF%8E%CF%80%CE%B7%CF%82
from hours to a few minutes. It is likely not yet a complete fix for bug
51457, but at least avoids blocking workers for hours.

TODO:
- Investigate whether there is a bug in memoization keys based on syntax
  flags
- Further speed up attribute parsing, in particular for heavily quoted
  values like | '''foo''' || '''bar''' || ..

Change-Id: I093a9d908097835c3797b746b41d22e16f9251af
---
M js/lib/pegTokenizer.pegjs.txt
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Parsoid refs/changes/27/74527/1

diff --git a/js/lib/pegTokenizer.pegjs.txt b/js/lib/pegTokenizer.pegjs.txt
index f30bce6..96e8ee8 100644
--- a/js/lib/pegTokenizer.pegjs.txt
+++ b/js/lib/pegTokenizer.pegjs.txt
@@ -1724,7 +1724,10 @@
 
 generic_attribute_name
   = & { return stops.push( 'equal', true ); }
-    name:attribute_preprocessor_text_line
+    // quick sanity check before expensive attribute_preprocessor_text_line
+    // production
+    &([a-zA-Z] / [{!+] [^\n>/=]+ [=>]/ '<' ('noinclude' / 'onlyinclude' / 'includeonly'))
+    name:attribute_preprocessor_text_line
     {
         stops.pop( 'equal' );
         //console.warn( 'generic attribute name: ' + pp( name ) );
@@ -1948,7 +1951,9 @@
 table_row_tag
   = //& { console.warn("table row enter @" + input.substr(pos, 30)); return true; }
     p:pipe dashes:"-"+
-    a:generic_attribute*
+    a:(generic_attribute /
+        // Ignore pipes in tr attributes
+        space* c:'|' { return new KV(c, '') })*
     tagEndPos:({return pos;})
     // handle tables with missing table cells after a row
     td:implicit_table_data_tag?

-- 
To view, visit https://gerrit.wikimedia.org/r/74527
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I093a9d908097835c3797b746b41d22e16f9251af
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/Parsoid
Gerrit-Branch: master
Gerrit-Owner: GWicke <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
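
The idea behind the generic_attribute_name hunk, a cheap lookahead that rejects
most non-attribute input before the costly production runs, can be sketched
outside the grammar in plain JavaScript. This is only an illustration under
stated assumptions: expensiveAttributeName is a hypothetical stand-in for
attribute_preprocessor_text_line (it is not Parsoid code), and the regular
expression is a hand translation of the PEG lookahead from the patch.

```javascript
// Counts invocations of the (hypothetical) expensive production so the
// effect of the guard is observable.
let expensiveCalls = 0;

// Hypothetical stand-in for attribute_preprocessor_text_line: returns the
// text up to an '=' starting at pos, or null if there is none.
function expensiveAttributeName(input, pos) {
    expensiveCalls++;
    const re = /[^\s=|>]+(?==)/y; // sticky: match must start exactly at pos
    re.lastIndex = pos;
    const m = re.exec(input);
    return m ? m[0] : null;
}

// Hand translation of the PEG lookahead in the patch:
//   [a-zA-Z] / [{!+] [^\n>/=]+ [=>] / '<' ('noinclude' / 'onlyinclude' / 'includeonly')
const QUICK_CHECK =
    /(?:[a-zA-Z]|[{!+][^\n>\/=]+[=>]|<(?:noinclude|onlyinclude|includeonly))/y;

// Cheap test: could an attribute name plausibly start at pos?
function looksLikeAttributeName(input, pos) {
    QUICK_CHECK.lastIndex = pos;
    return QUICK_CHECK.test(input);
}

// Only fall through to the expensive production when the guard passes.
function parseAttributeName(input, pos) {
    if (!looksLikeAttributeName(input, pos)) {
        return null;
    }
    return expensiveAttributeName(input, pos);
}

// Heavily quoted cell content is rejected by the guard without ever
// reaching the expensive production; a plain attribute still parses.
console.log(parseAttributeName("'''foo''' || '''bar'''", 0), expensiveCalls);
console.log(parseAttributeName('style="x"', 0), expensiveCalls);
```

The guard is conservative in the same sense as the commit message describes:
it only decides whether to attempt the full production, so any input it lets
through is still validated by the real rule.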
