Hey Subbu,

Is there an easy way to determine whether my extensions are using parser hooks? For example, is there a canonical list of hooks I can grep for in my code?
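Absent such a list, a rough first pass might look like the sketch below. It is only an illustration under stated assumptions: the names it searches for are a few core parser hooks, Parser methods, and parser API classes that I am assuming are common, not a canonical set, and `scan_text` / `scan_tree` are hypothetical helper names made up for this example.

```python
import re
from pathlib import Path

# Illustrative, NOT canonical: a handful of core-parser names to search for.
# The mapping on the Parsoid Extension API wiki page is the place to check
# for the real correspondence between core parser hooks and Parsoid.
HOOK_NAMES = [
    "ParserFirstCallInit",
    "ParserBeforeInternalParse",
    "ParserAfterParse",
    "ParserAfterTidy",
]
PARSER_METHODS = ["setHook", "setFunctionHook"]
PARSER_CLASSES = ["ParserOutput", "ParserOptions"]

_NAMES = HOOK_NAMES + PARSER_METHODS + PARSER_CLASSES
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _NAMES)) + r")\b")


def scan_text(text):
    """Return (line_number, matched_name) pairs for every hit in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root):
    """Scan *.php files and extension.json under root.

    Yields (path, line_number, matched_name) for every hit.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and (path.suffix == ".php" or path.name == "extension.json"):
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, name in scan_text(text):
                yield (str(path), lineno, name)
```

Running `for hit in scan_tree("extensions/MyExtension"): print(hit)` (the path is a placeholder) would list candidate lines to review by hand. A hit only means the name appears in the file, not that the extension depends on core parser behaviour that Parsoid changes.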

On Mon, Sep 14, 2020 at 1:17 PM Subramanya Sastry <[email protected]> wrote:

> [---- Long mail - but only relevant to extension developers ----]
>
> Greetings!
>
> As some of you might know, on the Parsing Team [0], we are aspiring to
> replace the core wikitext parser with Parsoid [1] on Wikimedia wikis late
> next year and start to put to rest the two-parser ghost that has haunted us
> for many years. In recent years, we achieved two major milestones along
> the way: replace HTML4 tidy with HTML5 Remex [2], and port Parsoid from
> Javascript to PHP [3].
>
> Given that context, if you (help) maintain an extension that:
>
> * uses a "parser hook" and/or
> * uses the "parser API" (i.e. uses public properties / methods in
>   Parser.php, ParserOutput.php, ParserOptions.php, etc.)
>
> please read on. If you don't fit that description, you can stop reading
> now!
>
> Parsoid models and processes wikitext quite differently from the
> core parser - all that Parsoid guarantees is that the rendering is largely
> identical, not the specific process of generating the rendering. This
> means that extensions that extend the behavior of the parser will need to
> adapt to work with Parsoid instead to provide similar functionality. With
> that in mind, we have been working to more clearly specify how extensions
> need to adapt to the Parsoid regime.
>
> PARSOID & EXTENSIONS:
>
> At a high level, here are the questions we needed to answer, along with
> some highly simplified answers:
>
> 1. How do extensions "hook" into Parsoid?
>    A. Extensions need to think in terms of transformations (convert this
>    to that) instead of parser pipeline events (at this point in the
>    pipeline, call this listener). An additional detail here is that
>    extensions cannot maintain global ordered state within extension code
>    since Parsoid doesn't guarantee handlers will be invoked in the same
>    order in which they showed up in page source. See the wiki [4] for
>    more details.
>
>    As for the mechanics of registration, Parsoid uses existing mechanisms
>    based on the extension.json file.
>
> 2. When the registered hook listeners are invoked by Parsoid, how do they
>    process any wikitext they need to process?
>    A. Parsoid provides all registered listeners with an API object to interact
>    with it. Direct use of Parsoid internals code is strongly discouraged
>    and will be enforced in various ways including via code review.
>
> 3. How is the extension's output assimilated into the page output?
>    A. The output is treated as a "fully-processed" page/DOM fragment (with
>    some caveats which will be clarified on wiki). It is appropriately
>    decorated with additional markup, and slotted into place into the page.
>    Extensions need not make any special efforts (aka strip state) to
>    protect it from the parsing pipeline.
>
> Slides 8-12 of the August 12 2020 Tech Talk [7] goes over the differences.
> Check the wiki [4] for more details of Parsoid's Extension API. It also
> maps core parser hooks to Parsoid's extension functionality.
>
> CURRENT STATUS:
>
> We consider the current proposal to be in late draft stage. That said, as
> we discover unsupported functionality, we will augment the set of hooks and
> the Parsoid Extension API as needed.
>
> While there are a wide variety of extensions in the MediaWiki universe
> with varied use cases, our initial goal for the next year is just Wikimedia
> wikis and hence extensions that are deployed on the Wikimedia wikis.
> Once we are done with that, we will turn our attention to supporting
> extension use cases in the wider MediaWiki universe. But, now is a
> good time for all extension developers to study and review this API
> and give us feedback.
>
> Since the beginning of this year, we've refactored all of the extensions
> we've written Parsoid versions of (Cite, Gallery, Poem, Pre, JSON) to
> now strictly use the Parsoid Extension API without cheating by virtue
> of being in the Parsoid codebase. So, this proposal is actually backed
> by an implementation that is in production for Wikimedia wikis.
>
> FEEDBACK:
>
> Here is where you come in.
>
> * If you maintain / develop an extension, please review the document
>   to see if your extension's use case is covered.
>
>   Ideally, leave your feedback on the Parsoid Extension API talk page [5]
>   since it helps keep it all in one place. Alternatively, you can also
>   leave questions / concerns / other feedback on the Phabricator task
>   we've filed for TechCom's RFC process [6].
>
> * If you feel bold, start the process of updating your extensions *now*.
>   Note that your extension will need to operate with both the existing
>   core parser as well as Parsoid till such time we deprecate and stop
>   using the core parser.
>
>   There are known functionality gaps related to exposing ParserOutput
>   object and providing setFunctionHook functionality. If your extension
>   needs those, you should probably wait for us to fill that gap.
>
> DOCS / MORE INFO / CONTACT:
>
> * Check the wiki page [4] for docs and discuss on the talk page [5]
> * Check the August 12, 2020 Tech Talk [7]
> * Look at Parsoid code for extensions [8]
> * Look at Parsoid docs for the Ext/ namespace [9]
> * Talk to us on IRC in the #mediawiki-parsoid channel
> * Email us at [email protected]
>
> Thanks!
> Subbu (on behalf of the Parsing Team).
>
> -------------------------------------------------------------------------
>
> 0. https://www.mediawiki.org/wiki/Parsing
> 1. https://www.mediawiki.org/wiki/Parsing/Parser_Unification
> 2. https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/
> 3. https://techblog.wikimedia.org/2020/02/12/parsoid-in-php-or-there-and-back-again/
> 4. https://www.mediawiki.org/wiki/Parsoid/Extension_API
> 5. https://www.mediawiki.org/wiki/Parsoid/Talk:Extension_API
> 6. https://phabricator.wikimedia.org/T260714
> 7. Slides: https://commons.wikimedia.org/wiki/File:Parsoid_%26_Extensions_August_2020_Tech_Talk.pdf
>    Video: https://www.youtube.com/watch?v=lS1xPkERWCM
> 8. https://github.com/wikimedia/parsoid/tree/master/src/Ext
> 9. https://doc.wikimedia.org/Parsoid-PHP/master/
> _______________________________________________
> MediaWiki-l mailing list
> To unsubscribe, go to:
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
