[Wikitech-l] Re: Wikitext, Document Models, and HTML5 Output

Brian Wolff Tue, 11 Jan 2022 19:51:11 -0800

Have you seen the html structure of parsoid?

E.g.
https://en.wikipedia.org/api/rest_v1/page/html/Dog
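
A rough sketch of how one might check that structure, assuming Parsoid wraps each section in a <section> element carrying a data-mw-section-id attribute. The sample markup below is hand-written for illustration, not copied from the API response, and the names SectionOutline and section_outline are hypothetical:

```python
from html.parser import HTMLParser


class SectionOutline(HTMLParser):
    """Record (nesting depth, section id) for every <section> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
            # Assumption: the id lives in a data-mw-section-id attribute.
            section_id = dict(attrs).get("data-mw-section-id")
            self.outline.append((self.depth, section_id))

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1


def section_outline(html):
    """Return the list of (depth, section id) pairs found in html."""
    parser = SectionOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hand-written sample in the assumed shape; a real page is much larger.
SAMPLE = (
    '<section data-mw-section-id="0"><p>Lead</p></section>'
    '<section data-mw-section-id="1"><h2 id="Heading">Heading</h2>'
    '<section data-mw-section-id="2"><h3 id="Subheading">Subheading</h3>'
    "</section></section>"
)

for depth, section_id in section_outline(SAMPLE):
    print("  " * (depth - 1) + f"section {section_id}")
```

The same function could be pointed at the body returned by the URL above to see how deeply a real article nests.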


--
Bawolff
On Monday, January 10, 2022, Adam Sobieski <adamsobie...@hotmail.com> wrote:

> Wikitech-l,
>
>
>
> Hello. I have a question about the HTML output of wiki parsers. How simple
> or complex would it be for a wiki parser to output, instead of a flat
> document structure inside of a <div> element, an <article> element
> containing nested <section> elements?
>
>
>
> Recently, in the Community Wishlist Survey Sandbox
> <https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox>, the
> speech synthesis of Wikipedia articles
> <https://meta.wikimedia.org/wiki/Community_Wishlist_Survey/Sandbox#Spoken_articles>
> was broached. The proposer of these ideas indicated that, for best results,
> some content, e.g., “See also” sections, should not be synthesized.
>
>
>
> In response to these interesting ideas, I mentioned some ideas from EPUB,
> namely referencing pronunciation lexicons from HTML
> <https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-pls> and SSML
> attributes in HTML
> <https://www.w3.org/publishing/epub3/epub-contentdocs.html#sec-xhtml-ssml-attrib>,
> as well as the CSS Speech Module <https://www.w3.org/TR/css-speech-1/>. I
> also noted that output HTML content could be styled using the CSS Speech
> Module’s speak property.
>
>
>
> With this in mind, I started thinking about how one might extend wikitext
> syntax to be able to style sections, e.g.:
>
>
>
> == See also == {style="speak:never"}
>
>
>
> Next, I inspected the HTML of some Wikipedia articles and realized that,
> due to the structure of the output HTML documents, it isn’t simple to style
> or to add attributes to sections. There are only <h2>, <h3>, <h4> (et
> cetera) elements inside of a containing <div> element; sections are not
> yet structured elements.
>
>
>
> The gist is that, instead of outputting HTML like:
>
>
>
> <div class="mw-parser-output">
>
>   <h2><span class="mw-headline" id="Heading">Heading</span></h2>
>
>   <p>Paragraph 1</p>
>
>   <p>Paragraph 2</p>
>
>   <h3><span class="mw-headline" id="Subheading">Subheading</span></h3>
>
>   <p>Paragraph 3</p>
>
>   <p>Paragraph 4</p>
>
> </div>
>
>
>
> could a wiki parser output HTML5 like:
>
>
>
> <article class="mw-parser-output">
>
>   <section id="Heading">
>
>     <header><h2><span class="mw-headline">Heading</span></h2></header>
>
>     <p>Paragraph 1</p>
>
>     <p>Paragraph 2</p>
>
>     <section id="Subheading">
>
>       <header><h3><span class="mw-headline">Subheading</span></h3></header>
>
>       <p>Paragraph 3</p>
>
>       <p>Paragraph 4</p>
>
>     </section>
>
>   </section>
>
> </article>
>
>
>
> My initial thoughts are that the latter HTML5 is better structured, more
> semantic, more styleable, and potentially more accessible. If there is any
> interest, I could write up a lengthier comparison of the two, covering why
> one might be better, and more useful, than the other.
>
>
>
> Is this the correct mailing list to discuss any of these wiki technology,
> wiki parsing, wikitext, document model, and HTML5 output topics?
>
>
>
>
>
> Best regards,
>
> Adam
>
>
>
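
The restructuring asked about in the quoted message can be sketched roughly as follows. This is a minimal illustration, not how any MediaWiki parser works: it assumes headings occur only at the top level of the fragment (never inside tables or blockquotes), it omits the <header> wrappers and the move of id attributes onto <section> shown in the quoted example, and the names SectionNester and nest_sections are hypothetical.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h([1-6])$")


class SectionNester(HTMLParser):
    """Re-emit flat heading/paragraph HTML with nested <section> wrappers."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.levels = []  # heading levels of the sections currently open

    def close_to(self, level):
        """Close every open section at the given heading level or deeper."""
        while self.levels and self.levels[-1] >= level:
            self.levels.pop()
            self.out.append("</section>")

    def handle_starttag(self, tag, attrs):
        match = HEADING.match(tag)
        if match:
            level = int(match.group(1))
            self.close_to(level)
            self.levels.append(level)
            self.out.append("<section>")
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def nest_sections(inner_html):
    """Wrap each heading and the content that follows it in a <section>."""
    parser = SectionNester()
    parser.feed(inner_html)
    parser.close()
    parser.close_to(0)  # close whatever is still open at the end
    return "".join(parser.out)


FLAT = (
    "<h2>Heading</h2><p>Paragraph 1</p>"
    "<h3>Subheading</h3><p>Paragraph 3</p>"
    "<h2>See also</h2><p>Links</p>"
)
print(nest_sections(FLAT))
```

The stack of open heading levels is what turns the flat sequence into a tree: a new heading closes every open section at its own level or deeper before opening its own. The caller would still need to swap the containing <div> for an <article>.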

_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
