lmhelp wrote:
> 
> Hi,
> 
> Thank you for reading my post.
> 
> I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
> language (or an *exhaustive* (and formal) set of rules about how a
> "Wikitext" is constructed).
> I've looked for such a grammar/set of rules on the Web but I couldn't find
> one...

No. But see http://www.mediawiki.org/wiki/Markup_spec for grammars which
"kind of work".

> I need to automatically extract the first paragraph of a Wiki article...
> 
> I did it from the HTML version of a Wiki article (because
> I noticed the first paragraph was the first <p> element
> child of a <div> element whose id is "bodyContent"...)
> but I need to work with the "Wikitext" itself...
> 
> - Is a grammar available somewhere?
> - Do you have any idea how to extract the first paragraph of a Wiki article?
> - Any advice?
> - Does a Java "Wikitext" "parser" exist which would do it?

Take the text before the first double newline (\n\n), since a blank line
is what separates paragraphs in wikitext.
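A minimal Java sketch of that rule, assuming the wikitext is already in a String (the class and method names are hypothetical, and it needs Java 11+ for String.strip()). It normalizes "\r\n" line endings and returns everything before the first blank line:

```java
public class FirstParagraph {
    // Returns the text before the first "\n\n" (blank line), trimmed.
    // "\r\n" line endings are normalized to "\n" first so the split still works.
    static String firstParagraph(String wikitext) {
        String text = wikitext.replace("\r\n", "\n").strip();
        int end = text.indexOf("\n\n");
        return (end < 0 ? text : text.substring(0, end)).strip();
    }

    public static void main(String[] args) {
        String page = "'''Foo''' is a bar.\nIt has baz.\n\n== History ==\nMore text.";
        // Prints the two lines before the blank line.
        System.out.println(firstParagraph(page));
    }
}
```

Note that a single "\n" does not end a paragraph in wikitext, which is why the sketch keeps both lines of the first paragraph together.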

However, pages commonly begin with templates, so if the page begins with
{{, you would remove everything up to and including the matching }} (taking
nested templates into account) and then remove leading whitespace.
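Here is a rough Java sketch of the template-stripping step, assuming simple brace counting is good enough. The names are hypothetical, and it deliberately ignores harder cases such as triple-brace parameters ({{{...}}}), tables ({| ... |}), and braces inside <nowiki> or HTML comments. It repeatedly drops a leading {{ ... }} block, counting nested "{{"/"}}" pairs to find the matching close:

```java
public class StripLeadingTemplates {
    // Removes every template ({{ ... }}) at the very start of the text,
    // finding each matching "}}" by counting nested "{{" / "}}" pairs.
    static String stripLeadingTemplates(String wikitext) {
        String text = wikitext.stripLeading();
        while (text.startsWith("{{")) {
            int depth = 0;
            int end = -1;
            int i = 0;
            while (i < text.length()) {
                if (text.startsWith("{{", i)) {
                    depth++;
                    i += 2;
                } else if (text.startsWith("}}", i)) {
                    depth--;
                    i += 2;
                    if (depth == 0) {
                        end = i; // index just past the matching "}}"
                        break;
                    }
                } else {
                    i++;
                }
            }
            if (end < 0) {
                return text; // unbalanced braces: give up and leave the text as-is
            }
            text = text.substring(end).stripLeading();
        }
        return text;
    }

    public static void main(String[] args) {
        String page = "{{Infobox person\n| name = {{small|Foo}}\n}}\n"
                + "'''Foo''' is a bar.\n\nMore text.";
        String body = stripLeadingTemplates(page);
        // Then take the text before the first blank line.
        System.out.println(body.split("\n\n", 2)[0]); // '''Foo''' is a bar.
    }
}
```

Counting braces rather than using a regular expression is what lets the nested {{small|Foo}} stay inside the outer infobox.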


_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
