Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread lmhelp2
Hi Axel, Thank you for your answer. I am wondering... how do you explain that the two templates {{Guil|'''parti philosophique'''}} and {{s-|XVIII|e|}} in my example are not processed correctly (by default) (*)? Is it because Bliki works correctly with English wiki articles and not with, for

Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread BPJ
2010-08-07 20:24, lmhelp wrote: So why not use the real parser? Exactly. Where can it be found, please? Thanks and all the best, -- Lmhelp Fetch the HTML from wikipedia.org with something like wget (playing nicely and using delays!) and then extract the first p element with something
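A minimal Python sketch of the approach suggested here, using only the standard library instead of wget. The names `fetch`, `first_paragraph`, the delay value and the User-Agent string are assumptions made for illustration; they are not from the thread.

```python
import time
import urllib.request
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first non-empty <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False   # True while inside a <p> element
        self.done = False   # True once a non-empty paragraph was seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            if "".join(self.parts).strip():
                self.done = True
            else:
                self.parts = []  # empty paragraph: keep looking

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)


def first_paragraph(html):
    """Return the text of the first non-empty paragraph in an HTML string."""
    parser = FirstParagraph()
    parser.feed(html)
    return "".join(parser.parts).strip()


def fetch(url, delay=1.0):
    """Fetch a page politely: wait `delay` seconds before each request.

    The User-Agent value is a hypothetical placeholder.
    """
    time.sleep(delay)
    request = urllib.request.Request(
        url, headers={"User-Agent": "first-paragraph-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Because the HTML has already gone through the real MediaWiki parser, templates are expanded before the paragraph is extracted. The first p element of a rendered page is not guaranteed to be the introduction, so the result should be checked on a few articles.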

Re: [Mediawiki-l] Wikitext grammar

2010-08-08 Thread Axel
On Sun, Aug 8, 2010 at 9:49 PM, lmhelp lm...@wanadoo.fr wrote: Hi, I have abandoned Bliki because look what happened: Here is what I gave to Bliki as an input: --- Le {{Guil|'''parti philosophique'''}} désignait

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
Thank you all for your contribs :). Hi, So... I was over-optimistic about managing to extract the first paragraph of a Wikipedia article out of its Wikitext easily... Yet, I managed (1) for instance (for the Wikipedia article Čokot) to get the following Wikitext sentence:

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread Brian J Mingus
On Sat, Aug 7, 2010 at 9:21 AM, lmhelp lm...@wanadoo.fr wrote: MY FIRST QUESTION IS: I was wondering if you knew a better tool than this one... one which wouldn't miss some Wikitext chunks of code like in the above example (or maybe which at least would handle usual

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread Brian J Mingus
On Sat, Aug 7, 2010 at 10:54 AM, lmhelp lm...@wanadoo.fr wrote: Hi, Thank you for your answer. mwlib is the best parser available for folks who want to do a quick job such as yours. Maybe it is, I don't know... I know (since recently) it is not an easy task constructing a parser for

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
- mwlib was written in conjunction with the WMF, and IIRC had at least some input from Brion Vibber. It's high quality and works well. There is a 2-3 hour learning curve for navigating the python modules and methods

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
So why not use the real parser? Exactly. Where can it be found, please? Thanks and all the best, -- Lmhelp

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread nevio carlos de alarcão
If you only need to extract the first paragraph of Wikipedia articles, no problem. 2010/8/6 Katharina Wolkwitz wolkw...@fh-swf.de Hi, On 05.08.2010 16:47, lmhelp2 wrote: Thank you! So here is the list I have for the moment: I need to ignore lines: - containing: {{...}} =

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Magnus Manske
Also ignore lines starting with #, :, (space), or ; . Then there are (potentially nested) tables, which start with a line beginning with {| and end in a line beginning with |}. There are more magic words with the general pattern __SOMEUPPERCASECHARACTERS__, IIRC. Note that sometimes, people
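The line-based rules listed here can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `candidate_lines` is hypothetical, magic words are matched with the simple pattern `__[A-Z]+__`, and tables are tracked purely by lines beginning with `{|` and `|}`.

```python
import re

# Magic words with the general pattern __SOMEUPPERCASECHARACTERS__
MAGIC_WORD = re.compile(r"__[A-Z]+__")

# Line prefixes to ignore: numbered list, indent, preformatted, definition
SKIP_PREFIXES = ("#", ":", " ", ";")


def candidate_lines(wikitext):
    """Return the lines that survive the filters described above."""
    table_depth = 0  # > 0 while inside a (possibly nested) table
    kept = []
    for line in wikitext.splitlines():
        if line.startswith("{|"):
            table_depth += 1
            continue
        if line.startswith("|}"):
            table_depth = max(0, table_depth - 1)
            continue
        if table_depth > 0:
            continue
        line = MAGIC_WORD.sub("", line)
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        kept.append(line)
    return kept
```

The first element of the returned list is then a candidate for the first paragraph. Templates and links are not handled by this filter and would need a separate pass.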

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Wed, Aug 4, 2010 at 1:45 PM, lmhelp lm...@wanadoo.fr wrote: I need to extract automatically the first paragraph of a Wiki article... See Extracted page abstracts for Yahoo: http://download.wikimedia.org/enwiki/20100730/

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Léa Massiot
A colleague told me about that... so we had a look at it. Unfortunately, abstracts are not correct most of the time... - Example (in French): - title: Wikipédia : Arabie

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Fri, Aug 6, 2010 at 10:06 AM, Léa Massiot lea.mass...@ign.fr wrote: A colleague told me about that... so we had a look at it. Unfortunately, abstracts are not correct most of the time... - Example (in French):

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Fri, Aug 6, 2010 at 10:18 AM, Léa Massiot lea.mass...@ign.fr wrote: Are you sure this will be able to extract the introductory paragraph (only) which is not in any section... (because it is not trivial). There is only one example I could find at http://code.pediapress.com/wiki/wiki/mwlib

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Trevor Parscal
The current parser is, as David Gerard said, not much of a parser by any conventional definition. It's more of a macro-expander (for parser tags and templates) and a series of mostly-regular-expression-based replacement routines, which result in partially valid HTML which is then repaired in

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread David Gerard
On 6 August 2010 18:59, Trevor Parscal tpars...@wikimedia.org wrote: In short, the current parser is a bad example of how to write a parser, I forgot to call it a box of pure malevolent evil, a purveyor of insidious insanity, an eldritch manifestation that would make Bill Gates let out a low

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Scheid, Bernhard
@lists.wikimedia.org Subject: [Mediawiki-l] Wikitext grammar Hi, Thank you for reading my post. I am wondering if there exists a grammar for the Wikicode/Wikitext language (or an *exhaustive* (and formal) set of rules about how Wikitext is constructed). I've looked for such a grammar/set of rules

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Magnus Manske
On Thu, Aug 5, 2010 at 1:10 PM, lmhelp2 lea.mass...@ign.fr wrote: Hi, Thanks to all of you for your answers. I have decided (in the light of what you told me) to read the Wikitext line by line. I must ignore leading: - templates (including the ones which span over several

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Katharina Wolkwitz
Hi, there might be an occurrence of __TOC__ or __NOTOC__ before the first real paragraph. Good luck with finding all exceptions. :) Katharina On 05.08.2010 14:10, lmhelp2 wrote: Hi, Thanks to all of you for your answers. I have decided (in the light of what you told me) to read the

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread lmhelp2
Thank you! So here is the list I have for the moment: I need to ignore lines: - containing: {{...}} (possibly spreading over several lines, possibly nested: {{... {{ ... }} ... }}). - containing: [[...]] (possibly nested: [[... [[ ... ]] ... ]]). -
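Since {{...}} and [[...]] can be nested and can span several lines, a per-line test is not enough; one option is to scan the whole text with a depth counter. The sketch below is an assumption-laden illustration, not a method from the thread: `strip_balanced` is a hypothetical helper, and it does not treat triple braces such as {{{param}}} specially.

```python
def strip_balanced(text, open_tok="{{", close_tok="}}"):
    """Remove every balanced open_tok ... close_tok span, nested or not.

    Text is copied to the output only while the nesting depth is zero,
    so spans that cross line boundaries are removed as a whole.
    """
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)
```

Calling `strip_balanced(text)` drops templates, and `strip_balanced(text, "[[", "]]")` drops bracketed links in the same way. Dropping every [[...]] also removes ordinary link text, so in practice that second call may be too aggressive for prose paragraphs.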

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Katharina Wolkwitz
Hi, On 05.08.2010 16:47, lmhelp2 wrote: Thank you! So here is the list I have for the moment: I need to ignore lines: - containing: {{...}} (possibly spreading over several lines, possibly nested: {{... {{ ... }} ... }}). - containing: [[...]] =

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread Platonides
lmhelp wrote: Hi, Thank you for reading my post. I am wondering if there exists a grammar for the Wikicode/Wikitext language (or an *exhaustive* (and formal) set of rules about how Wikitext is constructed). I've looked for such a grammar/set of rules on the Web but I couldn't find

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread David Gerard
On 4 August 2010 20:45, lmhelp lm...@wanadoo.fr wrote: I am wondering if there exists a grammar for the Wikicode/Wikitext language (or an *exhaustive* (and formal) set of rules about how Wikitext is constructed). I've looked for such a grammar/set of rules on the Web but I couldn't find

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread David Gerard
On 4 August 2010 23:58, David Gerard dger...@gmail.com wrote: On 4 August 2010 20:45, lmhelp lm...@wanadoo.fr wrote: - Is a grammar available somewhere? - Do you have any idea how to extract the first paragraph of a Wiki article? - Any advice? - Does a Java Wikitext parser exist which would