If you only need to extract the first paragraph of Wikipedia articles, no problem.
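
For what it's worth, here is a minimal Python sketch of the filtering rules quoted below, offered as a rough starting point rather than a complete wikitext parser. The function names (`strip_templates`, `first_paragraph`) and the list of link prefixes are my own assumptions and are not part of MediaWiki. It drops {{...}} blocks (nested or spread over several lines), skips __TOC__, __NOTOC__ and lines beginning with '=', and keeps lines beginning with '*' as Katharina suggests. Instead of ignoring every line containing [[...]], it only skips lines that start with a file, image or category link, since ordinary prose usually contains [[...]] links as well.

```python
# Sketch only: helper names and SKIP_LINK_PREFIXES are illustrative assumptions.

def strip_templates(text: str) -> str:
    """Remove {{...}} blocks, including nested and multi-line ones."""
    out = []
    depth = 0  # current nesting level of {{ ... }}
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:  # keep only characters outside any template
                out.append(text[i])
            i += 1
    return "".join(out)


SKIP_EXACT = {"__TOC__", "__NOTOC__"}
# Assumption: only these link types are treated as non-paragraph lines.
SKIP_LINK_PREFIXES = ("[[File:", "[[Image:", "[[Category:")


def first_paragraph(wikitext: str) -> str:
    """Return the first block of prose lines, joined with spaces."""
    paragraph = []
    for raw in strip_templates(wikitext).splitlines():
        line = raw.strip()
        skip = (
            not line
            or line in SKIP_EXACT
            or line.startswith("=")
            or line.startswith(SKIP_LINK_PREFIXES)
        )
        if skip:
            if paragraph:  # a blank or skipped line ends the paragraph
                break
            continue
        paragraph.append(line)  # lines starting with '*' are kept
    return " ".join(paragraph)


if __name__ == "__main__":
    sample = (
        "{{Infobox\n| name = {{PAGENAME}}\n}}\n"
        "__NOTOC__\n"
        "[[File:Example.png|thumb|A [[caption]]]]\n"
        "'''Foo''' is a [[thing]].\n"
        "It has two lines.\n"
        "\n"
        "== History ==\n"
        "More text.\n"
    )
    print(first_paragraph(sample))
```

The depth counter is what handles the nested {{... {{ ... }} ... }} case; a single regular expression would not match balanced pairs reliably.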

2010/8/6 Katharina Wolkwitz <[email protected]>

> Hi,
>
> On 05.08.2010 16:47, lmhelp2 wrote:
> >
> > Thank you!
> >
> > So here is the list I have for the moment:
> > I need to ignore lines:
> > - containing: {{...}}
> >   => possibly spreading over several lines,
> >   => being possibly nested {{... {{ ... }} ... }}.
> > - containing: [[...]]
> >   => being possibly nested [[... [[ ... ]] ... ]].
> > - equal to: __TOC__
> > - equal to: __NOTOC__
> > - beginning with the '=' character
> > - beginning with the '*' character
>
> I don't think you should ignore lines beginning with the '*' character -
> those may include the wanted first paragraph of the text as the '*' is
> just a way of formatting the page...
>
> Greetings
> Katharina
>
> _______________________________________________
> MediaWiki-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

--
{+}Nevinho
Come join the Movimento Colaborativo http://sextapoetica.com.br !!
