Hi Scott,

Thank you very much. This does the job! I'm wondering whether this already existed and I missed it back in January; I remember looking at the book creator then, and there seemed to be fewer options (or maybe I simply missed them).
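The URL trick Scott describes in the quoted message (swapping writer=rdf2latex for writer=rdf2text in a Special:Book render link) is easy to script. The sketch below uses only Python's standard urllib.parse module. The helper name to_plaintext_url is hypothetical, and the sketch assumes the render endpoint accepts the same query parameters as Scott's example links; it does not fetch anything, it only rewrites the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_plaintext_url(render_url: str) -> str:
    """Return the render URL with its writer parameter set to rdf2text.

    Hypothetical helper: every other query parameter is kept, in order.
    """
    parts = urlsplit(render_url)
    # keep_blank_values=True keeps empty parameters instead of dropping them.
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Swap the PDF writer (rdf2latex) for the plain-text one (rdf2text).
    params = [(key, "rdf2text" if key == "writer" else value) for key, value in params]
    if not any(key == "writer" for key, _ in params):
        params.append(("writer", "rdf2text"))
    # urlencode re-escapes values, e.g. "Jack Bosden" -> "Jack+Bosden".
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    pdf_url = (
        "https://en.wikipedia.org/w/index.php?title=Special:Book"
        "&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden"
        "&oldid=741271566&writer=rdf2latex"
    )
    print(to_plaintext_url(pdf_url))
```

Parsing and re-encoding the query, rather than doing a plain string replace, avoids accidentally matching "rdf2latex" elsewhere in the URL. One side effect: urlencode writes the colon in Special:Book as %3A, which a server should treat the same as a literal colon.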

I will probably have to figure out a way to remove the References, External links, and Notes sections. Regular expressions could probably help (other ideas/suggestions are welcome), but Dizzy Logic had this cool feature where it added #Article at the beginning of each article to mark it. That would be a great feature to consider adding to the book creator.

Best,
Reem

On 18 November 2016 at 17:17, C. Scott Ananian <[email protected]> wrote:

> OCG contains a "plaintext" backend which generates quite nice plain-text
> versions of WP articles. Try clicking "create a book" in the enwiki
> sidebar, "start book creator", go to some article, click "add this page to
> your book" in the header then "show book", then change the format in the
> drop-down to "Word processor (plain text)" and click "download".
>
> You can also take the "download as PDF" link, something like
> https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2latex
> and replace the 'writer=rdf2latex' part at the end with 'writer=rdf2text',
> like:
> https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2text
>
> These tools can be used from the command line, as described at
> https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer
>
> I hope that helps!
> --scott
>
> On Fri, Nov 18, 2016 at 3:15 AM, Reem Al-Kashif <[email protected]> wrote:
>
>> Hi Scott,
>>
>> Thank you so much for your reply and offer to help with Parsoid. I used
>> DizzyLogic as an easy parser to get Wikipedia articles' content stripped
>> of the wiki markup. The results were plain-text files. I used it to
>> parse the whole English and Arabic Wikipedia dumps back in January. It was
>> easy to use because my coding knowledge is limited.
>>
>> I read the link you kindly provided about Parsoid, and I think it can help
>> me with parsing. However, I'm not sure how to start testing it.
>>
>> Thank you :)
>>
>> Best,
>> Reem
>>
>> On 11 November 2016 at 19:55, C. Scott Ananian <[email protected]> wrote:
>>
>>> It was removed from that article recently (19 Oct 2016:
>>> https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632)
>>> with the following comment:
>>>
>>> "That link has been dead for over a year now as per this stackoverflow
>>> comment: http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
>>>
>>> If you'd like to explain what you would have used DizzyLogic for, I'd
>>> love to help you figure out how to use Parsoid to accomplish your goals.
>>> It's an officially supported WMF parser which has much better correctness
>>> than any 'alternative' parser out there, implements a friendly API similar
>>> to mwparserfromhell (see https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi),
>>> and has a well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1)
>>> which can be fetched directly via the REST API (cf. https://en.wikipedia.org/api/).
>>> I believe dumps have also been planned, but I'm not sure what the current
>>> status is.
>>> --scott
>>>
>>>
>>> On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <[email protected]> wrote:
>>>
>>>> Hi Pine,
>>>>
>>>> Thank you for your reply. It is an alternative parser. I believe I
>>>> first saw it on MediaWiki (here:
>>>> https://www.mediawiki.org/wiki/Alternative_parsers).
>>>>
>>>> Best,
>>>> Reem
>>>>
>>>> On 11 November 2016 at 09:47, Pine W <[email protected]> wrote:
>>>>
>>>>> Was this something on Labs?
>>>>> If so, it might have been purged during one of the Labs cleanups.
>>>>>
>>>>> Pine
>>>>>
>>>>>
>>>>> On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm just wondering if anybody knows what happened to the DizzyLogic
>>>>>> wiki parser? The website and the program have vanished. I used it in
>>>>>> January 2016, so I know it was there at that time.
>>>>>>
>>>>>> Best,
>>>>>> Reem
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Kind regards,
>>>>>> Reem Al-Kashif*
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>> --
>>>>
>>>> *Kind regards,
>>>> Reem Al-Kashif*
>>>
>>>
>>> --
>>> (http://cscott.net)
>>
>>
>> --
>>
>> *Kind regards,
>> Reem Al-Kashif*
>
>
> --
> (http://cscott.net)

--
*Kind regards,
Reem Al-Kashif*
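For the cleanup step Reem mentions (removing the References, External links, and Notes sections from the plain-text output), a regular-expression approach might look like the sketch below. It rests on two assumptions that have not been checked against the actual rdf2text output: each section heading sits on its own line containing only the heading text (optionally wrapped in wikitext-style == markers), and these sections come last in an article, so everything from the first matching heading onward is dropped. The names TRAILING_SECTIONS, strip_trailing_sections, and mark_article are hypothetical, and the "#Article <title>" marker line only imitates the DizzyLogic convention she describes; its exact format is an assumption.

```python
import re

# Hypothetical list of trailing sections to drop; adjust as needed.
TRAILING_SECTIONS = ("References", "External links", "Notes")

# Assumption: a heading is a line holding only the section name, optionally
# wrapped in wikitext-style "==" markers, e.g. "References" or "== Notes ==".
_HEADING_RE = re.compile(
    r"^[ \t]*(?:=+[ \t]*)?(?:"
    + "|".join(re.escape(name) for name in TRAILING_SECTIONS)
    + r")(?:[ \t]*=+)?[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_trailing_sections(text: str) -> str:
    """Drop everything from the first trailing-section heading onward."""
    match = _HEADING_RE.search(text)
    body = text[: match.start()] if match else text
    return body.rstrip()


def mark_article(title: str, text: str) -> str:
    """Prefix an article with a '#Article' marker line (format assumed)."""
    return f"#Article {title}\n{text}\n"


if __name__ == "__main__":
    sample = "Title\n\nBody text.\n\nReferences\n\n1. A source\n\nExternal links\n\n* A link\n"
    print(mark_article("Title", strip_trailing_sections(sample)))
```

Anchoring the pattern to whole lines keeps ordinary sentences that merely mention "notes" or "references" from triggering a cut, but an article with a "Notes" section placed before other content would lose that later content, so spot-checking a few outputs is worthwhile.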
