Hi Scott,

Thank you very much. This does the job! I'm wondering if this existed and I
missed it back in January, because I remember that I looked at the book
creator back then and there were fewer options (or maybe I simply missed
them).

I will probably have to figure out a way to remove the references, external
links, and notes sections. Regular expressions could probably help
(other ideas/suggestions are welcome), but DizzyLogic had this cool thing
where they added #Article at the beginning of each article to mark them.
That would be a great feature to consider adding to book creator.
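
To make the regular-expression idea concrete, here is a minimal Python sketch.
It rests on assumptions I have not checked against the real output: that each
section heading sits alone on its own line in the plain-text file, that the
headings are spelled exactly "References", "External links" and "Notes", and
that these sections come at the end of the article, so everything after the
first one can be dropped. The function name is made up for illustration.

```python
import re

# Hypothetical heading names; adjust them to match the real plain-text output.
UNWANTED = ("References", "External links", "Notes")

# Assumption: a heading is a line that contains nothing but the heading text.
HEADING_RE = re.compile(
    r"^[ \t]*(?:" + "|".join(re.escape(h) for h in UNWANTED) + r")[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_trailing_sections(text: str) -> str:
    """Cut the article at the first unwanted heading and drop the rest."""
    match = HEADING_RE.search(text)
    if match is None:
        # No unwanted heading found: return the article unchanged.
        return text
    return text[: match.start()].rstrip() + "\n"
```

Because the pattern only matches whole lines, a sentence that merely mentions
"notes" or "references" is left alone. If the unwanted sections are not at the
end of an article, this would cut too much, so it needs checking on a sample.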

Best,
Reem

On 18 November 2016 at 17:17, C. Scott Ananian <[email protected]>
wrote:

>
> OCG contains a "plaintext" backend which generates quite nice plain-text
> versions of WP articles.  Try clicking "create a book" in the enwiki
> sidebar, "start book creator", go to some article, click "add this page to
> your book" in the header then "show book", then change the format in the
> drop down to "Word processor (plain text)" and click "download".
>
> You can also take the "download as PDF" link, something like
> https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2latex
> and replace the 'writer=rdf2latex' part at the end with 'writer=rdf2text',
> like:
> https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2text
>
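The 'writer' swap described above could be scripted for many links. This is
only a sketch using Python's standard urllib.parse; the function name
with_writer is hypothetical, and it assumes the link has the same query
layout as the example URL in the quoted message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_writer(url: str, writer: str = "rdf2text") -> str:
    """Return the URL with its 'writer' query parameter set to `writer`."""
    parts = urlsplit(url)
    # Keep every parameter except any existing 'writer', then add the new one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "writer"
    ]
    query.append(("writer", writer))
    # safe=":" leaves "Special:Book" readable instead of "Special%3ABook".
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


pdf_url = (
    "https://en.wikipedia.org/w/index.php?title=Special:Book"
    "&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden"
    "&oldid=741271566&writer=rdf2latex"
)
text_url = with_writer(pdf_url)
```

Here text_url ends in 'writer=rdf2text' and the other parameters are unchanged.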
> These tools can be used from the command-line, as described at
> https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer
>
> I hope that helps!
>   --scott
>
> On Fri, Nov 18, 2016 at 3:15 AM, Reem Al-Kashif <[email protected]>
> wrote:
>
>> Hi Scott,
>>
>> Thank you so much for your reply and offer to help with Parsoid. I used
>> DizzyLogic as an easy parser to get Wikipedia articles' content stripped
>> of the wiki markup. The results were in plain text files. I used it to
>> parse the whole English and Arabic Wikipedia dumps back in January. It was
>> easy to use because my coding knowledge is limited.
>> I read the link you kindly provided about Parsoid and I think it can help
>> me with parsing. However, I'm not sure how to start on testing this.
>>
>> Thank you :)
>>
>> Best,
>> Reem
>>
>> On 11 November 2016 at 19:55, C. Scott Ananian <[email protected]>
>> wrote:
>>
>>> It was removed from that article recently (19 Oct 2016:
>>> https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632)
>>> with the following comment:
>>>
>>> "That link has been dead for over a year now as per this stackoverflow
>>> comment:
>>> http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
>>>
>>> If you'd like to explain what you would have used DizzyLogic for, I'd
>>> love to help you figure out how to use Parsoid to accomplish your goals.
>>> It's an officially-supported WMF parser which has much better correctness
>>> than any 'alternative' parser out there, implements a friendly API similar
>>> to mwparserfromhell (see
>>> https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi), and has a
>>> well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1)
>>> which can be directly fetched via the REST api (cf
>>> https://en.wikipedia.org/api/ ).  I believe dumps have also been planned,
>>> but I'm not sure what the current status is.
>>>  --scott
>>>
>>>
>>> On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <[email protected]>
>>> wrote:
>>>
>>>> Hi Pine,
>>>>
>>>> Thank you for your reply. It is an alternative parser. I believe I
>>>> first saw it on MediaWiki (here
>>>> <https://www.mediawiki.org/wiki/Alternative_parsers>).
>>>>
>>>> Best,
>>>> Reem
>>>>
>>>> On 11 November 2016 at 09:47, Pine W <[email protected]> wrote:
>>>>
>>>>> Was this something on Labs? If so, it might have been purged during
>>>>> one of the Labs cleanups.
>>>>>
>>>>> Pine
>>>>>
>>>>>
>>>>> On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm just wondering if anybody knows what happened to the DizzyLogic
>>>>>> wiki parser? The website and program vanished. I used it in January
>>>>>> 2016, so I know it was there at that time.
>>>>>>
>>>>>> Best,
>>>>>> Reem
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Kind regards, Reem Al-Kashif*
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> (http://cscott.net)
>>>
>>
>>
>
>
>



-- 

*Kind regards, Reem Al-Kashif*
