Could you give us an example? Which file, which page title?
On Apr 3, 2013 7:56 AM, "Ning Zhang" <[email protected]> wrote:

> Hi guys,
>
> I am doing some work analyzing wiki dumps. However, I have run into a
> frustrating problem: some text (<text> under <revision>) seems to be
> malicious. It may contain only a single dirty word repeated again and again.
> What makes it worse is that some of these strings seem to be endless, which
> causes my parser to get stuck when reading them. I extracted such text to read
> in vim, and vim shows that it has an exact number of lines. But when I
> press Page Down, it just cannot reach the end and gets stuck in endless
> garbled text.
>
> Have you ever run into such a problem? Thanks a lot.
>
> Best regards,
>
> --
> Ning Zhang
> Purdue University
> E-mail:[email protected]
> Cell Phone:765-337-6629
>
> ------------------------------------------------------------------------------
> Minimize network downtime and maximize team effectiveness.
> Reduce network management and security costs.Learn how to hire
> the most talented Cisco Certified professionals. Visit the
> Employer Resources Portal
> http://www.cisco.com/web/learning/employer_resources/index.html
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
