Hi DBpedia people,

I've been using the latest (3.7) Portuguese dumps to build a custom DBpedia index for Stanbol's embedded Solr (http://incubator.apache.org/stanbol/).

I'm not sure this is the right place to report this, but while working with the generated index I've found a number of cases where the abstract/comment in PT is full of stray markup, for example:

http://dbpedia.org/page/Livonian_Brothers_of_the_Sword
http://dbpedia.org/page/Kingdom_of_Poland_%281916%E2%80%931918%29

Tracking these back to Wikipedia, I can see that in these cases there is no infobox or wiki markup; instead there is a lot of raw HTML that renders as something similar. When extracted, that HTML ends up in the comments/abstracts :(

I hope this helps identify weak spots in the Portuguese DBpedia/Wikipedia, or that someone can forward it to whoever looks after this (Pablo Mendes?). At the moment I don't know whether there are more cases like this or how many articles are affected. I'll keep testing and report back.

Best,
Alex
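
P.S. One rough way to gauge how many entries are affected would be to scan the extracted abstracts for leftover HTML tags. Below is a minimal sketch, assuming the abstracts are available as N-Triples lines with literal objects; the function name, the tag list, and the example.org sample data are all made up for illustration and are not part of the DBpedia extraction framework or Stanbol.

```python
import re

# Matches one N-Triples statement whose object is a plain or
# language-tagged literal: <subject> <predicate> "text"@lang .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"(.*)"(?:@[A-Za-z0-9-]+)?\s*\.\s*$')

# A small, non-exhaustive set of tags (an assumption) that tend to
# appear in hand-written HTML layouts.
HTML_TAG = re.compile(
    r'</?(?:div|span|table|tr|td|th|br|p|font|small|center)\b[^>]*>',
    re.IGNORECASE,
)


def subjects_with_html(lines):
    """Return subject URIs whose literal value contains HTML-looking tags."""
    hits = []
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and HTML_TAG.search(match.group(3)):
            hits.append(match.group(1))
    return hits


# Hypothetical sample data, not taken from the real dump.
sample = [
    '<http://example.org/resource/A> <http://example.org/abstract> "Plain text abstract."@pt .',
    '<http://example.org/resource/B> <http://example.org/abstract> "<div style=\\"float:right\\">Text</div>"@pt .',
]
print(subjects_with_html(sample))
```

The function accepts any iterable of lines, so an open file handle on a dump could be passed in directly. The tag list is only a heuristic and would probably need tuning against real data.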
