Allo Allo!
I've been working with the DBpedia dumps instead of the raw Wikipedia dump,
and I have some questions about the consistency of the extracted data with
the original Wikipedia.
See these examples:
1.
grep 'Terminator_2' mappingbased_properties_en.ttl
does not return the starring actors, although the Wikipedia article lists
them. See:
http://en.wikipedia.org/wiki/Terminator_2:_Judgment_Day
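To make the check in point 1 more precise than a plain grep, here is a minimal Python sketch that filters triples by subject and predicate. It assumes the .ttl dump has one triple per line in an N-Triples-like form; the function names and the sample lines are hand-written for illustration and are not copied from the real dump.

```python
import re

# Assumption: each line looks like "<subject> <predicate> object ."
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')


def iter_triples(lines):
    """Yield (subject, predicate, object) for every line that parses."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        match = TRIPLE_RE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def find_properties(lines, subject_suffix, predicate_suffix=None):
    """Return triples whose subject (and optionally predicate) ends as given."""
    hits = []
    for subj, pred, obj in iter_triples(lines):
        if not subj.endswith(subject_suffix):
            continue
        if predicate_suffix and not pred.endswith(predicate_suffix):
            continue
        hits.append((subj, pred, obj))
    return hits


# Hand-written sample lines (illustrative only, not taken from the dump).
sample = [
    '<http://dbpedia.org/resource/Terminator_2:_Judgment_Day> '
    '<http://dbpedia.org/ontology/director> '
    '<http://dbpedia.org/resource/James_Cameron> .',
    '<http://dbpedia.org/resource/Terminator_2:_Judgment_Day> '
    '<http://dbpedia.org/ontology/starring> '
    '<http://dbpedia.org/resource/Arnold_Schwarzenegger> .',
]

starring = find_properties(sample, '/Terminator_2:_Judgment_Day', '/starring')
all_props = find_properties(sample, '/Terminator_2:_Judgment_Day')
```

On the real file one would pass an open file handle instead of `sample`, e.g. `open('mappingbased_properties_en.ttl', encoding='utf-8')`. An empty `starring` list with a non-empty `all_props` would confirm that only that property is missing.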
2.
The dump files seem to disagree on a page name, possibly because of
truncation:
http://en.wikipedia.org/wiki/Islamic_State_of_Iraq_and_the_Levant
is missing in:
page_ids_en.ttl
while it is present in page_links_en.ttl and redirects_en.ttl.
In page_ids_en.ttl I found the page reported as 'Islamic_State_of_Iraq'
instead of 'Islamic_State_of_Iraq_and_the_Levant', and this leads to
inconsistencies with the other files.
Is this a case of truncation, or some other error?
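The cross-file check in point 2 could be sketched in Python as below. This is only a rough sketch, assuming one triple per line with the subject IRI in angle brackets at the start of the line; the helper names, the sample lines, and the page ID value are made up for illustration.

```python
def subjects(lines):
    """Collect the subject IRI of each line (text between the first '<' and '>')."""
    found = set()
    for line in lines:
        line = line.strip()
        if line.startswith('<') and '>' in line:
            found.add(line[1:line.index('>')])
    return found


def missing_from(reference_lines, other_lines):
    """Subjects that occur in other_lines but not in reference_lines."""
    return subjects(other_lines) - subjects(reference_lines)


# Hand-written sample lines; the ID "12345" is a placeholder, not a real value.
page_ids_sample = [
    '<http://dbpedia.org/resource/Islamic_State_of_Iraq> '
    '<http://dbpedia.org/ontology/wikiPageID> '
    '"12345"^^<http://www.w3.org/2001/XMLSchema#integer> .',
]
page_links_sample = [
    '<http://dbpedia.org/resource/Islamic_State_of_Iraq_and_the_Levant> '
    '<http://dbpedia.org/ontology/wikiPageWikiLink> '
    '<http://dbpedia.org/resource/Iraq> .',
]

missing = missing_from(page_ids_sample, page_links_sample)
```

With the real files, the two lists would be replaced by open file handles. Note that holding every subject of a full dump in a set needs a lot of memory, so for the complete files it may be better to restrict the check to a list of names of interest.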
3. Is there any way to stay synchronized with live DBpedia / Wikipedia
(dbpedia-live-mirror) without installing the whole Virtuoso DB?
I just want to extract the data I need from the original dump (via Python?),
e.g. to get the same .ttl files.
4. Finally, does DBpedia use an lxml parser for the wiki dump (the whole dump)?
Is it open source, so that I could adjust it?
I tried lxml myself, but it fails because there are symbols like <C8> in the
middle of the <page> element that are never closed: they are not valid XML
tags (I think they actually stand for some characters), so the iteration
fails and the results are not clean.
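For point 4, here is a small Python sketch of a streaming parse over the dump. It rests on a guess: that <C8> is how a pager or terminal shows a raw byte (0xC8) that is part of a multi-byte UTF-8 character, rather than a real tag in the file. If so, handing the parser the raw bytes and letting it do the decoding may avoid the problem. The sketch uses the standard library xml.etree.ElementTree instead of lxml, and the namespace in the sample is an assumption about the export format.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def iter_pages(binary_stream):
    """Yield (title, text) for each <page>, one page at a time."""
    for _event, elem in ET.iterparse(binary_stream, events=('end',)):
        if local_name(elem.tag) != 'page':
            continue
        title = None
        text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == 'title' and title is None:
                title = child.text
            elif name == 'text' and text is None:
                text = child.text
        yield title, text or ''
        elem.clear()  # drop the page's children to limit memory use


# Tiny hand-written sample; 'È' is U+00C8, encoded here as UTF-8 bytes.
sample = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    '<page><title>È</title>'
    '<revision><text>some wiki text</text></revision></page>'
    '</mediawiki>'
).encode('utf-8')

pages = list(iter_pages(io.BytesIO(sample)))
```

For the real dump the stream would be a file opened in binary mode, e.g. `open(path, 'rb')`, where `path` points to the decompressed XML dump. If the file really does contain malformed markup, this sketch will still raise a parse error and a different approach would be needed.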
Thank you so much; I hope my little notes can be of help too.
Luigi
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc