On 1 September 2013 12:26, Omri Oren <[email protected]> wrote:
> Ok, so how can I get the functionality that I used to get from the
> interlanguage files? Is it not extractable anymore using DBpedia? In that
> case, should I download Wikidata dumps now and parse them instead?
>
Well, this is an area of Wikipedia and DBpedia that's currently going
through a lot of changes.
As far as I know, Wikidata does not yet publish official dumps in a
reliable format. They do publish dumps, but the data is in an internal JSON
format that may change at any time without notice. (I don't blame them -
they're doing a great job with limited resources. The dumps simply aren't
high on their priority list.) They're also very helpful - when we asked
them how we could best extract the interlanguage links now, they
(specifically, Daniel Kinzler) prepared a dump of the appropriate Wikidata
tables. It was a one-off dump though, which was fine for us, but maybe
won't help you much. It's available at [1], and the script that generates
RDF from it is at [2]. You could ask the Wikidata people if they could
generate that dump on a regular basis.
Hady is writing code to extract interlanguage links from a preliminary
Wikidata RDF dump provided by Markus Krötzsch. Maybe you can use Hady's
code. But as I said, this stuff is changing all the time. Some time soon,
Wikidata will probably publish 'official' RDF dumps, but that may happen in
two weeks, two months or two years.
[1] https://toolserver.org/~daniel/misc/sitelinks-2013-06-18.csv.bz2
[2] https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessWikidataLinks.scala
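
For illustration only, here is a rough Python sketch of the general idea
behind turning a sitelinks table into interlanguage links: group page titles
by Wikidata item, then link each page of one wiki to its counterparts in the
other languages. The three-column layout (item id, site, title), the item id
"Q0", the function names and the simplified URI scheme are assumptions made
for this example. They are not the actual format of [1] or the logic of [2].

```python
import csv
import io
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def resource_uri(site, title):
    """Build a DBpedia-style resource URI (simplified: no percent-encoding)."""
    lang = site[:-len("wiki")]  # assumes site ids like "enwiki", "dewiki"
    host = "dbpedia.org" if lang == "en" else lang + ".dbpedia.org"
    return "http://" + host + "/resource/" + title.replace(" ", "_")


def interlanguage_triples(rows, source_site):
    """Yield N-Triples lines linking each source_site page to its counterparts.

    rows: iterable of (item_id, site, title) records, a hypothetical layout.
    """
    by_item = defaultdict(dict)
    for item_id, site, title in rows:
        by_item[item_id][site] = title
    for item_id in sorted(by_item):
        links = by_item[item_id]
        if source_site not in links:
            continue  # this item has no page in the wiki we extract for
        subject = resource_uri(source_site, links[source_site])
        for site in sorted(links):
            if site != source_site:
                obj = resource_uri(site, links[site])
                yield "<%s> <%s> <%s> ." % (subject, OWL_SAME_AS, obj)


# Made-up sample rows; "Q0" is a placeholder item id, not a real one.
sample = io.StringIO(
    "Q0,enwiki,Der Spiegel\n"
    "Q0,dewiki,Der Spiegel\n"
    "Q0,frwiki,Der Spiegel\n"
)
triples = list(interlanguage_triples(csv.reader(sample), "enwiki"))
for triple in triples:
    print(triple)
```

For the English wiki this prints one line per other language of the same
item. A real dump would need streaming and proper URI encoding, which this
sketch leaves out.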
> And is this change documented anywhere? (any wiki that deals with changes
> to the content of DBpedia dumps that I extract?)
>
This change is due to changes at Wikipedia. We were aware of them because
we follow what's happening at Wikipedia, for example by subscribing to
their mailing lists. I think we also discussed these issues on the
DBpedia mailing lists.
> I'm asking this because I have automatic scripts that assume they get a
> certain input from DBpedia, and now I suddenly found out that most of the
> data is not there anymore for the last few months - I'd like to avoid such
> surprises in the future...
>
I understand. I'm afraid all I can suggest is to follow the DBpedia mailing
lists. We do write a changelog when we make a new DBpedia release, but
that's too late for you. I don't think we will start writing a wiki or blog
to keep users up to date about changes in DBpedia code or Wikipedia
input data. We simply don't have enough time for that. I'm sorry.
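
One thing you could do on your side is add a cheap sanity check to your
scripts, so that a sudden shrink in an extracted file stops the pipeline
instead of going unnoticed for months. A minimal sketch in Python, assuming
you keep the previous run's files around. The function names and the
factor-of-two threshold are made up for illustration:

```python
import os


def shrink_ratio(old_size, new_size):
    """Return how many times smaller the new file is (1.0 means unchanged)."""
    if new_size <= 0:
        return float("inf")
    return old_size / new_size


def looks_suspicious(old_size, new_size, max_shrink=2.0):
    """True if the new file is more than max_shrink times smaller than the old one."""
    return shrink_ratio(old_size, new_size) > max_shrink


def check_files(old_path, new_path, max_shrink=2.0):
    """Compare two extraction outputs on disk; raise if the new one shrank too much."""
    old_size = os.path.getsize(old_path)
    new_size = os.path.getsize(new_path)
    if looks_suspicious(old_size, new_size, max_shrink):
        raise RuntimeError(
            "%s is %.1fx smaller than %s"
            % (new_path, shrink_ratio(old_size, new_size), old_path)
        )


# Sizes of the January and July interlanguage-links files from your listing.
print(looks_suspicious(2365922690, 40889809))
```

With the January and July sizes from your listing this flags the drop. The
right threshold depends on how much your files normally vary between runs.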
>
> 10x,
> Omri
>
> *Omri Oren* Algorithm Engineer <[email protected]>
> <http://corp.everything.me/>
> visit us at http://everything.me <http://corp.everything.me/>
> <https://play.google.com/store/apps/details?id=me.everything.launcher&referrer=utm_source%3Devme%26utm_medium%3Demailsig>
>
> On Sun, Sep 1, 2013 at 1:02 PM, Jona Christopher Sahnwaldt <
> [email protected]> wrote:
>
>> Hi Omri,
>>
>> almost all interlanguage links have been moved from the Wikipedias to
>> Wikidata. That's why these files have become so much smaller. For the 3.9
>> release, we extracted the links from a special Wikidata dump that Daniel
>> Kinzler prepared for us. In the future, we will generate them from Wikidata
>> RDF dumps. The interlanguage links left on Wikipedia are not useful anymore.
>>
>> HTH,
>> JC
>>
>>
>>
>> On 1 September 2013 10:47, Omri Oren <[email protected]> wrote:
>>
>>> Hi again,
>>>
>>> I think there's some problem extracting the interlanguage files.
>>> The files I created in January were about 70x larger than the ones I'm
>>> getting in the last 2 months, even though I added a few languages to the
>>> config file since January (or maybe that's the reason?).
>>> Most of the wikipages are now missing from the interlanguage files.
>>>
>>> Any idea why?
>>>
>>> Do your interlanguage files have the correct size and contain everything
>>> they're supposed to? (e.g. "Der Spiegel" only exists in my January version
>>> and not in the July or August versions)
>>>
>>> Thanks,
>>> Omri
>>>
>>>
>>> -rw-rw-r-- 1 user user *2365922690 Jan 23* 2013 enwiki-20130102-interlanguage-links.ttl
>>> -rw-rw-r-- 1 user user 5145632 Jan 23 2013 enwiki-20130102-interlanguage-links-see-also.ttl
>>> -rw-rw-r-- 1 user user 121030569 Jan 23 2013 enwiki-20130102-interlanguage-links-same-as.ttl
>>>
>>> -rw-rw-r-- 1 user user *40889809 Jul 16* 10:13 enwiki-20130708-interlanguage-links.ttl
>>> -rw-rw-r-- 1 user user 8056132 Jul 16 11:29 enwiki-20130708-interlanguage-links-see-also.ttl
>>> -rw-rw-r-- 1 user user 2351624 Jul 16 11:29 enwiki-20130708-interlanguage-links-same-as.ttl
>>>
>>> -rw-rw-r-- 1 user user *34848530 Aug 29* 14:25 enwiki-20130805-interlanguage-links.ttl
>>> -rw-rw-r-- 1 user user 6921792 Aug 29 16:34 enwiki-20130805-interlanguage-links-see-also.ttl
>>> -rw-rw-r-- 1 user user 1486163 Aug 29 16:34 enwiki-20130805-interlanguage-links-same-as.ttl
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
>>> Discover the easy way to master current and previous Microsoft
>>> technologies
>>> and advance your career. Get an incredible 1,500+ hours of step-by-step
>>> tutorial videos with LearnDevNow. Subscribe today and save!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>
>>>
>>
>