Hi Markus et al.
Thanks for the answer. I have a few follow-up questions, as I'm not quite
grasping the toolkit yet.

Alternative 1:
So, if I'd like to do (1), I need a dump file. I've downloaded a *-current
dump (
http://dumps.wikimedia.org/wikidatawiki/20150330/wikidatawiki-20150330-pages-meta-current.xml.bz2)
and am trying to process it using the DumpProcessingController class, which
I'm assuming is the wrong way to go about this.
Is there a guide on how to parse local dumps?
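
This is roughly what I have so far (I'm guessing that offline mode plus the
toolkit's download directory layout is how a local file gets picked up, and
that processMostRecentMainDump is the right call for a -current dump):

    import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
    import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
    import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
    import org.wikidata.wdtk.dumpfiles.DumpProcessingController;

    public class LocalDumpTest {
        public static void main(String[] args) throws Exception {
            DumpProcessingController controller =
                    new DumpProcessingController("wikidatawiki");
            // only use dump files already on disk, never download
            controller.setOfflineMode(true);

            controller.registerEntityDocumentProcessor(
                    new EntityDocumentProcessor() {
                        @Override
                        public void processItemDocument(ItemDocument itemDocument) {
                            // sitelinks are keyed by site id such as "enwiki"
                            System.out.println(itemDocument.getItemId().getId()
                                    + ": " + itemDocument.getSiteLinks().keySet());
                        }

                        @Override
                        public void processPropertyDocument(
                                PropertyDocument propertyDocument) {
                            // properties are not relevant for language links
                        }
                    }, null, true); // all content models, current revisions only

            controller.processMostRecentMainDump();
        }
    }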

Alternative 2:
I've been looking at the FetchOnlineDataExample, and it seems to do pretty
much what I need, except that I want to retrieve interlanguage links for a
page given its title, not its entity id. Is this possible, or is there a
way to get the entity id given a page title in a given language?
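
Something along these lines is what I'm after (getEntityDocumentByTitle is
just my guess at a method name):

    import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
    import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
    import org.wikidata.wdtk.datamodel.interfaces.SiteLink;
    import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

    public class FetchByTitleTest {
        public static void main(String[] args) throws Exception {
            WikibaseDataFetcher fetcher = new WikibaseDataFetcher();
            // resolve a page title on a given site to its entity,
            // instead of needing the Q-id up front
            EntityDocument doc =
                    fetcher.getEntityDocumentByTitle("enwiki", "Douglas Adams");
            if (doc instanceof ItemDocument) {
                for (SiteLink link :
                        ((ItemDocument) doc).getSiteLinks().values()) {
                    System.out.println(link.getSiteKey() + ": "
                            + link.getPageTitle());
                }
            }
        }
    }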

Thanks

Alan

-- 
Alan Said
Recorded Future
e: alans...@acm.org
t: @alansaid
w: www.alansaid.com

On Fri, Apr 17, 2015 at 5:17 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Hi Alan,
>
> The SitelinksExample shows how to get the basic language-links data. In
> Wikidata, sites are encoded by IDs such as "enwiki" or "frwikivoyage". To
> find out what they mean in terms of URLs, you need to get the sites
> information first. The example shows you how to do this.
>
> The site link information for a particular item can be found in the
> ItemDocument for that item. There are two ways of getting an ItemDocument:
>
> (1) You process the dump file, handling all items one by one (in the
> order in which they appear in the dump). This is best if you want to look
> at very many items, or if you must work completely offline.
> (2) You fetch individual items from the Web API (random access). This is
> best if you only need the links for a few selected items (fetching
> hundreds from the API is quick; fetching millions is infeasible).
>
> You can find many examples of doing things along the lines of (1) with
> WDTK. For (2), see the example FetchOnlineDataExample (this is only in the
> development version of v0.5.0 so far, which you can find on GitHub).
>
> In either case, you can directly read out any sitelink from the
> ItemDocument object. It will give you the article title, the site id
> ("enwiki" etc.), and the list of badges (if any). To turn this into a URL,
> you would use code as in the SitelinksExample.
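>
> Roughly, assuming an ItemDocument itemDocument and the Sites object
> sites that the example obtains:
>
>     SiteLink link = itemDocument.getSiteLinks().get("enwiki");
>     if (link != null) {
>         System.out.println(link.getPageTitle());        // article title
>         System.out.println(sites.getSiteLinkUrl(link)); // full article URL
>     }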
>
> Cheers,
>
> Markus
>
>
>
> On 17.04.2015 15:18, Alan Said wrote:
>
>> Hi all,
>> I am trying to use the Wikidata Toolkit to extract interlanguage links
>> for certain pages from Wikipedia.
>>
>> So far, I've made several attempts based on the code provided in
>> SitelinksExample
>> (
>> https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/SitelinksExample.java
>> )
>> without any success, and I've realized that this is likely not the
>> correct approach.
>>
>> Ideally, I'd like to do this while processing a local file. I've
>> downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
>> head around how to proceed.
>> Any pointers are appreciated.
>>
>> Best,
>> Alan
>>
>> --
>> Alan Said
>> Recorded Future
>> e: alans...@acm.org
>> t: @alansaid
>> w: www.alansaid.com
>>
>>
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
