Hi Markus et al. Thank you for the answer. I have a few follow-up questions as I'm not quite grasping the toolkit.

Alternative 1: So, if I'd like to do (1), I need a dump file. I've downloaded a *-current dump (http://dumps.wikimedia.org/wikidatawiki/20150330/wikidatawiki-20150330-pages-meta-current.xml.bz2) and am trying to process it using the DumpProcessingController class, which I'm assuming is the wrong way to go about this. Is there a guide on how to parse local dumps?

Alternative 2: I've been looking at the FetchOnlineDataExample and this seems to do pretty much what I need, except for retrieving interlanguage links for a page given the entity title, not the id. Is this possible, or is there a possibility of getting the entity id given a page title in a given language?

Thanks
Alan

--
Alan Said
Recorded Future
e: alans...@acm.org
t: @alansaid
w: www.alansaid.com

On Fri, Apr 17, 2015 at 5:17 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:

> Hi Alan,
>
> The SitelinksExample shows how to get the basic language-links data. In
> Wikidata, sites are encoded by IDs such as "enwiki" or "frwikivoyage". To
> find out what they mean in terms of URLs, you need to get the interlanguage
> information first. The example shows you how to do this.
>
> The site link information for a particular item can be found in the
> ItemDocument for that item. There are two ways of getting an ItemDocument:
>
> (1) You process the dump file to process all items one by one (in the
> order in which they appear in the dump). This is best if you want to look
> at very many items, or if you must work completely in offline mode.
> (2) You fetch individual items from the Web API individually (random
> access). This is best if you only need the links for a few selected items
> only (fetching hundreds from the API is quick, fetching millions is
> infeasible).
>
> You can find many examples for doing things along the lines of (1) with
> WDTK. For (2), see the example FetchOnlineDataExample (this is only part of
> the development version of v0.5.0 so far, which you can find on github).
>
> In either case, you can directly read out any sitelink from the
> ItemDocument object. It will give you the article title, the site id
> ("enwiki" etc.), and the list of badges (if any). To turn this into a URL,
> you would use code as in the SitelinksExample.
>
> Cheers,
>
> Markus
>
> On 17.04.2015 15:18, Alan Said wrote:
>
>> Hi all,
>> I am trying to use the Wikidata Toolkit to extract interlanguage links
>> for certain pages from Wikipedia.
>>
>> So far, I've tried different attempts based on the code provided in
>> SitelinksExample
>> (
>> https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/SitelinksExample.java
>> )
>> without any success. I've realized that this is likely not the correct
>> approach.
>>
>> Optimally I'd like to do this while processing a local file. I've
>> downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
>> head around how to go ahead with this.
>> Any pointers are appreciated.
>>
>> Best,
>> Alan
>>
>> --
>> Alan Said
>> Recorded Future
>> e: alans...@acm.org
>> t: @alansaid
>> w: www.alansaid.com
>>
>> _______________________________________________
>> Wikidata-l mailing list
>> Wikidata-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
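
For reference, on the question of going from a page title to the entity: independently of WDTK, the Wikidata Web API's wbgetentities action accepts a site key and a page title instead of an entity id. The following is only a minimal sketch of building such a request URL with the Java standard library (Java 10 or later for this URLEncoder overload). The class and method names are made up for illustration and are not part of WDTK, and the sketch does not send the request or parse the JSON response.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Wikidata Toolkit.
public class TitleLookupUrl {

    // Public Wikidata Web API endpoint.
    static final String API = "https://www.wikidata.org/w/api.php";

    /**
     * Builds a wbgetentities request URL that looks up an item by the
     * title of a page on one site, e.g. ("enwiki", "Douglas Adams").
     * props=sitelinks restricts the response to the item's sitelinks.
     */
    static String buildUrl(String siteKey, String title) {
        return API
                + "?action=wbgetentities"
                + "&sites=" + URLEncoder.encode(siteKey, StandardCharsets.UTF_8)
                + "&titles=" + URLEncoder.encode(title, StandardCharsets.UTF_8)
                + "&props=sitelinks"
                + "&format=json";
    }

    public static void main(String[] args) {
        // Prints the request URL; fetching it is left to the caller.
        System.out.println(buildUrl("enwiki", "Douglas Adams"));
    }
}
```

The JSON returned for such a request is keyed by entity id, so the id and the sitelinks for all other languages come back in one call. As noted in the quoted reply, this route only makes sense for a limited number of pages; for millions of items the dump is the better option.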