Re: [Dbpedia-discussion] pagelinks not loaded @ dbpedia.org?

Kingsley Idehen Tue, 14 Jul 2009 14:08:14 -0700

Peter Ansell wrote:
> 2009/7/13 Georgi Kobilarov <[email protected]>:
>   
>> Hi John,
>>
>> it is true that the pagelinks dataset has not been loaded into the
>> dbpedia.org/sparql instance. We did load it once, but most Linked Data
>> and Sparql clients struggled with the very large result sets. For
>> example, DESCRIBE http://dbpedia.org/resource/United_States would result
>> in more than 300k triples due to the pagelinks (in particular because of
>> inbound links)
>>
>> So we decided to only provide that dataset for download, but don't load
>> it. I think it would be possible to provide a different rdf instance
>> containing the pagelinks as well, but for the moment you'd have to load
>> that dataset into your own repository.
>>     
>
> In any very highly linked dataset you will find that DESCRIBE
> behaviour implemented to retrieve both subject and object triples will
> pretty much fail because of this issue. It is a shame that the inter
> article links in Wikipedia are so diverse that they aren't accessible
> using Linked Data anymore... ;-) (Seriously though, it is actually a
> shame that there is a barrier to accessing this information without
> downloading the dataset yourself, and even then you have to explain to
> others that they need to do it themselves also...)
>
> If only there was a way to tell the DESCRIBE query to get subject
> triples and any referenced blank nodes but ignore triples which would
> only be included because the URI was the object. Sadly, DESCRIBE
> queries are specified to be vendor implemented, so anything could
> happen in the future with one's DESCRIBE query, I guess? Do the triples
> where the URI is the subject get retrieved before the more trivial
> triples where the URI is the object?
>
> Depending on the limits that are set you might also be unable to
> actually retrieve all of the triples in cases like this even if you
> did set up another endpoint just for pagelinks. From memory, the public
> dbpedia endpoint is set up so that a client can't even use LIMIT and
> OFFSET to get 300 thousand triples?
>
> Cheers,
>
> Peter
>
> ------------------------------------------------------------------------------
> Enter the BlackBerry Developer Challenge  
> This is your chance to win up to $100,000 in prizes! For a limited time, 
> vendors submitting new applications to BlackBerry App World(TM) will have
> the opportunity to enter the BlackBerry Developer Challenge. See full prize  
> details at: http://p.sf.net/sfu/Challenge
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>   
Peter,


The solution is to put the data set in a separate graph, so that people can 
explicitly access it and mesh it with other data sources.

Once loaded into the DBpedia.org instance, the /fct interface will 
expose data both from the <http://dbpedia.org> graph group (which is now made 
up of one graph IRI per loaded data set) and from the new pagelinks graph, 
which will have the IRI <http://dbpedia.org/pagelinks#>.
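
To make the separate-graph idea concrete, here is a rough sketch of how a client might build queries that touch only the pagelinks graph and page through the results. The graph IRI is the one given above; the predicate IRI, the function names, and the page sizes are assumptions for illustration only, so check the pagelinks dump and the endpoint limits before relying on any of them. The sketch only builds query strings and does not contact an endpoint.

```python
from typing import Iterator

# Graph IRI as given in this message.
PAGELINKS_GRAPH = "http://dbpedia.org/pagelinks#"
# Assumption: the predicate used in the pagelinks dump; verify against the data.
WIKILINK_PREDICATE = "http://dbpedia.org/property/wikilink"


def outbound_links_query(resource: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a SELECT query for links leaving `resource`, scoped to one graph."""
    return (
        "SELECT ?target\n"
        "WHERE {\n"
        f"  GRAPH <{PAGELINKS_GRAPH}> {{\n"
        f"    <{resource}> <{WIKILINK_PREDICATE}> ?target .\n"
        "  }\n"
        "}\n"
        "ORDER BY ?target\n"  # stable ordering so LIMIT/OFFSET pages do not overlap
        f"LIMIT {limit} OFFSET {offset}"
    )


def paged_queries(resource: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield one query per page, stepping OFFSET by `page_size` each time."""
    for page in range(pages):
        yield outbound_links_query(resource, page_size, page * page_size)


if __name__ == "__main__":
    for query in paged_queries("http://dbpedia.org/resource/United_States", 1000, 2):
        print(query, end="\n\n")
```

Because the resource is fixed in the subject position, this returns only outbound links, which avoids the inbound-link blowup described earlier in the thread. Whether the public endpoint permits paging that deep is a separate question that depends on its configured limits.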



-- 


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com




