Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-10-04 Thread Kingsley Idehen

On 10/4/13 10:22 AM, Leigh Dodds wrote:

Hi Hugh,

Hasn't DBpedia always suffered from this? I've tended to do the same
as you and have encountered similar inconsistencies. I've never really
figured out whether it's down to inconsistent encoding in the data
conversion or something else.

Cheers,

L.


The dumps need triples based on owl:sameAs or redirect relations. One
course of action would be to place these triples in a separate dataset,
which makes loading them optional.
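
As a rough illustration of what such an optional dataset would allow, the sketch below resolves a resource URI through redirect triples. The predicate URI, the sample triple, and the helper names load_redirects and resolve are assumptions made for the example; this is not a description of how the DBpedia dumps are actually packaged.

```python
import re

# Assumed predicate for redirect triples; adjust to whatever the dump uses.
REDIRECT = "http://dbpedia.org/ontology/wikiPageRedirects"

# Matches an N-Triples line whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def load_redirects(lines):
    """Build a {source: target} map from N-Triples lines (hypothetical helper)."""
    redirects = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == REDIRECT:
            redirects[match.group(1)] = match.group(3)
    return redirects


def resolve(uri, redirects, max_hops=10):
    """Follow redirects until no further redirect exists; stops on cycles."""
    seen = set()
    while uri in redirects and uri not in seen and len(seen) < max_hops:
        seen.add(uri)
        uri = redirects[uri]
    return uri


# Sample triple written for this example, based on the redirect reported below.
sample = [
    "<http://dbpedia.org/resource/Ashford_(borough)> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/Borough_of_Ashford> .",
]
redirects = load_redirects(sample)
print(resolve("http://dbpedia.org/resource/Ashford_(borough)", redirects))
```

Keeping the map separate from the main data is what makes it optional: a consumer who does not need redirect resolution simply never loads it.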


I've also added the DBpedia list to the thread.

Kingsley



On Fri, Oct 4, 2013 at 1:42 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:

Hi.
Chris has suggested I send the following to the LOD list, as it may be of 
interest to several people:

Hi Chris.
Great stuff!

I have a question.
Or would you prefer I put it on the LOD list for discussion?

It is about URL encoding.

DBpedia:
http://dbpedia.org/page/Ashford_%28borough%29 is not found
http://dbpedia.org/page/Ashford_(borough) works, and redirects to
 http://dbpedia.org/resource/Borough_of_Ashford
Wikipedia:
http://en.wikipedia.org/wiki/Ashford_%28borough%29 works
http://en.wikipedia.org/wiki/Ashford_(borough) works
Both go to the page with content of 
http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the address 
bar doesn't change.

So the problem:
I usually find things in Wikipedia, and then use the last bit of the URL to
construct the DBpedia URI - I suspect lots of people do this.
But as you can see, the URL-encoded URI, which can often be found in the wild,
won't allow me to do this.
There are of course many Wikipedia URLs with ( and ) in them - (artist),
(programmer), (borough) etc.
It is the same with comma and single quote.
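
A minimal sketch of the workaround this implies, assuming that DBpedia expects the literal characters: take the last part of the Wikipedia URL and undo only the percent-escapes for ( ) , and ' before building the resource URI. The function name wikipedia_to_dbpedia is made up for the example, and whether other characters need the same treatment is not established in this thread.

```python
from urllib.parse import urlsplit

# Percent-escapes for the characters named above: ( ) , '
# Only these are decoded; every other escape is deliberately left untouched.
ESCAPES = {"%28": "(", "%29": ")", "%2C": ",", "%2c": ",", "%27": "'"}

WIKI_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(wikipedia_url):
    """Build a DBpedia resource URI from a Wikipedia article URL (sketch)."""
    path = urlsplit(wikipedia_url).path  # urlsplit does not decode the path
    if not path.startswith(WIKI_PREFIX):
        raise ValueError("not a Wikipedia article URL: " + wikipedia_url)
    title = path[len(WIKI_PREFIX):]
    for escape, char in ESCAPES.items():
        title = title.replace(escape, char)
    return DBPEDIA_RESOURCE + title


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Ashford_%28borough%29"))
print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Ashford_(borough)"))
```

Both calls yield the same URI, so encoded and unencoded Wikipedia links found in the wild end up at one DBpedia identifier.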

I think this may be different from 3.8, but can't be sure - is it intended?

Very best
Hugh






--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen







Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-24 Thread Kingsley Idehen

On 9/24/13 4:58 AM, Diego Valerio Camarda wrote:

really, really a good work!



Thanks!

The SPARQL endpoint is a lot faster! Is it because you started using
Virtuoso 7?


Yes.



One more question: I have 300 high-quality sameAs links for Italian
politicians; the links connect http://dati.camera.it to
http://dbpedia.org. What is the best way to submit them, hoping that
they will be included in the next dump?


You can simply publish the dump via places like CKAN or some other LOD
data space on the Web. Once it is published, notify us and we
will load it into its own named graph while the rest of the DBpedia team
works on integrating it into the next major release.
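
For anyone preparing such a dump, here is a small sketch of writing owl:sameAs links as N-Triples. The two URIs in the example pair are hypothetical placeholders, not real identifiers from either dataset, and the helper names are made up for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(source_uri, target_uri):
    """Format one owl:sameAs statement as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(source_uri, OWL_SAME_AS, target_uri)


def to_ntriples(pairs):
    """Join (source, target) URI pairs into N-Triples text, one line each."""
    return "".join(same_as_triple(s, t) + "\n" for s, t in pairs)


# Hypothetical example pair; replace with the real link set.
pairs = [
    ("http://dati.camera.it/example/person-1",
     "http://dbpedia.org/resource/Example_Person"),
]
dump = to_ntriples(pairs)
print(dump, end="")
```

The resulting text can be saved as a .nt file and published wherever the dump is to be hosted.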


Kingsley




On Mon, Sep 23, 2013 at 7:37 PM, Kingsley Idehen
kide...@openlinksw.com wrote:


On 9/23/13 1:00 PM, Tom Morris wrote:

Congratulations on the new release!

On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:


1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.


What accounts for the long latency between the date of the dumps
and the date of the release?

Tom


A number of things:

1. Dataset QA -- the datasets are generated from mapping efforts
2. Dataset Loading & QA
-- Linked Data Deployment (i.e., new URIs resolve to the new data)
-- SPARQL Endpoint (new data is accessible via SPARQL endpoint).


Kingsley








Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-23 Thread Kingsley Idehen

On 9/23/13 1:00 PM, Tom Morris wrote:

Congratulations on the new release!

On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:



1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.


What accounts for the long latency between the date of the dumps and 
the date of the release?


Tom


A number of things:

1. Dataset QA -- the datasets are generated from mapping efforts
2. Dataset Loading & QA
-- Linked Data Deployment (i.e., new URIs resolve to the new data)
-- SPARQL Endpoint (new data is accessible via SPARQL endpoint).


Kingsley






Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-23 Thread Tom Morris
Congratulations on the new release!

On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:


 1. the new release is based on updated Wikipedia dumps dating from March /
 April 2013 (the 3.8 release was based on dumps from June 2012), leading to
 an overall increase in the number of concepts in the English edition from
 3.7 to 4.0 million things.


What accounts for the long latency between the date of the dumps and the
date of the release?

Tom


Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-23 Thread Paul A. Houle
One of the goals of the infovore project is to develop something that 
targets this latency problem.


https://github.com/paulhoule/infovore/wiki

I’ve talked with a number of organizations that use DBpedia and Freebase 
data and almost all of them have either no solution or an incomplete solution 
for dealing with changes over time,  something that’s absolutely necessary for 
sustainable social-semantic systems.  Many of them have considered developing 
it but decided against developing it in house.

   When Freebase changed the format of the RDF dump I was able to adapt in less 
than a week (most of the time delay was that no official dump came out that 
week and I didn’t know what was going on);  after fixing my code I was able to 
run against it interactively.

   Infovore is not using Hadoop so much for “big data”,  but rather for “low 
latency”.  Not extremely low latency,  but once I trust the system enough it 
ought to have Freebase processed before I wake up on Sunday.  The files are 
smaller than the official dump and will load faster,  both things that will 
lower latency for the consumer.

   Right now the pipeline is limited by the not-so-parallel step of
ungzipping and re-gzipping the Freebase dump, but I believe a processing
pipeline much more complex than the current one could still be run in less than
an hour if you throw enough AWS instances at it.
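
To make the bottleneck concrete, here is a generic sketch of a streaming "ungzip, filter, re-gzip" pass. It is not Infovore's code; the function name filter_gzip and the sample data are assumptions for illustration. A single gzip stream has to be read sequentially, which is why this step does not parallelize the way the rest of a pipeline can.

```python
import gzip
import os
import tempfile


def filter_gzip(src_path, dst_path, keep):
    """Stream a gzipped text file line by line, re-gzipping the kept lines.

    Returns the number of lines written. Memory use stays flat because
    nothing is buffered beyond the current line.
    """
    kept = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            if keep(line):
                dst.write(line)
                kept += 1
    return kept


# Tiny made-up input so the sketch runs on its own.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.nt.gz")
dst = os.path.join(workdir, "out.nt.gz")
with gzip.open(src, "wt", encoding="utf-8") as handle:
    handle.write("<a> <p> <b> .\n# comment\n<c> <p> <d> .\n")

kept = filter_gzip(src, dst, lambda line: not line.startswith("#"))
print(kept)
```

Dropping unwanted lines during this pass is also one way the output ends up smaller than the input, which helps load time downstream.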

   The framework ought to work for any RDF data, including DBpedia (for which
it has been tested), and I have a lot of stuff planned, including something
that could “smush” DBpedia identifiers to Freebase identifiers or the other way
around to create a merged data set.
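
One way to picture such "smushing", as a sketch only and not the planned Infovore implementation: given pairs of equivalent identifiers, rewrite every subject and object to the chosen canonical side. The identifiers below are hypothetical placeholders and the function names are made up for the example.

```python
def build_canonical_map(same_as_pairs):
    """Map each DBpedia URI to the Freebase URI it is declared equivalent to."""
    return {dbpedia: freebase for dbpedia, freebase in same_as_pairs}


def smush(triples, canonical):
    """Yield triples with subjects and objects rewritten via the map.

    Predicates are left alone, and any term without a mapping (including
    literal values) passes through unchanged.
    """
    for s, p, o in triples:
        yield (canonical.get(s, s), p, canonical.get(o, o))


# Hypothetical identifiers, for illustration only.
pairs = [("http://dbpedia.org/resource/Example",
          "http://rdf.freebase.com/ns/m.example")]
triples = [
    ("http://dbpedia.org/resource/Example", "http://example.org/p", "a literal"),
    ("http://example.org/other", "http://example.org/q",
     "http://dbpedia.org/resource/Example"),
]
merged = list(smush(triples, build_canonical_map(pairs)))
for triple in merged:
    print(triple)
```

Swapping the order of each pair reverses the direction, which covers the "or the other way around" case.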

   Yes, what I am doing today is much simpler than what DBpedia is doing, but
I’m taking a multi-pronged approach that focuses on process as much as
technology. I’m keeping a notebook of how much time it takes me to do
everything and learning how to squeeze out the errors and wasted time with a
battery of methods that are being documented. It is possible to run clusters
in Amazon EMR by simply providing a credential pair – you don’t need to know
much at all about AWS or Hadoop.

I invite all of you to follow this project on GitHub and also follow
the Google Group

https://groups.google.com/forum/#!forum/infovore-basekb

where you’ll get roughly two status reports a week and where people with 
questions get quick answers.

 I can definitely use contributions too, because the list of things I’d
like to see is long and my own work will be focused on my own needs. Even if
you don’t contribute, I welcome feature requests on the issue tracker.



From: Kingsley Idehen 
Sent: Monday, September 23, 2013 1:37 PM
To: dbpedia-discussion@lists.sourceforge.net 
Subject: Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider 
infobox coverage, additional type statements, and new YAGO and Wikidata links

On 9/23/13 1:00 PM, Tom Morris wrote:

  Congratulations on the new release! 

  On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:


1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.


  What accounts for the long latency between the date of the dumps and the date 
of the release?

  Tom

A number of things:

1. Dataset QA -- the datasets are generated from mapping efforts
2. Dataset Loading & QA
-- Linked Data Deployment (i.e., new URIs resolve to the new data)
-- SPARQL Endpoint (new data is accessible via SPARQL endpoint).


Kingsley



   


Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-23 Thread Kingsley Idehen

On 9/23/13 3:48 PM, Paul A. Houle wrote:
One of the goals of the infovore project is to develop something 
that targets this latency problem.

https://github.com/paulhoule/infovore/wiki
I’ve talked with a number of organizations that use DBpedia and 
Freebase data and almost all of them have either no solution or an 
incomplete solution for dealing with changes over time,  something 
that’s absolutely necessary for sustainable social-semantic systems.  
Many of them have considered developing it but decided against 
developing it in house.


I bet they have :-)

   When Freebase changed the format of the RDF dump I was able to 
adapt in less than a week (most of the time delay was that no official 
dump came out that week and I didn’t know what was going on);  after 
fixing my code I was able to run against it interactively.
   Infovore is not using Hadoop so much for “big data”, but rather for 
“low latency”.  Not extremely low latency, but once I trust the system 
enough it ought to have Freebase processed before I wake up on 
Sunday.  The files are smaller than the official dump and will load 
faster,  both things that will lower latency for the consumer.
   Right now the pipeline is limited by the not-so-parallel step of
ungzipping and re-gzipping the Freebase dump, but I believe a
processing pipeline much more complex than the current one could still
be run in less than an hour if you throw enough AWS instances at it.
   The framework ought to work for any RDF data, including DBpedia
(for which it has been tested), and I have a lot of stuff planned,
including something that could “smush” DBpedia identifiers to Freebase
identifiers or the other way around to create a merged data set.


Nice!

   Yes, what I am doing today is much simpler than what DBpedia is
doing, but I’m taking a multi-pronged approach that focuses on
process as much as technology. I’m keeping a notebook of how much
time it takes me to do everything and learning how to squeeze out the
errors and wasted time with a battery of methods that are being
documented.


Yes, that's the way to approach this matter. Do the first pass manually
so you can get a good handle on the real time costs.


It is possible to run clusters in Amazon EMR by simply providing a
credential pair – you don’t need to know much at all about AWS or Hadoop.
I invite all of you to follow this project on GitHub and also
follow the Google Group
https://groups.google.com/forum/#!forum/infovore-basekb


I am following it.

where you’ll get roughly two status reports a week and where
people with questions get quick answers.
 I can definitely use contributions too, because the list of
things I’d like to see is long and my own work will be focused on my
own needs. Even if you don’t contribute, I welcome feature requests
on the issue tracker.


This should be interesting to fellow DBpedia and LOD folk, for sure.

Kingsley

*From:* Kingsley Idehen kide...@openlinksw.com
*Sent:* Monday, September 23, 2013 1:37 PM
*To:* dbpedia-discussion@lists.sourceforge.net
*Subject:* Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released,
including wider infobox coverage, additional type statements, and new
YAGO and Wikidata links

On 9/23/13 1:00 PM, Tom Morris wrote:

Congratulations on the new release!
On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:



1. the new release is based on updated Wikipedia dumps dating from March /
April 2013 (the 3.8 release was based on dumps from June 2012), leading to
an overall increase in the number of concepts in the English edition from
3.7 to 4.0 million things.

What accounts for the long latency between the date of the dumps and 
the date of the release?

Tom


A number of things:

1. Dataset QA -- the datasets are generated from mapping efforts
2. Dataset Loading & QA
-- Linked Data Deployment (i.e., new URIs resolve to the new data)
-- SPARQL Endpoint (new data is accessible via SPARQL endpoint).


Kingsley






--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle