Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Well said, Kingsley :-) As a lurker on this list, I found this an interesting discussion. I can understand Martin's sentiment, and the desire to do something to ensure that DBpedia will continue to be successful in the future. However, in my experience, such strategic discussions rarely lead to much. My advice is to simply make sure that DBpedia is the best it can be, without worrying too much about competitors. Your users do not expect you to become like Wikidata or Freebase (or to import their data) -- they want things that only DBpedia can provide and which it is best at providing.

Some comments regarding Wikidata RDF and Linked Data exports:

* Wikidata's RDF dumps use a kind of reification, but not the deprecated RDF reification vocabulary. The details are explained in our ISWC'14 paper https://ddll.inf.tu-dresden.de/web/Inproceedings4005/en

* Wikidata serves linked data via content negotiation on its IRIs, e.g., http://www.wikidata.org/entity/Q465 (the RDF data you get there is http://www.wikidata.org/wiki/Special:EntityData/Q465.nt). The problem is that this so far only returns part of the triples, not the whole data you find in the dumps.

* I think Martin was complaining about this limitation. Here is what he (or others) could do to rectify this:

(1) Let us know about use cases. Send an email to the Wikidata list: "If we would get more linked data from you, we could do [your super-amazing application here]." Ideally, you would have a demo of this application with Wikidata RDF data as found in the dumps. Development of a large site must be based on user demands, and if you look at the list, you can see many users voicing their demands most eloquently. We cannot ignore these requests in favour of something that is hardly ever requested.

(2) If you are an able PHP developer, offer your help. Several people on the Wikidata team would also like to see the linked data getting improved, but cannot do this on top of their other tasks.
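The content-negotiation point above can be sketched in code. This is a hedged illustration based only on the URL pattern visible in the message: the entity IRI identifies the thing, while the RDF serialization lives at a Special:EntityData URL. The helper function is ours, not part of any Wikidata API.

```python
# Sketch: map a Wikidata entity IRI to the Special:EntityData document
# that content negotiation would send an RDF client to.
# The URL pattern follows the example in the mail; the helper itself
# is illustrative.

def entity_data_url(entity_iri: str, fmt: str = "nt") -> str:
    """Derive the RDF document URL for a Wikidata entity IRI."""
    qid = entity_iri.rstrip("/").rsplit("/", 1)[-1]  # e.g. "Q465"
    return f"http://www.wikidata.org/wiki/Special:EntityData/{qid}.{fmt}"

print(entity_data_url("http://www.wikidata.org/entity/Q465"))
# http://www.wikidata.org/wiki/Special:EntityData/Q465.nt
```

An RDF-aware HTTP client dereferencing the entity IRI with an Accept header for an RDF media type would be redirected to such a document URL.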
If somebody would do the main work, there would be support. Email me and I will put you in touch with the right people.

Best regards, Markus

On 27.01.2015 20:31, Kingsley Idehen wrote: On 1/27/15 1:43 PM, Martin Brümmer wrote: I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract.

Martin, DBpedia isn't *endangered*. Publishing content to a global HTTP-based network such as the World Wide Web isn't a zero-sum affair. DBpedia's prime goal is to contribute to the Linked Open Data collective within the World Wide Web. To date, DBpedia has overachieved as the core that bootstrapped the Linked Open Data Cloud. Wikidata, Freebase, etc. are complementary initiatives. Their gains or losses are not in any way affected by DBpedia. The Web is designed on a horses-for-courses doctrine. We can indeed all get along, and be successful, without anyone having to lose out :)

[1] http://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FZero-sum_game -- About Zero Sum Game
[2] http://en.wiktionary.org/wiki/horses_for_courses -- Horses for Courses.

-- Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/

___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
-I cannot denote a timespan that limits the validity of a statement.

But you can. Make an IntermediateNode (e.g. Barack_Obama__1) and put what you will there. This is used all the time for CareerPost, the association node between a player and a team, etc. Political positions in many wikis are modeled with a lot of sophistication:

Position1: title, country/region/city, from which party, etc.
  Term11: which (1, 2, 3), from, to
    Colleague111: title (e.g. vicePresident), from, to
    Colleague112: title, from, to
  Term12
Position2
  Term21
    Colleague211

The best you can map this to (for someone X) is:

X careerPost X_1: Position1 (title); Term11 (from, to); colleague Colleague111, Colleague112.
X careerPost X_2: Position1 (title); Term12 (from, to).
X careerPost X_3: Position2; Term21; colleague Colleague211.

What you cannot map is pointing to an IntermediateNode of the colleague, and mapping the from/to of colleagues. (And you can only map their position's title if you use subproperties, e.g. vicePresidentColleague.) But I'm fine with that: the big problem is that Wikipedia template params have no arrays, so all these params end up being imaginatively numbered (not a precise numbering system like above :-)
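To make the intermediate-node pattern concrete, here is a minimal sketch using plain string triples. The node and property names are illustrative, not actual DBpedia ontology IRIs; the point is that the validity timespan hangs off the intermediate node, not off the person.

```python
# Intermediate-node pattern: instead of a single triple
#   X careerPost "President"
# an association node X__1 carries the title plus from/to years,
# which gives the statement a validity timespan.

triples = {
    ("Barack_Obama", "careerPost", "Barack_Obama__1"),
    ("Barack_Obama__1", "title", "President"),
    ("Barack_Obama__1", "activeYearsStartYear", "2009"),
    ("Barack_Obama__1", "activeYearsEndYear", "2017"),
}

def describe(node, triples):
    """Collect all property/value pairs attached to one node."""
    return {p: o for s, p, o in triples if s == node}

# Follow careerPost to the intermediate node, then read its details.
post = next(o for s, p, o in triples
            if s == "Barack_Obama" and p == "careerPost")
print(describe(post, triples)["title"])  # President
```

The same shape extends one level down for terms and colleagues, which is exactly where the mapping limits described above start to bite.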
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Well, I got to Allen's algebra of intervals because I was concerned about how to deal with all of the different date-time formats that are specified in XSD. All of these can be treated, correctly, as either an interval or a set of intervals.

Note there are modelling issues that go beyond this. For instance, I still say we retain the birth date and death date properties even though you could model somebody's life as an interval. There are lots of practical reasons, but one of them is that I know my life is not an open-ended interval, although it looks like that now. Using this as a practical theory of time, I can usually figure out what I need to know.

I can say, however, that if a person has a birthdate in Freebase of Jan 1, X, odds are far less than 0.5 that the person was born on that day. Thus, if I want to say anything about people born on Jan 1, X and not look like a fool, I need to go through those facts and figure out which ones I believe. Thus, in some cases the data is really broken and energy must be spent to overcome entropy.

On Tue, Jan 27, 2015 at 1:05 PM, M. Aaron Bossert maboss...@gmail.com wrote: Paul, The date ranges are doable...I would say that one can still work either as-is...and working with differing levels of specificity...if you work with the dates as they are... Aaron

On Jan 27, 2015, at 12:27, Paul Houle ontolo...@gmail.com wrote: DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes.

I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query

I've been banging my head on the temporal aspect for a while and I am convinced that the practical answer to a lot of problems is to replace times with time intervals.
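A minimal sketch of the interval idea, under the assumption that dates are encoded as (start, end) pairs: three of Allen's thirteen interval relations are enough to show how values given at different precision still compare cleanly. The relation names follow Allen; the minute-based encoding is ours.

```python
# Three of Allen's interval relations over (start, end) pairs.

def before(a, b):
    """a ends strictly before b starts."""
    return a[1] < b[0]

def during(a, b):
    """a lies strictly inside b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """a starts first and the two share a proper sub-interval."""
    return a[0] < b[0] < a[1] < b[1]

# A birth date known only to the calendar day is a one-day interval;
# one known to the minute is a narrower interval inside it.
day = (0, 1440)        # one day, in minutes
moment = (966, 967)    # "4:06 in the afternoon"
print(during(moment, day))  # True
```

Imprecise and precise dates then live in the same model: a query for "born during 1961" matches both an exact timestamp and a coarse one-day interval.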
Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent.

There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although if I define my birthday as a set of one-day intervals, the interval is reflecting a social convention.

Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes.

On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (easily, perhaps?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality, is much more expressive, in my opinion. Through reification, you can express all of the concepts that you are wanting to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if it would be helpful to you.
Aaron

On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view:

When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data.
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
On 1/27/15 1:43 PM, Martin Brümmer wrote: I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract.

Martin, DBpedia isn't *endangered*. Publishing content to a global HTTP-based network such as the World Wide Web isn't a zero-sum affair. DBpedia's prime goal is to contribute to the Linked Open Data collective within the World Wide Web. To date, DBpedia has overachieved as the core that bootstrapped the Linked Open Data Cloud. Wikidata, Freebase, etc. are complementary initiatives. Their gains or losses are not in any way affected by DBpedia. The Web is designed on a horses-for-courses doctrine. We can indeed all get along, and be successful, without anyone having to lose out :)

[1] http://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FZero-sum_game -- About Zero Sum Game
[2] http://en.wiktionary.org/wiki/horses_for_courses -- Horses for Courses.

--
Regards, Kingsley Idehen
Founder CEO OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Paul,

The date ranges are doable...I would say that one can still work either as-is...and working with differing levels of specificity...if you work with the dates as they are...

Aaron

On Jan 27, 2015, at 12:27, Paul Houle ontolo...@gmail.com wrote: DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes. I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query

I've been banging my head on the temporal aspect for a while and I am convinced that the practical answer to a lot of problems is to replace times with time intervals. Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent.

There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although if I define my birthday as a set of one-day intervals, the interval is reflecting a social convention.
Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes.

On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (easily, perhaps?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality, is much more expressive, in my opinion. Through reification, you can express all of the concepts that you are wanting to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if it would be helpful to you.

Aaron

On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view:

When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.
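The reification Aaron mentions can be sketched as follows, again with plain string triples. The rdf:Statement vocabulary is the standard one, but the statement IRI, the prov:/ex: property choices, and the annotation values are purely illustrative.

```python
# Standard RDF reification: a statement node with rdf:subject,
# rdf:predicate and rdf:object, to which further annotations
# (provenance, validity dates) can be attached.

fact = ("dbr:Barack_Obama", "dbo:office",
        "dbr:President_of_the_United_States")

stmt = "ex:stmt1"  # illustrative statement IRI
graph = {
    (stmt, "rdf:type", "rdf:Statement"),
    (stmt, "rdf:subject", fact[0]),
    (stmt, "rdf:predicate", fact[1]),
    (stmt, "rdf:object", fact[2]),
    # Annotations about the statement itself (values invented):
    (stmt, "prov:wasDerivedFrom", "wikipedia:Barack_Obama"),
    (stmt, "ex:validFrom", "2009-01-20"),
}

def annotations(stmt, graph):
    """Everything said about the statement beyond its reified core."""
    core = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}
    return {p: o for s, p, o in graph if s == stmt and p not in core}

print(annotations(stmt, graph))
```

The verbosity cost Aaron concedes is visible here: one original triple becomes four, before any annotation is added.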
Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.
-I cannot denote
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Magnus,

Am 27.01.2015 um 15:12 schrieb Magnus Knuth: Hi Martin, how daring that you started this discussion :D

Well, I just felt like stirring up the community a bit so people have something to argue about while sitting in Dublin's beautiful pubs ;)

I just want to put my 2 cents in. I think you are mixing things up. Wikipedia, DBpedia, Wikidata, and Freebase are more or less standalone projects. Some are synced with, depending on, or partially imported into another. But there is no need and no use of fully importing Wikidata into DBpedia! Better to get an RDF dump of Wikidata.

I'm not so sure about that. From a LOD user's perspective, the idea of a place that integrates encyclopedic knowledge in a comprehensive way with high quality is very attractive to me. I'm not alone with that, evidenced by DBpedia's central place in the LOD cloud. RDF dumps are not very easy and reliable to handle and, most importantly, not linked data.

The intended import of Freebase data to Wikidata will hardly be complete. One reason is that Freebase has no references of single facts to a particular source, which is a requirement for claims in Wikidata. I.e., unfortunately, Freebase will never be imported into Wikidata completely. Freebase has its own community of contributors that provide and link facts into the knowledge base. Freebase's biggest advantage is the easy import of one's own data. Time will show how this is adapted to Wikidata. On the opposite side there is DBpedia, which (currently) does not support manipulating A-Box facts. As Alexandru said, DBpedia is about extraction.

You might be right that Freebase cannot be completely merged into Wikidata and that all projects will coexist in their own niche. However, I believe that even then it is a worthwhile cause to tackle triple-level provenance, modelling time constraints and persistence of facts throughout DBpedia versions. It's interesting that you bring up manipulating A-Box facts.
If we could address individual triples, then making statements about them, including their validity, and possibly correcting them individually without the change being lost after the next conversion would be possible. One might argue that these changes should be done directly in Wikipedia, but this sometimes implies bureaucracy with Wikipedia editors that I would like to avoid.

regards, Martin

Am 27.01.2015 um 13:46 schrieb Alexandru Todor to...@inf.fu-berlin.de: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed. Indeed the DBpedia community should think about a roadmap for future developments.

I see a large problem of the
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
There's the interesting question of, if we were building something like Freebase today based on RDF, what sort of facilities would be built in for 'wiki' management. That is, you need provenance metadata not so much to say "these great population numbers for country X are from the World Bank" (and if you look closely they linearly interpolate between censuses, which could be ten or more years apart) but more to say "user Z asserted 70,000 bogus triples".

On Tue, Jan 27, 2015 at 1:43 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi Alexandru,

Am 27.01.2015 um 13:46 schrieb Alexandru Todor: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be?
If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract. I fear that, with Freebase becoming a part of Wikidata, this future becomes a little more likely to happen, even if we don't know what Google does, as you rightfully say.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed.

I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

DBpedia is not only available in triples but also in N-Quads.

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from.
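The N-Quads remark deserves a sketch: the fourth element of a quad names a graph, and triples about that graph IRI can then carry per-source provenance without reification. Graph and property names below are illustrative, not actual DBpedia identifiers.

```python
# Named-graph provenance with quads: each fact lives in a graph,
# and the graph IRI itself is described (here: where it came from).

quads = {
    ("dbr:Barack_Obama", "dbo:birthDate", "1961-08-04", "g:infobox_en"),
    ("g:infobox_en", "prov:wasDerivedFrom", "wikipedia:Barack_Obama",
     "g:meta"),
}

def provenance_of(s, p, o, quads):
    """Sources of every graph that contains the given triple."""
    graphs = {g for s2, p2, o2, g in quads if (s2, p2, o2) == (s, p, o)}
    return {o2 for s2, p2, o2, g in quads
            if s2 in graphs and p2 == "prov:wasDerivedFrom"}

print(provenance_of("dbr:Barack_Obama", "dbo:birthDate",
                    "1961-08-04", quads))
# {'wikipedia:Barack_Obama'}
```

The granularity caveat in the thread still applies: if one graph holds a whole extraction run, this gives dataset-level rather than triple-level provenance; triple-level would need one graph per statement.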
Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.

-I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. If a DBpedia version is published after the next president of the USA is elected, this fact might be missing from DBpedia and my link becomes moot.

-This is a problem with persistence. Being able to download old dumps of DBpedia is not a sufficient model of persistence. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Alexandru,

Am 27.01.2015 um 13:46 schrieb Alexandru Todor: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts.
Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract. I fear that, with Freebase becoming a part of Wikidata, this future becomes a little more likely to happen, even if we don't know what Google does, as you rightfully say.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed.

I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

DBpedia is not only available in triples but also in N-Quads.

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.

-I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact.
Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so).
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Martin, how daring of you to start this discussion :D I just want to put in my two cents. I think you are mixing things up. Wikipedia, DBpedia, Wikidata, and Freebase are more or less standalone projects. Some are synced with, dependent on, or partially imported into another. But there is no need for, and no use in, fully importing Wikidata into DBpedia! Better to get an RDF dump of Wikidata. The intended import of Freebase data into Wikidata will hardly be complete. One reason is that Freebase has no references linking single facts to a particular source, which is a requirement for claims in Wikidata. That is, unfortunately, Freebase will never be imported into Wikidata completely. Freebase has its own community of contributors that provide and link facts in the knowledge base. Freebase's biggest advantage is the easy import of one's own data. Time will show how this is adapted for Wikidata. On the opposite side there is DBpedia, which (currently) does not support manipulating A-Box facts. As Alexandru said, DBpedia is about extraction. On 27.01.2015 at 13:46, Alexandru Todor to...@inf.fu-berlin.de wrote: Hi Martin, We discussed this issue a bit in the developer hangout; sadly, too few people are usually present. On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.
I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment, and the tools for it are more than lacking in refinement. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data, while DBpedia is an information extraction framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not also release the information extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. What exactly Google will do remains to be seen, though. So I propose being proactive about it: I agree with being proactive; we have a lot of problems in DBpedia that need to be addressed. Indeed, the DBpedia community should think about a roadmap for future developments. I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: DBpedia is not only available in triples but also in N-Quads. I do not see any problem with restrictions of the RDF data model as a data exchange framework.
But I admit there are some limitations with managing changes and also provenance. However, that is not relevant for most applications that want to work with this data. -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. As Alexandru said, N-Quads can be a solution for this. The DBpedia extraction framework already supports multiple datasets, at least one for each extraction step. Actually, I don't know whether they are currently delivered, or whether that is beyond Virtuoso's capabilities. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact.
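To make the N-Quads suggestion above concrete, here is a minimal sketch of how a fourth element (a graph IRI) could carry per-statement provenance by naming the dataset a triple was extracted from. The graph IRI is illustrative, not an actual DBpedia identifier; the subject, predicate, and object are real DBpedia terms.

```nquads
# One quad: the fourth IRI names the graph, here standing for the source dataset.
<http://dbpedia.org/resource/Barack_Obama> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Honolulu> <http://example.org/dataset/mappingbased-objects> .
```

A consumer could then select or exclude statements by dataset simply by filtering on the graph component, without any reification machinery.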
[Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA.
This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course! But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. regards, Martin [1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc -- Dive into the World of Parallel Programming.
The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/ ___ Dbpedia-discussion mailing list Dbpedia-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
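One of the models Martin lists, the standard RDF reification vocabulary, could attach both a source and a validity start to the Obama fact roughly as in the Turtle sketch below. The ex: annotation properties are hypothetical (no established DBpedia vocabulary is implied), and dbo:office is just an illustrative choice of predicate.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/vocab/> .  # hypothetical annotation vocabulary

# Reified statement: "Barack Obama holds the office of President of the USA"
ex:stmt1 a rdf:Statement ;
    rdf:subject   dbr:Barack_Obama ;
    rdf:predicate dbo:office ;
    rdf:object    dbr:President_of_the_United_States ;
    ex:validFrom  "2009-01-20"^^xsd:date ;                        # validity timespan start
    ex:source     <http://en.wikipedia.org/wiki/Barack_Obama> .   # statement-level provenance
```

The verbosity Aaron mentions is visible here: one fact becomes four structural triples plus the annotations, and the base triple itself is not even asserted unless it is additionally stated.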
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Martin, We discussed this issue a bit in the developer hangout; sadly, too few people are usually present. On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment, and the tools for it are more than lacking in refinement. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data, while DBpedia is an information extraction framework with a crowdsourced component, the mappings wiki.
While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not also release the information extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. What exactly Google will do remains to be seen, though. So I propose being proactive about it: I agree with being proactive; we have a lot of problems in DBpedia that need to be addressed. I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: DBpedia is not only available in triples but also in N-Quads. -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data.
Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course! But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. The problem of different changes
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes. I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query. I've been banging my head on the temporal aspect for a while, and I am convinced that the practical answer to a lot of problems is to replace times with time intervals. Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent. There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase, because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although when my birthday is defined as a one-day interval, that interval reflects a social convention. Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes. On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (perhaps, easily?)
are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality is much more expressive, in my opinion. Through reification, you can express all of the concepts that you want to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if that would be helpful to you. Aaron On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (perhaps, easily?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality is much more expressive, in my opinion. Through reification, you can express all of the concepts that you want to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if that would be helpful to you. Aaron On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course!
But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. regards, Martin [1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
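To illustrate the querying complication mentioned above: with the reification vocabulary, even a simple per-fact lookup has to match a statement node rather than a single triple. A rough SPARQL sketch, where ex: is a hypothetical annotation vocabulary for statement-level provenance:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX ex:  <http://example.org/vocab/>

# Find every reified fact about Barack Obama together with its source:
# four patterns against the statement node replace one direct triple pattern.
SELECT ?p ?o ?source WHERE {
  ?stmt rdf:subject   dbr:Barack_Obama ;
        rdf:predicate ?p ;
        rdf:object    ?o ;
        ex:source     ?source .
}
```

With named graphs instead, the same question would be a single triple pattern inside a GRAPH clause, which is part of why graph names are among the competing proposals Martin lists.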