Hi Magnus,

On 27.01.2015 at 15:12, Magnus Knuth wrote:
> Hi Martin,
>
> how daring that you started this discussion :D
Well, I just felt like stirring up the community a bit so people have something to argue about while sitting in Dublin's beautiful pubs ;)

> I just want to put my 2 cents in.
>
> I think you are mixing things up. Wikipedia, DBpedia, Wikidata, and Freebase
> are more or less standalone projects. Some are synced with, dependent on, or
> partially imported into another. But there is no need for and no use in fully
> importing Wikidata into DBpedia! Better get an RDF dump of Wikidata.

I'm not so sure about that. From a LOD user's perspective, the idea of one place that integrates encyclopedic knowledge comprehensively and with high quality is very attractive to me. I'm not alone in that, as evidenced by DBpedia's central place in the LOD cloud. RDF dumps are not easy or reliable to handle and, most importantly, they are not linked data.

> The intended import of Freebase data into Wikidata will hardly be complete. One
> reason is that Freebase does not reference single facts to a particular
> source, which is a requirement for claims in Wikidata. So, unfortunately,
> Freebase will never be imported into Wikidata completely.
> Freebase has its own community of contributors who provide and link facts
> in the knowledge base. Freebase's biggest advantage is the easy import of
> one's own data. Time will show how this is adapted to Wikidata.
> On the opposite side there is DBpedia, which (currently) does not support
> manipulating A-Box facts. As Alexandru said, DBpedia is about extraction.

You might be right that Freebase cannot be completely merged into Wikidata and that all projects will coexist in their own niches. However, I believe that even then it is a worthwhile cause to tackle triple-level provenance, the modelling of time constraints, and the persistence of facts throughout DBpedia versions. It's interesting that you bring up manipulating A-Box facts. If we could address individual triples, we could make statements about them, including about their validity, and possibly correct them individually without the change being lost after the next conversion. One might argue that such changes should be made directly in Wikipedia, but that sometimes implies bureaucracy with Wikipedia editors that I would like to avoid.

regards,
Martin

>
> On 27.01.2015 at 13:46, Alexandru Todor <to...@inf.fu-berlin.de> wrote:
>
>> Hi Martin,
>>
>> We discussed this issue a bit in the developer hangout; sadly, too few people
>> are usually present.
>>
>> On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer
>> <bruem...@informatik.uni-leipzig.de> wrote:
>> Hi DBpedians!
>>
>> As you surely have noticed, Google has abandoned Freebase and it will
>> merge with Wikidata [1]. I searched the list, but did not find a
>> discussion about it. So here goes my point of view:
>>
>> When Wikidata was started, I hoped it would quickly become a major
>> contributor of quality data to the LOD cloud. But although the project
>> has a potentially massive crowd and is backed by Wikimedia, it does not
>> really care about the Linked Data paradigm as established in the
>> Semantic Web. RDF is more of an afterthought than a central concept. It
>> was a bit disappointing to see that Wikidata's impact on the LOD
>> community is lacking because of this.
>>
>> I think it's more of a resource/implementation problem for them. Publishing
>> linked data requires a major commitment and the tools for it are more than
>> lacking in refinement.
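To make Martin's earlier point about addressing individual triples a bit more concrete, here is a minimal sketch of what "statements about statements" could look like with plain RDF reification, written with rdflib. The annotation namespace, the annotation properties, and the statement IRI are invented for the example; they are not actual DBpedia terms.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    DBR = Namespace("http://dbpedia.org/resource/")
    DBO = Namespace("http://dbpedia.org/ontology/")
    ANN = Namespace("http://example.org/annotation/")  # hypothetical annotation vocabulary

    g = Graph()

    # The plain fact, roughly as it could appear in DBpedia.
    fact = (DBR["Barack_Obama"], DBO["office"], Literal("President of the United States"))
    g.add(fact)

    # A reified handle for this one triple, so it can be annotated individually.
    stmt = URIRef("http://example.org/statement/obama-office")  # hypothetical statement IRI
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, fact[0]))
    g.add((stmt, RDF.predicate, fact[1]))
    g.add((stmt, RDF.object, fact[2]))

    # Statements about the statement: where it came from and since when it holds.
    g.add((stmt, ANN["extractedFrom"], URIRef("http://en.wikipedia.org/wiki/Barack_Obama")))
    g.add((stmt, ANN["validFrom"], Literal("2009-01-20", datatype=XSD.date)))

    print(g.serialize(format="turtle"))

The verbosity of the four extra triples per statement is exactly the drawback discussed further down in the thread.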
>>
>> Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened
>> knowledge base that is not foreign to RDF and Linked Data.
>> How the integration will be realized is not yet clear, it seems. One
>> consequence, hopefully, is that the LOD cloud grows by a significant
>> amount of quality data. But I wonder what the consequences for the
>> DBpedia project will be. If Wikimedia gets its own knowledge graph,
>> possibly curated by its crowd, where is the place for DBpedia? Can
>> DBpedia stay relevant with all the problems of an open source project,
>> all the difficulties of mapping heterogeneous data in many different
>> languages, the resulting struggle with data quality and consistency, and
>> so on?
>>
>> Wikidata and DBpedia are two different beasts. Wikidata is a wiki for
>> structured data, while DBpedia is an information extraction framework with a
>> crowdsourced component, the mappings wiki. While Wikidata might gain
>> a lot of data from Freebase, it won't help them much if Google does not
>> hand over the information extraction framework behind Freebase. It would mean
>> that the data would get old very fast and the community won't be able to
>> update and maintain it. Though what exactly Google will do remains to be seen.
>>
>> So I propose being proactive about it:
>>
>> I agree with being proactive, we have a lot of problems in DBpedia that need
>> to be addressed.
>>
> Indeed, the DBpedia community should think about a roadmap for future developments.
>
>> I see a large problem for DBpedia in the restrictions of the RDF data
>> model. Triples limit our ability to make statements about statements. I
>> cannot easily address a fact in DBpedia and annotate it. This means:
>>
>> DBpedia is not only available in triples but also in N-Quads.
>>
> I do not see any problem with the restrictions of the RDF data model as a data
> exchange framework. But I admit there are some limitations with managing
> changes and also provenance. However, that is not relevant for most
> applications that want to work with this data.
>
>> - I cannot denote the provenance of a statement. I especially cannot
>> denote the source data it comes from. Resource-level provenance is not
>> sufficient if further datasets are to be integrated into DBpedia in the
>> future.
> As Alexandru said, N-Quads can be a solution for this. The DBpedia extraction
> framework already supports multiple datasets, at least one for each
> extraction step. Actually I don't know whether they are currently delivered
> or whether that is beyond Virtuoso's capabilities.
>
>> - I cannot denote a timespan that limits the validity of a statement.
>> Consider the fact that Barack Obama is the president of the USA. This
>> fact was not valid at a point in the past and won't be valid at some
>> point in the future. Now I might link the DBpedia page of Barack Obama
>> for this fact. If a DBpedia version is published after the next
>> president of the USA is elected, this fact might be missing from
>> DBpedia and my link becomes moot.
> Roles in time can also be represented by intermediate role instances.
> Freebase does that similarly, e.g.
> [http://www.freebase.com/m/0bj7n?props=&lang=en&filter=%2Fgovernment%2Fpolitician%2Fgovernment_positions_held];
> afaik they don't support validity of statements in time!?
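As an illustration of the intermediate-role-instance pattern Magnus mentions, here is a small rdflib sketch. The RoleAssignment class, the holder/position/startDate properties, and the instance IRI are placeholder terms for the example, not an existing vocabulary.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    DBR = Namespace("http://dbpedia.org/resource/")
    ROLE = Namespace("http://example.org/role/")  # hypothetical role vocabulary

    g = Graph()

    # Instead of a bare triple linking Obama directly to the presidency,
    # introduce an intermediate node that carries the temporal extent.
    tenure = URIRef("http://example.org/role/obama-presidency")  # hypothetical instance IRI
    g.add((tenure, RDF.type, ROLE["RoleAssignment"]))
    g.add((tenure, ROLE["holder"], DBR["Barack_Obama"]))
    g.add((tenure, ROLE["position"], DBR["President_of_the_United_States"]))
    g.add((tenure, ROLE["startDate"], Literal("2009-01-20", datatype=XSD.date)))
    # An endDate can be added later without touching the triples above.

    print(g.serialize(format="turtle"))

The trade-off compared to reification is that the original single triple disappears and is replaced by the role node, so existing queries against the direct property would have to change.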
>
> Speaking about modeling entities in time (and space), and since you are in
> Leipzig, I strongly recommend "Ontologie für Informationssysteme - Vorlesung
> (H. Herre)" [http://www.informatik.uni-leipzig.de/fk/lehre/fk_lehre.html], if
> you haven't attended it already.
>
>> - This is a problem with persistence. Being able to download old dumps of
>> DBpedia is not a sufficient model of persistence. The community struggles
>> to increase data quality, but as soon as a new version is published, it
>> drops some of the progress made in favour of whatever facts are found in
>> the Wikipedia dumps at the time of extraction. The old facts should persist,
>> not only in some dump files, but as linkable data.
>>
> DBpedia already supports Memento, which is an accepted standard for Linked
> Data. DBpedia versions go back to 3.0.
>
>> Being able to address these problems would also mean being able to fully
>> import Wikidata, including provenance statements and validity timespans,
>> and combine it with the DBpedia ontology (which is already an important
>> focus of development, and rightfully so). It also means a persistent
>> DBpedia that does not start over with the next version.
>>
>> So how can it be realized? With reification, of course! But most of us
>> resent the problems reification brings with it, the complications in
>> querying, etc. The reification model itself is also unclear. There are
>> different proposals: blank nodes, the reification vocabulary, graph names,
>> creating unique subproperties for each triple, and so on. I won't propose
>> using one of these models here, as this will surely be subject to discussion.
>> But DBpedia can propose a model and the LOD community will adapt, due to
>> DBpedia's status and impact. I think it is time to raise the standard of
>> handling provenance and persistence in the LOD cloud, and DBpedia should
>> make the start. Especially in the face of Freebase and Wikidata merging,
>> I believe it is imperative for DBpedia to move forward.
>>
>> The problem of changes over time in Wikipedia has been addressed in
>> DBpedia Live, and a demo was presented at the last meeting in Leipzig
>> under the title "Versioning DBpedia Live using Memento" [3].
> Let me know if you are interested in this. A student of mine is working on
> that currently.
>
> Proposing best practices, models, technologies, and vocabularies for the LOD
> cloud is definitely an imperative for DBpedia, since it has been a central
> element and reference for a long time and should remain so.
>
>> As you mentioned, RDF reification has drawbacks regarding performance and
>> verbosity. We had a similar need in one of the applications we developed;
>> reified statements were simply impractical due to their verbosity and
>> performance impact. The solution we came up with was using N-Quads and
>> treating the fourth element of the quad as an ID for an index. By looking up
>> the ID you can find information regarding provenance, time, etc. I think this
>> is more of a graph database problem. We should look at ways it can be
>> implemented effectively in RDF stores and then propose modifications to the
>> RDF/SPARQL standards if needed. Maybe the people from OpenLink or other
>> RDF store researchers have some ideas on this issue.
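A rough sketch of the approach Alexandru describes, using rdflib's Dataset: each extracted triple gets its own named graph, and the graph IRI, i.e. the fourth element of the quad, serves as a statement ID that provenance facts can point at. All IRIs, property names, and values below are invented for illustration and are not DBpedia or Virtuoso conventions.

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    DBR = Namespace("http://dbpedia.org/resource/")
    DBO = Namespace("http://dbpedia.org/ontology/")
    META = Namespace("http://example.org/meta/")  # hypothetical provenance vocabulary

    ds = Dataset()

    # The statement ID doubles as the graph name, i.e. the fourth element of the quad.
    stmt_id = URIRef("http://example.org/statement/42")
    ds.graph(stmt_id).add((DBR["Leipzig"], DBO["populationTotal"], Literal(540000)))

    # Provenance for that statement lives in the default graph, keyed by the ID.
    ds.add((stmt_id, META["extractedFrom"], URIRef("http://en.wikipedia.org/wiki/Leipzig")))
    ds.add((stmt_id, META["extractionDate"], Literal("2015-01-27", datatype=XSD.date)))

    print(ds.serialize(format="nquads"))

Whether a store handles one named graph per statement efficiently is exactly the open question raised above; this only shows how the data could be shaped.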
>>
>> Cheers,
>> Alexandru
>>
> Best
> Magnus
>
>> [1] http://sw.deri.org/2008/07/n-quads/
>> [2] http://patterns.dataincubator.org/book/reified-statement.html
>> [3] http://wiki.dbpedia.org/meetings/Leipzig2014