Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Well said, Kingsley :-) As a lurker on this list, I found this an interesting discussion. I can understand Martin's sentiment, and the desire to do something to ensure that DBpedia will continue to be successful in the future. However, in my experience, such strategic discussions rarely lead to much. My advice is to simply make sure that DBpedia is the best it can be, without worrying too much about competitors. Your users do not expect you to become like Wikidata or Freebase (or to import their data) -- they want things that only DBpedia can provide and which it is best at providing.

Some comments regarding Wikidata RDF and Linked Data exports:

* Wikidata's RDF dumps use a kind of reification, but not the deprecated RDF reification vocabulary. The details are explained in our ISWC'14 paper https://ddll.inf.tu-dresden.de/web/Inproceedings4005/en

* Wikidata serves linked data via content negotiation on its IRIs, e.g., http://www.wikidata.org/entity/Q465 (the RDF data you get there is http://www.wikidata.org/wiki/Special:EntityData/Q465.nt). The problem is that this so far only returns part of the triples, not the whole data you find in the dumps.

* I think Martin was complaining about this limitation. Here is what he (or others) could do to rectify this:

(1) Let us know about use cases. Send an email to the Wikidata list: "If we would get more linked data from you, we could do [your super-amazing application here]." Ideally, you would have a demo of this application with Wikidata RDF data as found in the dumps. Development of a large site must be based on user demands, and if you look at the list, you can see many users voicing their demands most eloquently. We cannot ignore these requests in favour of something that is hardly ever requested.

(2) If you are an able PHP developer, offer your help. Several people on the Wikidata team would also like to see the linked data getting improved, but cannot do this on top of their other tasks.
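The content-negotiation point above can be sketched in code. This is a hedged illustration based only on the URL pattern visible in the message: the entity IRI identifies the thing, while the RDF serialization lives at a Special:EntityData URL. The helper function is ours, not part of any Wikidata API.

```python
# Sketch: map a Wikidata entity IRI to the Special:EntityData document
# that content negotiation would send an RDF client to.
# The URL pattern follows the example in the mail; the helper itself
# is illustrative.

def entity_data_url(entity_iri: str, fmt: str = "nt") -> str:
    """Derive the RDF document URL for a Wikidata entity IRI."""
    qid = entity_iri.rstrip("/").rsplit("/", 1)[-1]  # e.g. "Q465"
    return f"http://www.wikidata.org/wiki/Special:EntityData/{qid}.{fmt}"

print(entity_data_url("http://www.wikidata.org/entity/Q465"))
# http://www.wikidata.org/wiki/Special:EntityData/Q465.nt
```

An RDF-aware HTTP client dereferencing the entity IRI with an Accept header for an RDF media type would be redirected to such a document URL.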
If somebody would do the main work, there would be support. Email me and I will put you in touch with the right people.

Best regards, Markus

On 27.01.2015 20:31, Kingsley Idehen wrote: On 1/27/15 1:43 PM, Martin Brümmer wrote: I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract.

Martin, DBpedia isn't *endangered*. Publishing content to a global HTTP-based network such as the World Wide Web isn't a zero-sum affair. DBpedia's prime goal is to contribute to the Linked Open Data collective within the World Wide Web. To date, DBpedia has overachieved as the core that bootstrapped the Linked Open Data Cloud. Wikidata, Freebase, etc. are complementary initiatives. Their gains or losses are not in any way affected by DBpedia. The Web is designed on a horses-for-courses doctrine. We can indeed all get along, and be successful, without anyone having to lose out :)

[1] http://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FZero-sum_game -- About Zero Sum Game
[2] http://en.wiktionary.org/wiki/horses_for_courses -- Horses for Courses.

-- Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/

___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
-I cannot denote a timespan that limits the validity of a statement.

But you can. Make an IntermediateNode (e.g. Barack_Obama__1) and put what you will there. This is used all the time for CareerPost, the association node between a player and a team, etc. Political positions in many wikis are modeled with a lot of sophistication:

Position1: title, country/region/city, from which party, etc.
  Term11: which (1, 2, 3), from, to
    Colleague111: title (e.g. vicePresident), from, to
    Colleague112: title, from, to
  Term12
Position2
  Term21
    Colleague211

The best you can map this to (for someone X) is:

X careerPost X_1: Position1 (title); Term11 (from, to); colleague Colleague111, Colleague112.
X careerPost X_2: Position1 (title); Term12 (from, to).
X careerPost X_3: Position2; Term21; colleague Colleague211.

What you cannot map is pointing to an IntermediateNode of the colleague, and mapping the from/to of colleagues. (And you can only map their position's title if you use subproperties, e.g. vicePresidentColleague.) But I'm fine with that: the big problem is that Wikipedia template params have no arrays, so all these params end up being imaginatively numbered (not a precise numbering system like above :-)
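To make the intermediate-node pattern concrete, here is a minimal sketch using plain string triples. The node and property names are illustrative, not actual DBpedia ontology IRIs; the point is that the validity timespan hangs off the intermediate node, not off the person.

```python
# Intermediate-node pattern: instead of a single triple
#   X careerPost "President"
# an association node X__1 carries the title plus from/to years,
# which gives the statement a validity timespan.

triples = {
    ("Barack_Obama", "careerPost", "Barack_Obama__1"),
    ("Barack_Obama__1", "title", "President"),
    ("Barack_Obama__1", "activeYearsStartYear", "2009"),
    ("Barack_Obama__1", "activeYearsEndYear", "2017"),
}

def describe(node, triples):
    """Collect all property/value pairs attached to one node."""
    return {p: o for s, p, o in triples if s == node}

# Follow careerPost to the intermediate node, then read its details.
post = next(o for s, p, o in triples
            if s == "Barack_Obama" and p == "careerPost")
print(describe(post, triples)["title"])  # President
```

The same shape extends one level down for terms and colleagues, which is exactly where the mapping limits described above start to bite.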
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Well, I got to Allen's algebra of intervals because I was concerned about how to deal with all of the different date-time formats that are specified in XSD. All of these can be treated, correctly, as either an interval or a set of intervals.

Note there are modelling issues that go beyond this. For instance, I still say we retain the birth date and death date properties even though you could model somebody's life as an interval. There are lots of practical reasons, but one of them is that I know my life is not an open-ended interval, although it looks like that now. Using this as a practical theory of time, I can usually figure out what I need to know.

I can say, however, that if a person has a birthdate in Freebase of Jan 1, X, odds are far less than 0.5 that the person was born on that day. Thus, if I want to say anything about people born on Jan 1, X and not look like a fool, I need to go through those facts and figure out which ones I believe. Thus, in some cases the data is really broken and energy must be spent to overcome entropy.

On Tue, Jan 27, 2015 at 1:05 PM, M. Aaron Bossert maboss...@gmail.com wrote: Paul, The date ranges are doable...I would say that one can still work either as-is...and working with differing levels of specificity...if you work with the dates as they are... Aaron

On Jan 27, 2015, at 12:27, Paul Houle ontolo...@gmail.com wrote: DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes.

I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query

I've been banging my head on the temporal aspect for a while and I am convinced that the practical answer to a lot of problems is to replace times with time intervals.
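A minimal sketch of the interval idea, under the assumption that dates are encoded as (start, end) pairs: three of Allen's thirteen interval relations are enough to show how values given at different precision still compare cleanly. The relation names follow Allen; the minute-based encoding is ours.

```python
# Three of Allen's interval relations over (start, end) pairs.

def before(a, b):
    """a ends strictly before b starts."""
    return a[1] < b[0]

def during(a, b):
    """a lies strictly inside b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """a starts first and the two share a proper sub-interval."""
    return a[0] < b[0] < a[1] < b[1]

# A birth date known only to the calendar day is a one-day interval;
# one known to the minute is a narrower interval inside it.
day = (0, 1440)        # one day, in minutes
moment = (966, 967)    # "4:06 in the afternoon"
print(during(moment, day))  # True
```

Imprecise and precise dates then live in the same model: a query for "born during 1961" matches both an exact timestamp and a coarse one-day interval.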
Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent.

There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although if I define my birthday as a set of one-day intervals, the interval is reflecting a social convention.

Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes.

On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (easily, perhaps?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality, is much more expressive, in my opinion. Through reification, you can express all of the concepts that you are wanting to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if it would be helpful to you.
Aaron

On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view:

When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data.
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
On 1/27/15 1:43 PM, Martin Brümmer wrote: I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract.

Martin, DBpedia isn't *endangered*. Publishing content to a global HTTP-based network such as the World Wide Web isn't a zero-sum affair. DBpedia's prime goal is to contribute to the Linked Open Data collective within the World Wide Web. To date, DBpedia has overachieved as the core that bootstrapped the Linked Open Data Cloud. Wikidata, Freebase, etc. are complementary initiatives. Their gains or losses are not in any way affected by DBpedia. The Web is designed on a horses-for-courses doctrine. We can indeed all get along, and be successful, without anyone having to lose out :)

[1] http://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FZero-sum_game -- About Zero Sum Game
[2] http://en.wiktionary.org/wiki/horses_for_courses -- Horses for Courses.

--
Regards, Kingsley Idehen
Founder CEO OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Paul,

The date ranges are doable...I would say that one can still work either as-is...and working with differing levels of specificity...if you work with the dates as they are...

Aaron

On Jan 27, 2015, at 12:27, Paul Houle ontolo...@gmail.com wrote: DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes. I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query

I've been banging my head on the temporal aspect for a while and I am convinced that the practical answer to a lot of problems is to replace times with time intervals. Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent.

There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although if I define my birthday as a set of one-day intervals, the interval is reflecting a social convention.
Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes.

On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (easily, perhaps?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality, is much more expressive, in my opinion. Through reification, you can express all of the concepts that you are wanting to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if it would be helpful to you.

Aaron

On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view:

When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.
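The reification Aaron mentions can be sketched as follows, again with plain string triples. The rdf:Statement vocabulary is the standard one, but the statement IRI, the prov:/ex: property choices, and the annotation values are purely illustrative.

```python
# Standard RDF reification: a statement node with rdf:subject,
# rdf:predicate and rdf:object, to which further annotations
# (provenance, validity dates) can be attached.

fact = ("dbr:Barack_Obama", "dbo:office",
        "dbr:President_of_the_United_States")

stmt = "ex:stmt1"  # illustrative statement IRI
graph = {
    (stmt, "rdf:type", "rdf:Statement"),
    (stmt, "rdf:subject", fact[0]),
    (stmt, "rdf:predicate", fact[1]),
    (stmt, "rdf:object", fact[2]),
    # Annotations about the statement itself (values invented):
    (stmt, "prov:wasDerivedFrom", "wikipedia:Barack_Obama"),
    (stmt, "ex:validFrom", "2009-01-20"),
}

def annotations(stmt, graph):
    """Everything said about the statement beyond its reified core."""
    core = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}
    return {p: o for s, p, o in graph if s == stmt and p not in core}

print(annotations(stmt, graph))
```

The verbosity cost Aaron concedes is visible here: one original triple becomes four, before any annotation is added.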
Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.
-I cannot denote
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Magnus,

Am 27.01.2015 um 15:12 schrieb Magnus Knuth: Hi Martin, how daring that you started this discussion :D

Well, I just felt like stirring up the community a bit so people have something to argue about while sitting in Dublin's beautiful pubs ;)

I just want to put my 2 cents in. I think you are mixing things up. Wikipedia, DBpedia, Wikidata, and Freebase are more or less standalone projects. Some are synced with, depending on, or partially imported into another. But there is no need and no use of fully importing Wikidata into DBpedia! Better to get an RDF dump of Wikidata.

I'm not so sure about that. From a LOD user's perspective, the idea of a place that integrates encyclopedic knowledge in a comprehensive way with high quality is very attractive to me. I'm not alone with that, evidenced by DBpedia's central place in the LOD cloud. RDF dumps are not very easy and reliable to handle and, most importantly, not linked data.

The intended import of Freebase data to Wikidata will hardly be complete. One reason is that Freebase has no references of single facts to a particular source, which is a requirement for claims in Wikidata. I.e., unfortunately, Freebase will never be imported into Wikidata completely. Freebase has its own community of contributors that provide and link facts into the knowledge base. Freebase's biggest advantage is the easy import of one's own data. Time will show how this is adapted to Wikidata. On the opposite side there is DBpedia, which (currently) does not support manipulating A-Box facts. As Alexandru said, DBpedia is about extraction.

You might be right that Freebase cannot be completely merged into Wikidata and that all projects will coexist in their own niche. However, I believe that even then it is a worthwhile cause to tackle triple-level provenance, modelling time constraints and persistence of facts throughout DBpedia versions. It's interesting that you bring up manipulating A-Box facts.
If we could address individual triples, then making statements about them, including their validity, and possibly correcting them individually without the change being lost after the next conversion would be possible. One might argue that these changes should be done directly in Wikipedia, but this sometimes implies bureaucracy with Wikipedia editors that I would like to avoid.

regards, Martin

Am 27.01.2015 um 13:46 schrieb Alexandru Todor to...@inf.fu-berlin.de: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed. Indeed the DBpedia community should think about a roadmap for future developments.

I see a large problem of the
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
There's the interesting question of, if we were building something like Freebase today based on RDF, what sort of facilities would be built in for 'wiki' management. That is, you need provenance metadata not so much to say "these great population numbers for country X are from the World Bank" (and if you look closely they linearly interpolate between censuses, which could be ten or more years apart) but more to say "user Z asserted 70,000 bogus triples".

On Tue, Jan 27, 2015 at 1:43 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi Alexandru,

Am 27.01.2015 um 13:46 schrieb Alexandru Todor: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be?
If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract. I fear that, with Freebase becoming a part of Wikidata, this future becomes a little more likely to happen, even if we don't know what Google does, as you rightfully say.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed.

I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

DBpedia is not only available in triples but also in N-Quads.

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from.
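The N-Quads remark deserves a sketch: the fourth element of a quad names a graph, and triples about that graph IRI can then carry per-source provenance without reification. Graph and property names below are illustrative, not actual DBpedia identifiers.

```python
# Named-graph provenance with quads: each fact lives in a graph,
# and the graph IRI itself is described (here: where it came from).

quads = {
    ("dbr:Barack_Obama", "dbo:birthDate", "1961-08-04", "g:infobox_en"),
    ("g:infobox_en", "prov:wasDerivedFrom", "wikipedia:Barack_Obama",
     "g:meta"),
}

def provenance_of(s, p, o, quads):
    """Sources of every graph that contains the given triple."""
    graphs = {g for s2, p2, o2, g in quads if (s2, p2, o2) == (s, p, o)}
    return {o2 for s2, p2, o2, g in quads
            if s2 in graphs and p2 == "prov:wasDerivedFrom"}

print(provenance_of("dbr:Barack_Obama", "dbo:birthDate",
                    "1961-08-04", quads))
# {'wikipedia:Barack_Obama'}
```

The granularity caveat in the thread still applies: if one graph holds a whole extraction run, this gives dataset-level rather than triple-level provenance; triple-level would need one graph per statement.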
Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.

-I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. If a DBpedia version is published after the next president of the USA is elected, this fact might be missing from DBpedia and my link becomes moot.

-This is a problem with persistence. Being able to download old dumps of DBpedia is not a sufficient model of persistence. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Alexandru,

Am 27.01.2015 um 13:46 schrieb Alexandru Todor: Hi Martin, we discussed this issue a bit in the developer hangout, sadly too few people are usually present.

On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.

I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment and the tools for it are more than lacking in refinement.

Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?

Wikidata and DBpedia are two different beasts.
Wikidata is a wiki for structured data while DBpedia is an Information Extraction Framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not give out the Information Extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. Though what exactly Google will do remains to be seen.

I kind of disagree with you here. I regard and use DBpedia as a source of machine-readable linked data first. Because of its nature as a derivative project extracting Wikipedia data, it is endangered by a potential future in which the Wikipedia crowd maintains their own machine-readable linked data to feed (among others) the infoboxes that DBpedia seeks to extract. I fear that, with Freebase becoming a part of Wikidata, this future becomes a little more likely to happen, even if we don't know what Google does, as you rightfully say.

So I propose being proactive about it: I agree with being proactive, we have a lot of problems in DBpedia that need to be addressed.

I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means:

DBpedia is not only available in triples but also in N-Quads.

-I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.

-I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact.
Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so).
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Martin, how daring of you to start this discussion :D I just want to put in my two cents. I think you are mixing things up. Wikipedia, DBpedia, Wikidata, and Freebase are more or less standalone projects. Some are synced with, dependent on, or partially imported into another. But there is no need for, and no use in, fully importing Wikidata into DBpedia! Better to get an RDF dump of Wikidata. The intended import of Freebase data into Wikidata will hardly be complete. One reason is that Freebase has no references linking single facts to a particular source, which is a requirement for claims in Wikidata. That is, unfortunately, Freebase will never be imported into Wikidata completely. Freebase has its own community of contributors that provide and link facts in the knowledge base. Freebase's biggest advantage is the easy import of one's own data. Time will show how this is adapted for Wikidata. On the opposite side there is DBpedia, which (currently) does not support manipulating A-Box facts. As Alexandru said, DBpedia is about extraction. On 27.01.2015 at 13:46, Alexandru Todor to...@inf.fu-berlin.de wrote: Hi Martin, We discussed this issue a bit in the developer hangout; sadly, too few people are usually present. On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this.
I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment, and the tools for it are more than lacking in refinement. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data, while DBpedia is an information extraction framework with a crowdsourced component, the mappings wiki. While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not also release the information extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. What exactly Google will do remains to be seen, though. So I propose being proactive about it: I agree with being proactive; we have a lot of problems in DBpedia that need to be addressed. Indeed, the DBpedia community should think about a roadmap for future developments. I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: DBpedia is not only available in triples but also in N-Quads. I do not see any problem with restrictions of the RDF data model as a data exchange framework.
But I admit there are some limitations with managing changes and also provenance. However, that is not relevant for most applications that want to work with this data. -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. As Alexandru said, N-Quads can be a solution for this. The DBpedia extraction framework already supports multiple datasets, at least one for each extraction step. Actually, I don't know whether they are currently delivered, or whether that is beyond Virtuoso's capabilities. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact.
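To make the N-Quads suggestion above concrete, here is a minimal sketch of how a fourth element (a graph IRI) could carry per-statement provenance by naming the dataset a triple was extracted from. The graph IRI is illustrative, not an actual DBpedia identifier; the subject, predicate, and object are real DBpedia terms.

```nquads
# One quad: the fourth IRI names the graph, here standing for the source dataset.
<http://dbpedia.org/resource/Barack_Obama> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Honolulu> <http://example.org/dataset/mappingbased-objects> .
```

A consumer could then select or exclude statements by dataset simply by filtering on the graph component, without any reification machinery.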
[Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA.
This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course! But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. regards, Martin [1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc -- Dive into the World of Parallel Programming.
The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/ ___ Dbpedia-discussion mailing list Dbpedia-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
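One of the models Martin lists, the standard RDF reification vocabulary, could attach both a source and a validity start to the Obama fact roughly as in the Turtle sketch below. The ex: annotation properties are hypothetical (no established DBpedia vocabulary is implied), and dbo:office is just an illustrative choice of predicate.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/vocab/> .  # hypothetical annotation vocabulary

# Reified statement: "Barack Obama holds the office of President of the USA"
ex:stmt1 a rdf:Statement ;
    rdf:subject   dbr:Barack_Obama ;
    rdf:predicate dbo:office ;
    rdf:object    dbr:President_of_the_United_States ;
    ex:validFrom  "2009-01-20"^^xsd:date ;                        # validity timespan start
    ex:source     <http://en.wikipedia.org/wiki/Barack_Obama> .   # statement-level provenance
```

The verbosity Aaron mentions is visible here: one fact becomes four structural triples plus the annotations, and the base triple itself is not even asserted unless it is additionally stated.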
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Hi Martin, We discussed this issue a bit in the developer hangout; sadly, too few people are usually present. On Tue, Jan 27, 2015 at 12:33 PM, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. I think it's more of a resource/implementation problem for them. Publishing linked data requires a major commitment, and the tools for it are more than lacking in refinement. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? Wikidata and DBpedia are two different beasts. Wikidata is a wiki for structured data, while DBpedia is an information extraction framework with a crowdsourced component, the mappings wiki.
While Wikidata might gain a lot of data from Freebase, it won't help them that much if Google does not also release the information extraction framework behind Freebase. It would mean that the data would get old very fast and the community won't be able to update and maintain it. What exactly Google will do remains to be seen, though. So I propose being proactive about it: I agree with being proactive; we have a lot of problems in DBpedia that need to be addressed. I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: DBpedia is not only available in triples but also in N-Quads. -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data.
Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course! But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. The problem of different changes
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
DBpedia has a mission that is focused around extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems to be inconsistent with that mission, but there are all kinds of temporal and provenance things that could be teased out of Wikipedia, if not out of the infoboxes. I think most query scenarios are going to work like this: [Pot of data with provenance information] -> [Data Set Representing a POV] -> query. I've been banging my head on the temporal aspect for a while, and I am convinced that the practical answer to a lot of problems is to replace times with time intervals. Intervals can be used to model duration and uncertainty, and the overloading between those functions is not so bad because usually you know from the context what the interval is being used to represent. There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase, because different kinds of dates are specified to different levels of detail. If you make a plot of people's birthdays in Freebase, for instance, you find a lot of people born on Jan 1, I think because that is something 'plausible' to put in. A birth date could be resolved to a short interval (I know I was born at 4:06 in the afternoon), and astrologers would like to know that, but the frequent use of a calendar day is a statement about imprecision, although when my birthday is defined as a one-day interval, that interval reflects a social convention. Anyway, there is an algebra over time intervals that is well accepted http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852 and could be implemented either as a native XSD data type or by some structure involving blank nodes. On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert maboss...@gmail.com wrote: Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (perhaps, easily?)
are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality is much more expressive, in my opinion. Through reification, you can express all of the concepts that you want to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if that would be helpful to you. Aaron On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after
Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
Martin, When I first started working with RDF, I didn't fully get the full expressivity of it. All of the things you are saying can't be done (perhaps, easily?) are quite simple to implement. When compared to the property graph model, RDF, at first glance, seems inferior, but in reality is much more expressive, in my opinion. Through reification, you can express all of the concepts that you want to (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it. If you would like some help in modeling your graph to represent the missing concepts that you are after, I will be happy to help you out with some more specific examples and pointers if that would be helpful to you. Aaron On Jan 27, 2015, at 06:33, Martin Brümmer bruem...@informatik.uni-leipzig.de wrote: Hi DBpedians! As you surely have noticed, Google has abandoned Freebase and it will merge with Wikidata [1]. I searched the list, but did not find a discussion about it. So here goes my point of view: When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see that Wikidata's impact on the LOD community is lacking because of this. Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base not foreign to RDF and Linked Data. How the integration will be realized is not yet clear, it seems. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be? If Wikimedia gets their own knowledge graph, possibly curated by their crowd, where is the place for DBpedia?
Can DBpedia stay relevant with all the problems of an open source project, all the difficulties with mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on? So I propose being proactive about it: I see a large problem of DBpedia with restrictions of the RDF data model. Triples limit our ability to make statements about statements. I cannot easily address a fact in DBpedia and annotate it. This means: -I cannot denote the provenance of a statement. I especially cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future. -I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. Now I might link the DBpedia page of Barack Obama for this fact. Now if a DBpedia version is published after the next president of the USA was elected, this fact might be missing from DBpedia and my link becomes moot. -This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data. Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which already is an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version. So how can it be realized? With reification, of course!
But most of us resent the problems reification brings with it, the complications in querying, etc. The reification model itself is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names, creating unique subproperties for each triple, etc. Now I won't propose using one of these models; this will surely be subject to discussion. But DBpedia can propose a model and the LOD community will adapt, due to DBpedia's status and impact. I think it is time to up the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward. regards, Martin [1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
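To illustrate the querying complication mentioned above: with the reification vocabulary, even a simple per-fact lookup has to match a statement node rather than a single triple. A rough SPARQL sketch, where ex: is a hypothetical annotation vocabulary for statement-level provenance:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX ex:  <http://example.org/vocab/>

# Find every reified fact about Barack Obama together with its source:
# four patterns against the statement node replace one direct triple pattern.
SELECT ?p ?o ?source WHERE {
  ?stmt rdf:subject   dbr:Barack_Obama ;
        rdf:predicate ?p ;
        rdf:object    ?o ;
        ex:source     ?source .
}
```

With named graphs instead, the same question would be a single triple pattern inside a GRAPH clause, which is part of why graph names are among the competing proposals Martin lists.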