On 13 April 2013 12:28, Dimitris Kontokostas <[email protected]> wrote:
> Hi Pablo,
>
> Normally I would agree with you, but under the circumstances it's a little
> more complicated.
> My main point is that we don't have someone like Jona working full time on
> the framework anymore, so there is not enough time to do this right before
> the next release (1-2 months).
> Well, this is my estimation, but Jona is the actual expert in the DIEF
> internals, so maybe he can give a better estimate of the effort :)
Implementing the refactoring I proposed at [1] would take three days. Maybe
two. Maybe one if we're quick and don't encounter problems that I forgot
when I wrote that proposal. Maybe I forgot a lot of stuff, so to be very
pessimistic, I'd say a week. :-)

Once we have that in place, generating data from JSON pages is relatively
simple, since the pages are well structured.

Cheers,
JC

[1] https://github.com/dbpedia/extraction-framework/pull/35

> On the other hand, we are lucky enough to have external contributions this
> year (like Andrea's), but this is a process that takes much longer, and we
> cannot guarantee that these contributions will be towards this goal.
>
> What I would suggest as a transition phase is to create the next DBpedia
> release now, while Wikipedia data is not affected at all. Then wait a
> couple of months to see where this thing actually goes and get better
> prepared.
>
> Cheers,
> Dimitris
>
> On Sat, Apr 13, 2013 at 12:36 PM, Pablo N. Mendes <[email protected]>
> wrote:
>>
>> Hi Dimitris,
>>
>> > Maybe the lookup approach will give us some improvement over our next
>> > release ...but in the following release (in 1+ year) everything will be
>> > completely different again.
>> > Trying to re-parse already structured data will end up in a very
>> > complicated design that we might end up not using at all.
>>
>> Maybe I misunderstood this, but I was thinking of a very simple design
>> here. You (and Jona) can estimate the effort much better than me, due to
>> my limited knowledge of the DIEF internals.
>>
>> My suggestion was only to smooth the transition. In a year or so, perhaps
>> all of the data will be in Wikidata, and we can just drop the markup
>> parsing. But until that point, we need a hybrid solution. If I am seeing
>> this right, the key-value store approach that was being discussed would
>> allow us to bridge the gap between "completely wiki markup" and
>> "completely Wikidata".
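The key-value store approach Pablo describes can be sketched as follows. This is an illustrative sketch, not actual extraction-framework code: the store interface and all names are made up, and it is written in Python for brevity even though the framework itself is Scala.

```python
import re

# Matches a whole inclusion call, e.g. "{{#property:p45}}".
PROPERTY_CALL = re.compile(r"\{\{#property:(p\d+)\}\}")

class WikidataStore:
    """Minimal store interface: for Wikidata item Q, give me property P.

    Backed here by a plain dict; a disk-backed map (Berkeley DB / JDBM
    style) could implement the same interface when memory is scarce.
    """

    def __init__(self, data):
        self._data = data  # {(item, property): value}

    def lookup(self, item, prop):
        return self._data.get((item, prop))

def resolve_value(raw, item, store):
    """Resolve one infobox value for the page's linked Wikidata item:
    inclusion calls are looked up in the pre-extracted store, while plain
    wiki markup is passed through to be parsed as before."""
    raw = raw.strip()
    match = PROPERTY_CALL.fullmatch(raw)
    if match:
        return store.lookup(item, match.group(1))  # inclusion syntax
    return raw or None                             # plain markup
```

Swapping the dict for a disk-backed store would then be a configuration choice, which is the pluggable design discussed further down in the thread.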
>> Once we don't need markup parsing anymore, we just make the switch, since
>> we'd already have all of the machinery to connect to Wikidata anyway (it
>> is a requirement for the hybrid approach).
>>
>> Cheers,
>> Pablo
>>
>> On Sat, Apr 13, 2013 at 10:20 AM, Jona Christopher Sahnwaldt
>> <[email protected]> wrote:
>>>
>>> On 11 April 2013 13:47, Jona Christopher Sahnwaldt <[email protected]>
>>> wrote:
>>> > All,
>>> >
>>> > I'd like to approach these decisions a bit more systematically.
>>> >
>>> > I'll try to list some of the most important open questions that come
>>> > to mind regarding the development of DBpedia and Wikidata. I'll also
>>> > add my own more or less speculative answers.
>>> >
>>> > I think we can't make good decisions about our way forward without
>>> > clearly stating and answering these questions. We should ask the
>>> > Wikidata people.
>>> >
>>> > @Anja: who should we ask at Wikidata? Just write to wikidata-l? Or is
>>> > there a better way?
>>> >
>>> > 1. Will the Wikidata properties be messy (like Wikipedia) or clean
>>> > (like the DBpedia ontology)?
>>> >
>>> > My bet is that they will be clean.
>>> >
>>> > 2. When will Wikidata RDF dumps be available?
>>> >
>>> > I have no idea. Maybe two months, maybe two years.
>>> >
>>> > 3. When will data be *copied* from Wikipedia infoboxes (or other
>>> > sources) to Wikidata?
>>> >
>>> > They're already starting. For example,
>>> > wikidata/enwiki/Catherine_the_Great [1] has a lot of data.
>>> >
>>> > 4. When will data be *removed* from Wikipedia infoboxes?
>>> >
>>> > The inclusion syntax like {{#property:father}} doesn't work yet, so
>>> > data cannot be removed. No idea when it will start. Maybe two months,
>>> > maybe two years.
>>> This is starting sooner than I expected:
>>>
>>> http://meta.wikimedia.org/wiki/Wikidata/Deployment_Questions#When_will_this_be_deployed_on_my_Wikipedia.3F
>>>
>>> ----
>>>
>>> Phase 2 (infoboxes)
>>>
>>> When will this be deployed on my Wikipedia?
>>>
>>> It is already deployed on the following Wikipedias: it, he, hu, ru,
>>> tr, uk, uz, hr, bs, sr, sh. The deployment on English Wikipedia was
>>> planned for April 8 and on all remaining Wikipedias on April 10. This
>>> had to be postponed. New dates will be announced here as soon as we
>>> know them.
>>>
>>> ----
>>>
>>> Sounds like the inclusion syntax will be enabled on enwiki in the next
>>> few weeks. I would guess there are many active users or even bots who
>>> will replace data in infobox instances with inclusion calls. This means
>>> we will lose data if we don't extend our framework soon.
>>>
>>> Also see http://blog.wikimedia.de/2013/03/27/you-can-have-all-the-data/
>>>
>>> > 5. What kind of datasets do we want to offer for download?
>>> >
>>> > I think that we should try to offer more or less the same datasets as
>>> > before, which means that we have to merge Wikipedia and Wikidata
>>> > extraction results. Even better: offer "pure" Wikipedia datasets
>>> > (which will contain only the few inter-language links that remained in
>>> > Wikipedia), "pure" Wikidata datasets (all the inter-language links
>>> > that were moved, and the little bit of data that was already added)
>>> > and "merged" datasets.
>>> >
>>> > 6. What kind of datasets do we want to load in the main SPARQL
>>> > endpoint?
>>> >
>>> > Probably the "merged" datasets.
>>> >
>>> > 7. Do we want a new SPARQL endpoint for Wikidata data, for example at
>>> > http://data.dbpedia.org/sparql?
>>> >
>>> > If yes, I guess this endpoint should only contain the "pure" Wikidata
>>> > datasets.
>>> >
>>> > 8. What about the other DBpedia chapters?
>>> > They certainly need the inter-language links, so we should prepare
>>> > them. They'll probably also want sameAs links to data.dbpedia.org.
>>> >
>>> > So much for now. I'm sure there are many other questions that I forgot
>>> > here, and different answers. Keep them coming. :-)
>>> >
>>> > Cheers,
>>> > JC
>>> >
>>> > [1] http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Catherine_the_Great
>>> > = http://www.wikidata.org/wiki/Q36450
>>> >
>>> > On 8 April 2013 09:03, Dimitris Kontokostas <[email protected]> wrote:
>>> >> Hi Anja,
>>> >>
>>> >> On Mon, Apr 8, 2013 at 9:36 AM, Anja Jentzsch <[email protected]> wrote:
>>> >>>
>>> >>> Hi Dimitris,
>>> >>>
>>> >>> On Apr 8, 2013, at 8:29, Dimitris Kontokostas <[email protected]>
>>> >>> wrote:
>>> >>>
>>> >>> Hi JC,
>>> >>>
>>> >>> On Sun, Apr 7, 2013 at 11:55 PM, Jona Christopher Sahnwaldt
>>> >>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Hi Dimitris,
>>> >>>>
>>> >>>> a lot of important remarks. I think we should discuss this in
>>> >>>> detail.
>>> >>>>
>>> >>>> On 7 April 2013 21:38, Dimitris Kontokostas <[email protected]>
>>> >>>> wrote:
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > I disagree with this approach, and I believe that if we use this
>>> >>>> > as our main strategy we will end up lacking in quality &
>>> >>>> > completeness.
>>> >>>> >
>>> >>>> > Let's say that we will manage to handle {{#property P123}} or
>>> >>>> > {{#property property name}} correctly & very efficiently. What
>>> >>>> > will we do for templates like [1],
>>> >>>>
>>> >>>> I would think such templates are like many others for which we
>>> >>>> programmed special rules in Scala, like unit conversion templates
>>> >>>> etc. We could add special rules for templates that handle Wikidata,
>>> >>>> too. Not that I like this approach very much, but it worked (more
>>> >>>> or less) in the past.
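The "special rules" approach JC mentions can be sketched like this. This is a hypothetical illustration: the framework's real rules are written in Scala, and the template name and single-parameter layout are assumptions loosely based on it.wiki's Template:Wikidata, not that template's actual interface.

```python
def apply_wikidata_rule(template_name, args, item, lookup):
    """Special rule keyed on the template name, analogous to the
    hard-coded rules for unit-conversion templates: if the call is the
    (assumed) Wikidata wrapper template, resolve its property via the
    given lookup function; otherwise return None so the normal template
    handling takes over.

    lookup: callable (item, property) -> value or None
    """
    if template_name != "Wikidata" or len(args) != 1:
        return None  # not our template: fall through
    return lookup(item, args[0])
```

The point of the sketch is only that such rules compose with the existing pipeline: each rule either claims a template call or passes it on unchanged.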
>>> >>>> > Lua scripts that use such templates
>>> >>>>
>>> >>>> For DBpedia, Lua scripts don't really differ from template
>>> >>>> definitions. We don't really parse them or use them in any way. If
>>> >>>> necessary, we try to reproduce their function in Scala. At least
>>> >>>> that's how we dealt with them in the past. Again, not beautiful,
>>> >>>> but also not a new problem.
>>> >>>>
>>> >>>> > or for data in Wikidata that
>>> >>>> > are not referenced from Wikipedia at all?
>>> >>>>
>>> >>>> We would lose that data, that's right.
>>> >>>
>>> >>> I know that we could achieve all this, but it would take too much
>>> >>> effort to get this 100% right, and it would come with many bugs at
>>> >>> the beginning. My point is that the data are already there and very
>>> >>> well structured, so why do we need to parse templates & Lua scripts
>>> >>> just to get it from Wikidata in the end?
>>> >>>
>>> >>> There are two ways to integrate Wikidata in Wikipedia: Lua scripts
>>> >>> or the inclusion syntax. So it would be neat to cover both.
>>> >>
>>> >> Sure, I agree: template rendering is a feature we have wanted (and
>>> >> users have asked for) for many years. We'll have to implement a MW
>>> >> rendering engine in Scala that could be useful for many, many things,
>>> >> but I don't think that Wikidata is the reason we should build this.
>>> >>
>>> >> I don't know Lua, or whether this is allowed syntax, but I'd expect
>>> >> something similar from hard-core Wikipedians sometime soon:
>>> >>
>>> >> for (p in properties)
>>> >>   if (condition1 && condition2 && condition3)
>>> >>     load "{{#property p}}"
>>> >>
>>> >> So we will either miss a lot of data or put too much effort into
>>> >> something that is already very well structured.
>>> >> At least at this point, where nothing is clear yet.
>>> >> Cheers,
>>> >> Dimitris
>>> >>
>>> >>> Cheers,
>>> >>> Anja
>>> >>>
>>> >>>> > Maybe the lookup approach will give us some improvement over our
>>> >>>> > next release (if we manage to implement it by then). Most of the
>>> >>>> > data is still in Wikipedia, and Lua scripts & Wikidata templates
>>> >>>> > are not so complex yet. But in the following release (in 1+ year)
>>> >>>> > everything will be completely different again. The reason is that
>>> >>>> > Wikidata started operations exactly one year ago and went partly
>>> >>>> > into production only ~2 months ago, so I'd expect a very big
>>> >>>> > boost in the following months.
>>> >>>>
>>> >>>> I think so too.
>>> >>>>
>>> >>>> > My point is that Wikidata is a completely new source and we
>>> >>>> > should see it as such. Trying to re-parse already structured data
>>> >>>> > will end up in a very complicated design that we might end up not
>>> >>>> > using at all.
>>> >>>>
>>> >>>> What do you mean by "re-parse already structured data"?
>>> >>>>
>>> >>>> > On the other hand, Wikidata data, although well structured, can
>>> >>>> > still be compared to our raw infobox extractor (regarding naming
>>> >>>> > variance).
>>> >>>>
>>> >>>> You mean naming variance of properties? I would expect Wikidata to
>>> >>>> be much better than Wikipedia in this respect. I think that's one
>>> >>>> of the goals of Wikidata: to have one single property for birth
>>> >>>> date and use this property for all types of persons. Apparently, to
>>> >>>> add a new Wikidata property, one must go through a community
>>> >>>> process [1].
>>> >>>
>>> >>> I don't have the link, but I read that there is no restriction on
>>> >>> that.
>>> >>> The goal is to provide structured data, and the community will need
>>> >>> to handle duplicates. This is yet another Wikipedia community, so
>>> >>> even if it is a lot stricter, I'd expect variations here too.
>>> >>>
>>> >>>> > I suggest
>>> >>>> > that we focus on mediating this data to our DBpedia ontology
>>> >>>>
>>> >>>> This is the really interesting stuff. How could we do this? Will we
>>> >>>> let users of the mappings wiki define mappings between Wikidata
>>> >>>> properties and DBpedia ontology properties? There are a lot of
>>> >>>> possibilities.
>>> >>>
>>> >>> Yup, many interesting possibilities :) The tricky part will be with
>>> >>> the classes, but this is a GSoC idea, so the students will have to
>>> >>> figure this out.
>>> >>> I was also thinking of a Greasemonkey script where mappers could
>>> >>> navigate Wikidata and see (or even do) the mappings right on
>>> >>> Wikidata.org :)
>>> >>>
>>> >>>> > and then fusing
>>> >>>> > it with data from other DBpedia-language editions.
>>> >>>>
>>> >>>> Do you mean merging data that's already on Wikidata with stuff
>>> >>>> that's still in Wikipedia pages?
>>> >>>
>>> >>> The simplest thing we could do is the following:
>>> >>> let's say Q1 is a Wikidata item linking to article W1, and Wikidata
>>> >>> property P1 is mapped to dbpedia-owl:birthDate.
>>> >>> For Q1 P1 "1/1/2000" we could assume W1 birthDate "1/1/2000" and
>>> >>> load the second in dbpedia.org.
>>> >>> Even without any intelligence at all, this could give very good
>>> >>> results.
>>> >>>
>>> >>> Cheers,
>>> >>> Dimitris
>>> >>>
>>> >>>> So much for my specific questions.
>>> >>>>
>>> >>>> The most important question is: where do we expect Wikidata (and
>>> >>>> DBpedia) to be in one, two, three years?
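The fusing rule Dimitris sketches (Q1 links to W1, P1 is mapped to dbpedia-owl:birthDate, so Q1 P1 "1/1/2000" yields W1 birthDate "1/1/2000") can be written down as a tiny function. All names and data are illustrative, and Python is used here for brevity, though the framework itself is Scala.

```python
def fuse(statements, sitelinks, property_mappings):
    """Turn Wikidata statements into DBpedia triples.

    statements:        iterable of (item, property, value) tuples
    sitelinks:         {item: Wikipedia article name}
    property_mappings: {Wikidata property: DBpedia ontology property}
    """
    triples = []
    for item, prop, value in statements:
        article = sitelinks.get(item)
        ontology_property = property_mappings.get(prop)
        if article is None or ontology_property is None:
            continue  # skip unlinked items and unmapped properties
        triples.append(("http://dbpedia.org/resource/" + article,
                        ontology_property, value))
    return triples
```

As Dimitris notes, no intelligence is involved: the rule is a pure join of statements, sitelinks, and property mappings, which is what makes it cheap to try first.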
>>> >>>> Cheers,
>>> >>>> JC
>>> >>>>
>>> >>>> [1] http://www.wikidata.org/wiki/Wikidata:Property_proposal
>>> >>>>
>>> >>>> > Best,
>>> >>>> > Dimitris
>>> >>>> >
>>> >>>> > [1] http://it.wikipedia.org/wiki/Template:Wikidata
>>> >>>> >
>>> >>>> > On Sun, Apr 7, 2013 at 3:36 AM, Jona Christopher Sahnwaldt
>>> >>>> > <[email protected]> wrote:
>>> >>>> >>
>>> >>>> >> When I hear "database", I think "network", which is of course
>>> >>>> >> several orders of magnitude slower than a simple map access, but
>>> >>>> >> MapDB looks really cool. No network calls, just method calls.
>>> >>>> >> Nice!
>>> >>>> >>
>>> >>>> >> On 7 April 2013 01:10, Pablo N. Mendes <[email protected]>
>>> >>>> >> wrote:
>>> >>>> >> >
>>> >>>> >> > My point was rather that there are implementations out there
>>> >>>> >> > that support both in-memory and on-disk storage. So there is
>>> >>>> >> > no need to choose between a map and a database, because you
>>> >>>> >> > can also access a database via a map interface.
>>> >>>> >> > http://www.kotek.net/blog/3G_map
>>> >>>> >> >
>>> >>>> >> > JDBM seems to be good both for speed and memory.
>>> >>>> >> >
>>> >>>> >> > Cheers,
>>> >>>> >> > Pablo
>>> >>>> >> >
>>> >>>> >> > On Sat, Apr 6, 2013 at 10:41 PM, Jona Christopher Sahnwaldt
>>> >>>> >> > <[email protected]> wrote:
>>> >>>> >> >>
>>> >>>> >> >> On 6 April 2013 15:34, Mohamed Morsey
>>> >>>> >> >> <[email protected]> wrote:
>>> >>>> >> >> > Hi Pablo, Jona, and all,
>>> >>>> >> >> >
>>> >>>> >> >> > On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:
>>> >>>> >> >> >
>>> >>>> >> >> > I'd say this topic can safely move out of dbpedia-discussion
>>> >>>> >> >> > and to dbpedia-developers now. :)
>>> >>>> >> >> >
>>> >>>> >> >> > I agree with Jona.
>>> >>>> >> >> > With one small detail: perhaps it is better if we don't
>>> >>>> >> >> > load everything in memory, but use a fast database such as
>>> >>>> >> >> > Berkeley DB or JDBM3. They would also allow you to use
>>> >>>> >> >> > in-memory storage when you can splurge, or disk-backed
>>> >>>> >> >> > storage when restricted. What do you think?
>>> >>>> >> >> >
>>> >>>> >> >> > I agree with Pablo's idea, as it will work in both dump and
>>> >>>> >> >> > live modes. Actually, for live extraction we already need a
>>> >>>> >> >> > lot of memory, as we have a running Virtuoso instance that
>>> >>>> >> >> > should be updated by the framework, and we have a local
>>> >>>> >> >> > mirror of Wikipedia which uses MySQL as back-end storage.
>>> >>>> >> >> > So, I would prefer saving as much memory as possible.
>>> >>>> >> >>
>>> >>>> >> >> Let's make it pluggable and configurable then. If you're more
>>> >>>> >> >> concerned with speed than memory (as in the dump extraction),
>>> >>>> >> >> use a map. If it's the other way round, use some kind of
>>> >>>> >> >> database.
>>> >>>> >> >>
>>> >>>> >> >> I expect the interface to be very simple: for Wikidata item
>>> >>>> >> >> X, give me the value of property Y.
>>> >>>> >> >>
>>> >>>> >> >> The only problem I see is that we currently have no usable
>>> >>>> >> >> configuration in DBpedia. At least for the dump extraction -
>>> >>>> >> >> I don't know about the live extraction.
>>> >>>> >> >> The dump extraction configuration consists of flat files and
>>> >>>> >> >> static fields in some classes, which is pretty awful and
>>> >>>> >> >> would make it rather hard to exchange one implementation of
>>> >>>> >> >> this WikidataQuery interface for another.
>>> >>>> >> >>
>>> >>>> >> >> > Cheers,
>>> >>>> >> >> > Pablo
>>> >>>> >> >> >
>>> >>>> >> >> > On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher Sahnwaldt
>>> >>>> >> >> > <[email protected]> wrote:
>>> >>>> >> >> >>
>>> >>>> >> >> >> On 5 April 2013 21:27, Andrea Di Menna <[email protected]>
>>> >>>> >> >> >> wrote:
>>> >>>> >> >> >> > Hi Dimitris,
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > I am not completely getting your point.
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > How would you handle the following example? (supposing
>>> >>>> >> >> >> > the following will be possible with Wikipedia/Wikidata)
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > Suppose you have
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > {{Infobox:Test
>>> >>>> >> >> >> > | name = {{#property:p45}}
>>> >>>> >> >> >> > }}
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > and a mapping
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > {{PropertyMapping | templateProperty = name |
>>> >>>> >> >> >> > ontologyProperty = foaf:name}}
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > What would happen when running the MappingExtractor?
>>> >>>> >> >> >> > Which RDF triples would be generated?
>>> >>>> >> >> >>
>>> >>>> >> >> >> I think there are two questions here, and two very
>>> >>>> >> >> >> different approaches.
>>> >>>> >> >> >>
>>> >>>> >> >> >> 1. In the near term, I would expect that Wikipedia
>>> >>>> >> >> >> templates are modified like in your example.
>>> >>>> >> >> >>
>>> >>>> >> >> >> How could/should DBpedia deal with this?
>>> >>>> >> >> >> The simplest solution seems to be that during a
>>> >>>> >> >> >> preliminary step, we extract data from Wikidata and store
>>> >>>> >> >> >> it. During the main extraction, whenever we find a
>>> >>>> >> >> >> reference to Wikidata, we look it up and generate a triple
>>> >>>> >> >> >> as usual. Not a huge change.
>>> >>>> >> >> >>
>>> >>>> >> >> >> 2. In the long run though, when all data is moved to
>>> >>>> >> >> >> Wikidata, all instances of a certain infobox type will
>>> >>>> >> >> >> look the same. It doesn't matter anymore if an infobox is
>>> >>>> >> >> >> about Germany or Italy, because they all use the same
>>> >>>> >> >> >> properties:
>>> >>>> >> >> >>
>>> >>>> >> >> >> {{Infobox country
>>> >>>> >> >> >> | capital = {{#property:p45}}
>>> >>>> >> >> >> | population = {{#property:p42}}
>>> >>>> >> >> >> ... etc. ...
>>> >>>> >> >> >> }}
>>> >>>> >> >> >>
>>> >>>> >> >> >> I guess Wikidata already thought of this and has plans to
>>> >>>> >> >> >> then replace the whole infobox by a small construct that
>>> >>>> >> >> >> simply instructs MediaWiki to pull all data for this item
>>> >>>> >> >> >> from Wikidata and display an infobox. In this case, there
>>> >>>> >> >> >> will be nothing left to extract for DBpedia.
>>> >>>> >> >> >>
>>> >>>> >> >> >> Implementation detail: we shouldn't use a SPARQL store to
>>> >>>> >> >> >> look up Wikidata data; we should keep it in memory. A
>>> >>>> >> >> >> SPARQL call will certainly be at least 100 times slower
>>> >>>> >> >> >> than a lookup in a map, but probably 10000 times or more.
>>> >>>> >> >> >> This matters because there will be hundreds of millions
>>> >>>> >> >> >> of lookup calls during an extraction. Keeping all
>>> >>>> >> >> >> inter-language links in memory takes about 4 or 5 GB - not
>>> >>>> >> >> >> much. Of course, keeping all Wikidata data in memory would
>>> >>>> >> >> >> take between 10 and 100 times as much RAM.
>>> >>>> >> >> >>
>>> >>>> >> >> >> Cheers,
>>> >>>> >> >> >> JC
>>> >>>> >> >> >>
>>> >>>> >> >> >> > Cheers
>>> >>>> >> >> >> > Andrea
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > 2013/4/5 Dimitris Kontokostas <[email protected]>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> Hi,
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> For me there is no reason to complicate the DBpedia
>>> >>>> >> >> >> >> framework by resolving Wikidata data / templates.
>>> >>>> >> >> >> >> What we could do is (try to) provide a semantic mirror
>>> >>>> >> >> >> >> of Wikidata in, e.g., data.dbpedia.org. We should
>>> >>>> >> >> >> >> simplify it by mapping the data to the DBpedia ontology
>>> >>>> >> >> >> >> and then use it like any other language edition we have
>>> >>>> >> >> >> >> (e.g. nl.dbpedia.org).
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> In dbpedia.org we already aggregate data from other
>>> >>>> >> >> >> >> language editions. For now it is mostly labels &
>>> >>>> >> >> >> >> abstracts, but we can also fuse Wikidata data.
>>> >>>> >> >> >> >> This way, whatever is missing from the Wikipedia dumps
>>> >>>> >> >> >> >> will in the end be filled in by the Wikidata dumps.
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> Best,
>>> >>>> >> >> >> >> Dimitris
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
>>> >>>> >> >> >> >> <[email protected]> wrote:
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> Ok, thanks for the clarification :-) It's perfect; now
>>> >>>> >> >> >> >>> we're just waiting for the dump of this data to become
>>> >>>> >> >> >> >>> available.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> Best.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> Julien Plu.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> 2013/4/5 Jona Christopher Sahnwaldt <[email protected]>
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> On 5 April 2013 19:59, Julien Plu
>>> >>>> >> >> >> >>>> <[email protected]> wrote:
>>> >>>> >> >> >> >>>> > Hi,
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > @Anja: Do you have a blog post or something like
>>> >>>> >> >> >> >>>> > that about the RDF dump of Wikidata?
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> @Anja: do you know when RDF dumps are planned to be
>>> >>>> >> >> >> >>>> available?
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> > Will the French Wikidata also provide its
>>> >>>> >> >> >> >>>> > data in RDF?
>>> >>>> >> >> >> >>>> There is only one Wikidata - neither English nor
>>> >>>> >> >> >> >>>> French nor any other language. It's just data. There
>>> >>>> >> >> >> >>>> are labels in different languages, but the data
>>> >>>> >> >> >> >>>> itself is language-agnostic.
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> > This news interests me greatly.
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > Best
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > Julien Plu.
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > 2013/4/5 Tom Morris <[email protected]>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher
>>> >>>> >> >> >> >>>> >> Sahnwaldt <[email protected]> wrote:
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> thanks for the heads-up!
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> On 5 April 2013 10:44, Julien Plu
>>> >>>> >> >> >> >>>> >>> <[email protected]> wrote:
>>> >>>> >> >> >> >>>> >>> > Hi,
>>> >>>> >> >> >> >>>> >>> >
>>> >>>> >> >> >> >>>> >>> > I saw a few days ago that, for about a month
>>> >>>> >> >> >> >>>> >>> > now, MediaWiki has allowed creating infoboxes
>>> >>>> >> >> >> >>>> >>> > (or parts of them) with the Lua scripting
>>> >>>> >> >> >> >>>> >>> > language.
>>> >>>> >> >> >> >>>> >>> > http://www.mediawiki.org/wiki/Lua_scripting
>>> >>>> >> >> >> >>>> >>> >
>>> >>>> >> >> >> >>>> >>> > So my question is: if all the data in the
>>> >>>> >> >> >> >>>> >>> > Wikipedia infoboxes is in Lua scripts, will
>>> >>>> >> >> >> >>>> >>> > DBpedia still be able to retrieve all the data
>>> >>>> >> >> >> >>>> >>> > as usual?
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> I'm not 100% sure, and we should look into this,
>>> >>>> >> >> >> >>>> >>> but I think that Lua is only used in template
>>> >>>> >> >> >> >>>> >>> definitions, not in template calls or other
>>> >>>> >> >> >> >>>> >>> places in content pages. DBpedia does not parse
>>> >>>> >> >> >> >>>> >>> template definitions, only content pages. The
>>> >>>> >> >> >> >>>> >>> content pages probably will only change in minor
>>> >>>> >> >> >> >>>> >>> ways, if at all. For example, {{Foo}} might
>>> >>>> >> >> >> >>>> >>> change to {{#invoke:Foo}}. But that's just my
>>> >>>> >> >> >> >>>> >>> preliminary understanding after looking through
>>> >>>> >> >> >> >>>> >>> a few tutorial pages.
>>> >>>> >> >> >> >>>> >> As far as I can see, the template calls are
>>> >>>> >> >> >> >>>> >> unchanged for all the templates, which makes sense
>>> >>>> >> >> >> >>>> >> when you consider that some of the templates
>>> >>>> >> >> >> >>>> >> they've upgraded to use Lua, like Template:Coord,
>>> >>>> >> >> >> >>>> >> are used on almost a million pages.
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> Here are the ones which have been updated so far:
>>> >>>> >> >> >> >>>> >> https://en.wikipedia.org/wiki/Category:Lua-based_templates
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> The performance improvement looks impressive:
>>> >>>> >> >> >> >>>> >> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> Tom
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> ------------------------------------------------------------------------------
>>> >>>> >> >> >> >>> Minimize network downtime and maximize team
>>> >>>> >> >> >> >>> effectiveness. Reduce network management and security
>>> >>>> >> >> >> >>> costs. Learn how to hire the most talented Cisco
>>> >>>> >> >> >> >>> Certified professionals.
>>> >>>> >> >> >> >>> Visit the Employer Resources Portal
>>> >>>> >> >> >> >>> http://www.cisco.com/web/learning/employer_resources/index.html
>>> >>>> >> >> >> >>> _______________________________________________
>>> >>>> >> >> >> >>> Dbpedia-discussion mailing list
>>> >>>> >> >> >> >>> [email protected]
>>> >>>> >> >> >> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> --
>>> >>>> >> >> >> >> Kontokostas Dimitris
>>> >>>> >> >> >
>>> >>>> >> >> > --
>>> >>>> >> >> > Pablo N. Mendes
>>> >>>> >> >> > http://pablomendes.com
>>> >>>> >> >> >
>>> >>>> >> >> > --
>>> >>>> >> >> > Kind Regards
>>> >>>> >> >> > Mohamed Morsey
>>> >>>> >> >> > Department of Computer Science
>>> >>>> >> >> > University of Leipzig
>>> >>>> >>
>>> >>>> >> _______________________________________________
>>> >>>> >> Dbpedia-developers mailing list
>>> >>>> >> [email protected]
>>> >>>> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
