On 6 April 2013 15:34, Mohamed Morsey <[email protected]> wrote:
> Hi Pablo, Jona, and all,
>
>
> On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:
>
>
> I'd say this topic can safely move out of dbpedia-discussion and to
> dbpedia-developers now. :)
>
> I agree with Jona. With one small detail: perhaps it is better if we don't
> load everything into memory, and instead use a fast database such as Berkeley
> DB or JDBM3. They would also allow you to run in-memory when you can splurge,
> or disk-backed when memory is restricted. What do you think?
>
>
> I agree with Pablo's idea, as it will work in both dump and live modes.
> Actually, for live extraction we already need a lot of memory, as we have a
> running Virtuoso instance that should be updated by the framework, and we
> have a local mirror of Wikipedia which uses MySQL as its back-end storage.
> So, I would prefer saving as much memory as possible.

Let's make it pluggable and configurable then. If you're more
concerned with speed than memory (as in the dump extraction), use a
map. If it's the other way round, use some kind of database.

I expect the interface to be very simple: for Wikidata item X give me
the value of property Y.
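
Something like this rough sketch, in Scala, is what I have in mind (all
names hypothetical, of course):

trait WikidataQuery {
  // for Wikidata item `item`, return the value of property `property`, if any
  def value(item: String, property: String): Option[String]
}

// speed over memory, e.g. for the dump extraction
class InMemoryWikidataQuery(data: Map[(String, String), String])
    extends WikidataQuery {
  def value(item: String, property: String) = data.get((item, property))
}

A disk-backed variant (Berkeley DB, JDBM3, ...) would simply be another
class implementing the same trait.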

The only problem I see is that we currently have no usable
configuration in DBpedia. At least for the dump extraction - I don't
know about the live extraction. The dump extraction configuration
consists of flat files and static fields in some classes, which is
pretty awful and would make it rather hard to exchange one
implementation of this WikidataQuery interface for another.
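
Just to illustrate what I mean by exchangeable (hypothetical names - this
is not code we have today): the implementation could be picked from a
single config property and instantiated by reflection:

import java.io.FileReader
import java.util.Properties

object WikidataQueryFactory {
  // e.g. wikidata-query=org.dbpedia.extraction.wikidata.SomeWikidataQueryImpl
  // assumes the configured class has a no-argument constructor
  def fromConfig(configFile: String): WikidataQuery = {
    val props = new Properties
    props.load(new FileReader(configFile))
    val className = props.getProperty("wikidata-query")
    Class.forName(className).newInstance().asInstanceOf[WikidataQuery]
  }
}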

>
>
>
> Cheers,
> Pablo
>
>
> On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher Sahnwaldt
> <[email protected]> wrote:
>>
>> On 5 April 2013 21:27, Andrea Di Menna <[email protected]> wrote:
>> > Hi Dimitris,
>> >
>> > I am not completely getting your point.
>> >
>> > How would you handle the following example? (supposing this will be
>> > possible with Wikipedia/Wikidata)
>> >
>> > Suppose you have
>> >
>> > {{Infobox:Test
>> > | name = {{#property:p45}}
>> > }}
>> >
>> > and a mapping
>> >
>> > {{PropertyMapping | templateProperty = name | ontologyProperty =
>> > foaf:name}}
>> >
>> > what would happen when running the MappingExtractor?
>> > Which RDF triples would be generated?
>>
>> I think there are two questions here, and two very different approaches.
>>
>> 1. In the near term, I would expect that Wikipedia templates will be
>> modified as in your example.
>>
>> How could/should DBpedia deal with this? The simplest solution seems
>> to be that during a preliminary step, we extract data from Wikidata
>> and store it. During the main extraction, whenever we find a reference
>> to Wikidata, we look it up and generate a triple as usual. Not a huge
>> change.
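>>
>> Very roughly, in Scala (hypothetical names, and assuming the
>> preliminary step has already parsed the Wikidata dump into a map):
>>
>> val propertyRef = """\{\{#property:(p\d+)\}\}""".r
>>
>> // replace each {{#property:pNN}} in a template value with the value
>> // stored for (item, pNN) by the preliminary Wikidata extraction
>> def resolve(item: String, value: String,
>>             wikidata: Map[(String, String), String]): String =
>>   propertyRef.replaceAllIn(value, m =>
>>     wikidata.getOrElse((item, m.group(1)), m.matched))
>>
>> The resolved value then goes through triple generation as usual.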
>>
>> 2. In the long run though, when all data is moved to Wikidata, all
>> instances of a certain infobox type will look the same. It doesn't
>> matter anymore if an infobox is about Germany or Italy, because they
>> all use the same properties:
>>
>> {{Infobox country
>> | capital = {{#property:p45}}
>> | population = {{#property:p42}}
>> ... etc. ...
>> }}
>>
>> I guess Wikidata already thought of this and has plans to then replace
>> the whole infobox by a small construct that simply instructs MediaWiki
>> to pull all data for this item from Wikidata and display an infobox.
>> In this case, there will be nothing left to extract for DBpedia.
>>
>> Implementation detail: we shouldn't use a SPARQL store to look up
>> Wikidata data; we should keep it in memory. A SPARQL call will
>> certainly be at least 100 times slower than a map lookup, and
>> probably 10,000 times or more. This matters because there will be
>> hundreds of millions of lookup calls during an extraction. Keeping all
>> inter-language links in memory takes about 4 or 5 GB - not much. Of
>> course, keeping all Wikidata data in memory would take between 10 and
>> 100 times as much RAM.
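>>
>> Back-of-envelope, assuming ~1 ms per SPARQL call and ~100 ns per map
>> lookup: 500 million lookups would take roughly six days against a
>> SPARQL store, but less than a minute against an in-memory map.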
>>
>> Cheers,
>> JC
>>
>> >
>> > Cheers
>> > Andrea
>> >
>> >
>> > 2013/4/5 Dimitris Kontokostas <[email protected]>
>> >>
>> >> Hi,
>> >>
>> >> For me there is no reason to complicate the DBpedia framework by
>> >> resolving
>> >> Wikidata data / templates.
>> >> What we could do is (try to) provide a semantic mirror of Wikidata at,
>> >> e.g., data.dbpedia.org. We should simplify it by mapping the data to
>> >> the DBpedia ontology and then use it like any other language edition
>> >> we have (e.g. nl.dbpedia.org).
>> >>
>> >> In dbpedia.org we already aggregate data from other language editions.
>> >> For now it is mostly labels & abstracts, but we can also fuse Wikidata
>> >> data. This way, whatever is missing from the Wikipedia dumps will in
>> >> the end be filled in by the Wikidata dumps.
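>> >>
>> >> As a sketch of that fusion rule (per subject, with hypothetical
>> >> property-value maps), in Scala: keep the value extracted from
>> >> Wikipedia when it exists, otherwise fall back to the Wikidata one:
>> >>
>> >> def fuse(wikipedia: Map[String, String],
>> >>          wikidata: Map[String, String]): Map[String, String] =
>> >>   wikidata ++ wikipedia // right operand wins on key collisions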
>> >>
>> >> Best,
>> >> Dimitris
>> >>
>> >>
>> >> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
>> >> <[email protected]> wrote:
>> >>>
>> >>> Ok, thanks for the clarification :-) That's perfect; now I'm just
>> >>> waiting for the dump of these data to become available.
>> >>>
>> >>> Best.
>> >>>
>> >>> Julien Plu.
>> >>>
>> >>>
>> >>> 2013/4/5 Jona Christopher Sahnwaldt <[email protected]>
>> >>>>
>> >>>> On 5 April 2013 19:59, Julien Plu
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>> > Hi,
>> >>>> >
>> >>>> > @Anja: Do you have a blog post or something similar that talks
>> >>>> > about the RDF dump of Wikidata?
>> >>>>
>> >>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
>> >>>>
>> >>>> @Anja: do you know when RDF dumps are planned to be available?
>> >>>>
>> >>>> > Will the French Wikidata also provide its
>> >>>> > data in RDF?
>> >>>>
>> >>>> There is only one Wikidata - neither English nor French nor any other
>> >>>> language. It's just data. There are labels in different languages,
>> >>>> but
>> >>>> the data itself is language-agnostic.
>> >>>>
>> >>>> >
>> >>>> > This news interests me greatly.
>> >>>> >
>> >>>> > Best
>> >>>> >
>> >>>> > Julien Plu.
>> >>>> >
>> >>>> >
>> >>>> > 2013/4/5 Tom Morris <[email protected]>
>> >>>> >>
>> >>>> >> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher Sahnwaldt
>> >>>> >> <[email protected]> wrote:
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> thanks for the heads-up!
>> >>>> >>>
>> >>>> >>> On 5 April 2013 10:44, Julien Plu
>> >>>> >>> <[email protected]>
>> >>>> >>> wrote:
>> >>>> >>> > Hi,
>> >>>> >>> >
>> >>>> >>> > I saw a few days ago that, since about a month, MediaWiki allows
>> >>>> >>> > creating infoboxes (or parts of them) with the Lua scripting
>> >>>> >>> > language. http://www.mediawiki.org/wiki/Lua_scripting
>> >>>> >>> >
>> >>>> >>> > So my question is: if all the data in the Wikipedia infoboxes ends
>> >>>> >>> > up in Lua scripts, will DBpedia still be able to retrieve all the
>> >>>> >>> > data as usual?
>> >>>> >>>
>> >>>> >>> I'm not 100% sure, and we should look into this, but I think that
>> >>>> >>> Lua is only used in template definitions, not in template calls or
>> >>>> >>> other places in content pages. DBpedia does not parse template
>> >>>> >>> definitions, only content pages. The content pages will probably
>> >>>> >>> only change in minor ways, if at all. For example, {{Foo}} might
>> >>>> >>> change to {{#invoke:Foo}}. But that's just my preliminary
>> >>>> >>> understanding after looking through a few tutorial pages.
>> >>>> >>
>> >>>> >>
>> >>>> >> As far as I can see, the template calls are unchanged for all the
>> >>>> >> templates, which makes sense when you consider that some of the
>> >>>> >> templates they've upgraded to use Lua, like Template:Coord, are used
>> >>>> >> on almost a million pages.
>> >>>> >>
>> >>>> >> Here are the ones which have been updated so far:
>> >>>> >> https://en.wikipedia.org/wiki/Category:Lua-based_templates
>> >>>> >> Performance improvement looks impressive:
>> >>>> >> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
>> >>>> >>
>> >>>> >> Tom
>> >>>> >
>> >>>> >
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Kontokostas Dimitris
>> >>
>> >
>>
>
> --
>
> Pablo N. Mendes
> http://pablomendes.com
>
> --
> Kind Regards
> Mohamed Morsey
> Department of Computer Science
> University of Leipzig
