Unsubscribe

On 27 Jan 2015, at 18:38, dbpedia-discussion-requ...@lists.sourceforge.net wrote:

Send Dbpedia-discussion mailing list submissions to
    dbpedia-discussion@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
or, via email, send a message with subject or body 'help' to
    dbpedia-discussion-requ...@lists.sourceforge.net

You can reach the person managing the list at
    dbpedia-discussion-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Dbpedia-discussion digest..."

Today's Topics:

   1. Re: Freebase, Wikidata and the future of DBpedia (Paul Houle)
   2. CfP: 3rd Linked Data Mining Challenge at Know@LOD / ESWC 2015 (Petar Ristoski)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Jan 2015 12:27:05 -0500
From: Paul Houle <ontolo...@gmail.com>
Subject: Re: [Dbpedia-discussion] Freebase, Wikidata and the future of DBpedia
To: "M. Aaron Bossert" <maboss...@gmail.com>
Cc: Martin Brümmer <bruem...@informatik.uni-leipzig.de>, "dbpedia-discussion@lists.sourceforge.net" <dbpedia-discussion@lists.sourceforge.net>

DBpedia has a mission that is focused on extracting data from Wikipedia. Importing data wholesale from Wikidata or something like that seems inconsistent with that mission, but there are all kinds of temporal and provenance facts that could be teased out of Wikipedia, if not out of the infoboxes.

I think most query scenarios are going to work like this:

[pot of data with provenance information] -> [data set representing a point of view] -> query

I have been banging my head on the temporal aspect for a while, and I am convinced that the practical answer to a lot of problems is to replace point-in-time values with time intervals. Intervals can model both duration and uncertainty, and overloading them for those two purposes is not so bad, because you usually know from context which one an interval represents.

There is a lot of pain right now if you want to work with dates from either DBpedia or Freebase, because different kinds of dates are specified to different levels of detail. If you plot people's birthdays in Freebase, for instance, you find a lot of people born on January 1, I think because that is a 'plausible' value to fill in.

A "birth date" could be resolved to a short interval (I know I was born at 4:06 in the afternoon, and astrologers would like to know that), but the common use of a calendar day is really a statement about imprecision, while defining my "birthday" as a set of one-day intervals makes the interval reflect a social convention.

Anyway, there is a well-accepted algebra over time intervals, Allen's interval algebra:

http://docs.jboss.org/drools/release/latest/drools-docs/html/DroolsComplexEventProcessingChapter.html#d0e10852

It could be implemented either as a native XSD datatype or by some structure involving blank nodes.
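
A minimal sketch of that idea in Python (illustrative only; the dates and helper names are made up, and "closed interval of days" is just one possible convention):

    from datetime import date

    # Represent every temporal value as a closed interval [start, end].
    # A fully known calendar day is a one-day interval; an imprecise
    # date widens the interval instead of faking precision.
    birth_day_known = (date(1970, 3, 14), date(1970, 3, 14))   # day precision
    birth_year_only = (date(1970, 1, 1), date(1970, 12, 31))   # year precision

    def before(a, b):
        # Allen's 'before' relation: a ends strictly before b starts.
        return a[1] < b[0]

    def overlaps(a, b):
        # True if the two intervals share at least one day.
        return a[0] <= b[1] and b[0] <= a[1]

    print(overlaps(birth_day_known, birth_year_only))  # True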

On Tue, Jan 27, 2015 at 11:22 AM, M. Aaron Bossert <maboss...@gmail.com> wrote:

> Martin,
>
> When I first started working with RDF, I didn't fully "get" its expressivity. All of the things you say can't be done (or at least not easily) are actually quite simple to implement. Compared to the property graph model, RDF seems inferior at first glance, but in my opinion it is much more expressive. Through reification you can express all of the concepts you are after (provenance, date ranges, etc.). At the end of the day, RDF's expressivity comes at the cost of verbosity, which, in my opinion, is well worth it.
>
> If you would like some help modeling your graph to represent the missing concepts you are after, I will be happy to help you out with more specific examples and pointers.
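>
> A minimal rdflib sketch of what such reification can look like (the ex: annotation properties and the dbo:office choice are illustrative, not a fixed vocabulary):
>
>     from rdflib import Graph, Namespace, Literal, BNode, URIRef
>     from rdflib.namespace import RDF, XSD
>
>     EX = Namespace("http://example.org/")            # illustrative vocabulary
>     DBR = Namespace("http://dbpedia.org/resource/")
>     DBO = Namespace("http://dbpedia.org/ontology/")
>
>     g = Graph()
>
>     # The base fact.
>     fact = (DBR.Barack_Obama, DBO.office, Literal("President of the United States"))
>     g.add(fact)
>
>     # Reify the statement so it can be annotated.
>     stmt = BNode()
>     g.add((stmt, RDF.type, RDF.Statement))
>     g.add((stmt, RDF.subject, fact[0]))
>     g.add((stmt, RDF.predicate, fact[1]))
>     g.add((stmt, RDF.object, fact[2]))
>
>     # Provenance and the start of a validity interval.
>     g.add((stmt, EX.extractedFrom, URIRef("https://en.wikipedia.org/wiki/Barack_Obama")))
>     g.add((stmt, EX.validFrom, Literal("2009-01-20", datatype=XSD.date)))
>
>     print(g.serialize(format="turtle"))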
>
> Aaron
>
>> On Jan 27, 2015, at 06:33, Martin Brümmer <bruem...@informatik.uni-leipzig.de> wrote:
>>
>> Hi DBpedians!
>>
>> As you have surely noticed, Google has abandoned Freebase and it will be merged into Wikidata [1]. I searched the list but did not find a discussion about it, so here goes my point of view:
>>
>> When Wikidata was started, I hoped it would quickly become a major contributor of quality data to the LOD cloud. But although the project has a potentially massive crowd and is backed by Wikimedia, it does not really care about the Linked Data paradigm as established in the Semantic Web. RDF is more of an afterthought than a central concept. It was a bit disappointing to see Wikidata's impact on the LOD community lag because of this.
>>
>> Now Freebase will be integrated into Wikidata as a curated, Google-engineering-hardened knowledge base that is no stranger to RDF and Linked Data. How the integration will be realized is not yet clear. One consequence, hopefully, is that the LOD cloud grows by a significant amount of quality data. But I wonder what the consequences for the DBpedia project will be. If Wikimedia gets its own knowledge graph, possibly curated by its crowd, where is the place for DBpedia? Can DBpedia stay relevant with all the problems of an open-source project: the difficulties of mapping heterogeneous data in many different languages, the resulting struggle with data quality and consistency, and so on?
>>
>> So I propose being proactive about it.
>>
>> I see a large problem for DBpedia in the restrictions of the RDF data model. Triples limit our ability to make statements about statements: I cannot easily address a fact in DBpedia and annotate it. This means:
>>
>> - I cannot denote the provenance of a statement. In particular, I cannot denote the source data it comes from. Resource-level provenance is not sufficient if further datasets are to be integrated into DBpedia in the future.
>> - I cannot denote a timespan that limits the validity of a statement. Consider the fact that Barack Obama is the president of the USA. This fact was not valid at a point in the past and won't be valid at some point in the future. I might link the DBpedia page of Barack Obama for this fact, but if a DBpedia version is published after the next president of the USA is elected, the fact might be missing from DBpedia and my link becomes moot.
>> - This is a problem with persistency. Being able to download old dumps of DBpedia is not a sufficient model of persistency. The community struggles to increase data quality, but as soon as a new version is published, it drops some of the progress made in favour of whatever facts are found in the Wikipedia dumps at the time of extraction. The old facts should persist, not only in some dump files, but as linkable data.
>>
>> Being able to address these problems would also mean being able to fully import Wikidata, including provenance statements and validity timespans, and combine it with the DBpedia ontology (which is already an important focus of development, and rightfully so). It also means a persistent DBpedia that does not start over with the next version.
>>
>> So how can it be realized? With reification, of course! But most of us resent the problems reification brings with it: complications in querying, etc. The choice of reification model is also unclear. There are different proposals: blank nodes, the reification vocabulary, graph names (see the sketch below), creating unique subproperties for each triple, etc. I won't propose one of these models here; that will surely be subject to discussion. But DBpedia can propose a model, and the LOD community will adapt, given DBpedia's status and impact. I think it is time to raise the standard of handling provenance and persistence in the LOD cloud, and DBpedia should make the start. Especially in the face of Freebase and Wikidata merging, I believe it is imperative for DBpedia to move forward.
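>>
>> A minimal sketch of the graph-names variant with rdflib (again illustrative only; the graph URI and the annotation properties are made up):
>>
>>     from rdflib import Dataset, Namespace, Literal, URIRef
>>     from rdflib.namespace import XSD
>>
>>     EX = Namespace("http://example.org/")            # illustrative vocabulary
>>     DBR = Namespace("http://dbpedia.org/resource/")
>>     DBO = Namespace("http://dbpedia.org/ontology/")
>>
>>     ds = Dataset()
>>
>>     # Put the fact in its own named graph ...
>>     g = ds.graph(URIRef("http://example.org/fact/42"))
>>     g.add((DBR.Barack_Obama, DBO.office, Literal("President of the United States")))
>>
>>     # ... and annotate the graph name itself in the default graph.
>>     ds.add((g.identifier, EX.extractedFrom, URIRef("https://en.wikipedia.org/wiki/Barack_Obama")))
>>     ds.add((g.identifier, EX.validFrom, Literal("2009-01-20", datatype=XSD.date)))
>>
>>     print(ds.serialize(format="trig"))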
>>
>> regards,
>> Martin
>>
>> [1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc

--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    ontolo...@gmail.com
http://legalentityidentifier.info/lei/lookup

------------------------------

Message: 2
Date: Tue, 27 Jan 2015 18:38:11 +0100
From: "Petar Ristoski" <petar.risto...@informatik.uni-mannheim.de>
Subject: [Dbpedia-discussion] CfP: 3rd Linked Data Mining Challenge at Know@LOD / ESWC 2015
To: <dbpedia-discussion@lists.sourceforge.net>

[Apologies for cross-posting]

********************************************
CALL FOR CHALLENGE PARTICIPATION
********************************************

3rd Linked Data Mining Challenge, organized in connection with the Know@LOD 2015 workshop at the ESWC Conference (ESWC 2015), May 31 - June 4, 2015.

Venue: Portorož, Slovenia
Date: 01 June 2015
URL: http://knowalod2015.informatik.uni-mannheim.de/en/linked-data-mining-challenge/

********************************************
IMPORTANT DATES
********************************************

27 March 2015: Submission of papers and solutions
03 April 2015: Notification of acceptance

********************************************
GENERAL OVERVIEW OF THE CHALLENGE
********************************************

Linked data represents a novel type of data source that has so far remained nearly untouched by advanced data mining methods. It breaks down many traditional assumptions about source data and thus poses a number of challenges:

- While the individual published datasets typically follow a relatively regular, relational-like (or hierarchical, in the case of taxonomic classification) structure, the presence of semantic links among them makes the resulting 'hyper-dataset' akin to general graph datasets. On the other hand, compared to graphs such as social networks, there is a larger variety of link types in the graph.
- The datasets have been published for entirely different purposes, such as statistical data publishing based on legal commitments of government bodies vs. publishing of encyclopedic data by internet volunteers vs. data sharing within a research community. This introduces further data modeling heterogeneity and an uneven degree of completeness and reliability.
- The amount and diversity of resources as well as their link sets is steadily growing, which allows new linked datasets to be included in the mining dataset nearly on the fly, at the same time, however, making the feature selection problem extremely hard.

The Linked Data Mining Challenge 2015 (LDMC) consists of one task: predicting the review class of movies.

The best participant in the challenge will receive an award. The ranking of the participants will be made by the LDMC organizers, taking into account both the quality of the submitted LDMC paper (evaluated by Know@LOD workshop PC members) and the prediction quality (i.e., accuracy, see below).

********************************************
TASK OVERVIEW
********************************************

The task concerns predicting the review class of movies, i.e., "good" or "bad". The initial dataset is retrieved from Metacritic, which offers an all-time average review rating for a list of movies. The ratings were used to divide the movies into classes: movies with a score above 60 are regarded as "good" movies, while movies with a score below 40 are regarded as "bad" movies.
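
The class thresholds, restated as a small Python sketch (how movies scoring between 40 and 60 are handled is not stated in the call, so the None case is an assumption):

    def metacritic_label(score):
        # Challenge classes derived from the all-time average Metacritic score.
        if score > 60:
            return "good"
        if score < 40:
            return "bad"
        return None  # mid-range movies: treatment not specified in the call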

For each movie we provide the corresponding DBpedia URI. The mappings can be used to extract semantic features from DBpedia or other LOD repositories, to be exploited in the learning approaches proposed in the challenge.

********************************************
SOURCE DATA, REQUIRED RESULTS, AND EVALUATION
********************************************

The dataset is available for download here: http://knowalod2015.informatik.uni-mannheim.de/en/linked-data-mining-challenge/. It consists of training data of 1,600 instances for learning the predictive models (this data contains the value of the target attribute) and testing data of 400 instances for evaluating the models (this data is without the target attribute). The target attribute to be predicted is the Label attribute.

The datasets contain semicolon-separated values: the movie's title "Movie", the movie's release date "Release date", the movie's DBpedia URI "DBpedia_URI", the movie's label "Label" (only in the training set), and "id". A sample of the training CSV file:

"Movie";"Release date";"DBpedia_URI";"Label";"id"
"Best Kept Secret";9/6/13 12:00 AM;"http://dbpedia.org/resource/Best_Kept_Secret_(film)";"good";1.0

The participants have to submit the results achieved on the testing data, i.e., the predicted label for each movie. The results have to be delivered in a (syntactically correct) CSV format that includes the predicted label. The submitted results will be evaluated against a gold standard with respect to accuracy.

Besides the CSV file containing the predictions, the participants are expected to submit a paper describing the methods and techniques used, as well as the results obtained, i.e., the hypotheses perceived as interesting either by the computational method or by the participants themselves.

The participants should provide a detailed description of their approach, so that it can be easily reproduced. For example, it should be clearly stated which feature sets were used (and how they were created), the preprocessing steps, the type of predictor, the model parameter values and tuning, etc.

The papers will be evaluated by the evaluation panel, both with respect to the soundness and originality of the methods used and with respect to the validity of the hypotheses and nuggets found. Each paper should meet the standard norms of scientific writing.
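
A minimal end-to-end sketch of the expected workflow in Python (the file names and the title-only features are assumptions made for illustration; a real entry would follow the DBpedia_URI column to gather LOD features):

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # File names are assumptions; use whatever the challenge download provides.
    train = pd.read_csv("train.csv", sep=";")
    test = pd.read_csv("test.csv", sep=";")

    # Deliberately naive baseline: bag-of-words over movie titles.
    vec = CountVectorizer()
    X_train = vec.fit_transform(train["Movie"])
    X_test = vec.transform(test["Movie"])

    clf = LogisticRegression()
    clf.fit(X_train, train["Label"])

    # Write a syntactically correct, semicolon-separated prediction file.
    test["Label"] = clf.predict(X_test)
    test[["id", "Label"]].to_csv("predictions.csv", sep=";", index=False)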

********************************************
ALLOWED DATASETS
********************************************

For building the movie review predictor, any dataset that follows the Linked Open Data principles may be used.

Non-LOD datasets may be used only if the participants later publish those datasets in a way that makes them accessible through standard Semantic Web technologies, e.g., RDF, SPARQL, etc.

For example, one may map the movies from the dataset to the corresponding movies in a non-LOD dataset X, allowing additional data to be retrieved from X. The participants are then expected to publish the mappings from DBpedia to the movies in X, as well as the additional data retrieved from X, for example as RDF.

*IMPORTANT: Since the Metacritic dataset is publicly available, we kindly ask the participants not to use the Metacritic movies' rating score to tune the predictor for the movies in the test set. Any submission found not to comply with this rule will be disqualified. However, information from Metacritic other than the rating score is allowed, e.g., users' textual reviews of a given movie.

********************************************
SUBMISSION PROCEDURE
********************************************

Results submission

- Register using the registration web form available at: http://ldmc15.informatik.uni-mannheim.de/signup
- Build a prediction model on the training set.
- Apply the model to the test set to predict the labels.
- Submit the results at: http://ldmc15.informatik.uni-mannheim.de/submit
- Your final score will be computed from the last result submission made before Friday, March 27th.

-----

Paper submission

- In addition to your results, you have to submit a paper describing your solution.
- The paper format is Springer LNCS, with a limit of four pages.
- Papers are submitted online via EasyChair before Friday, March 27th.

-----

Presentation of Results

- Challenge papers will be included in the workshop proceedings of Know@LOD.
- The authors of the best performing systems will be asked to present their solutions at the workshop.

For any questions related to the submission procedure, please address the contact persons below.

********************************************
ORGANIZATION
********************************************

Petar Ristoski, University of Mannheim, Germany, petar.ristoski (at) informatik.uni-mannheim.de
Heiko Paulheim, University of Mannheim, Germany, heiko (at) informatik.uni-mannheim.de
Vojtěch Svátek, University of Economics, Prague, svatek (at) vse.cz
Václav Zeman, University of Economics, Prague, vaclav.zeman (at) vse.cz

--
Petar Ristoski
Data and Web Science Group
University of Mannheim
Phone: +49 621 181 3705
B6, 26, Room C1.07
D-68159 Mannheim

Mail: petar.risto...@informatik.uni-mannheim.de
Web: dws.informatik.uni-mannheim.de

------------------------------

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

End of Dbpedia-discussion Digest, Vol 95, Issue 24
**************************************************