On 29/05/14 13:53, Thomas Douillard wrote:
hehe, maybe some kind of inference can lead to a good heuristic to suggest
properties and values in the entity suggester. As inferred facts naturally
become "softer" and "softer" through the combination of uncertainties, this
could also provide a natural limit for inference: fix a probability threshold
below which we don't add a fuzzy fact to the set of facts.

Maybe we could fix a heuristic starting fuzziness or probability score:
e.g., "one sourced claim" -> high score; a lower score for a disputed claim;
and so on, based on ranks.

Sorry, I have to expand on this a bit ...

My main point was that there are many fuzzy logics (depending on the t-norm you choose) and many probabilistic logics (depending on the stochastic assumptions you make). The meaning of a score crucially depends on which logic you are in. Moreover, at least in fuzzy logic, the scores are only relevant in comparison to other scores (there is no absolute meaning to "0.3") -- therefore you need to ensure that the scores are assigned in a globally consistent way (0.3 in Wikidata would have to mean exactly the same thing wherever it is used).
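To illustrate (this is my own sketch, not something from the thread): even the most common t-norms combine the very same pair of scores into quite different values, so a number like "0.3" only means something relative to a fixed choice of logic.

```python
# Three standard t-norms from the fuzzy-logic literature. The inputs are
# the same in each case; only the choice of logic differs.

def t_norm_min(a, b):
    """Goedel (minimum) t-norm."""
    return min(a, b)

def t_norm_product(a, b):
    """Product t-norm."""
    return a * b

def t_norm_lukasiewicz(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)

a, b = 0.7, 0.6  # two hypothetical fuzzy degrees for facts being combined
print(round(t_norm_min(a, b), 2))          # 0.6
print(round(t_norm_product(a, b), 2))      # 0.42
print(round(t_norm_lukasiewicz(a, b), 2))  # 0.3
```

The combined "certainty" of the same two facts ranges from 0.3 to 0.6 depending on the t-norm, which is exactly why scores from different logics cannot be mixed in one knowledge base.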

This makes it extremely hard to implement such an approach in practice in a large, distributed knowledge base like ours. What's more, you cannot find these scores in books or newspapers, so you somehow have to make them up in another way. You suggested using this for statements that are not generally accepted, but how do you measure "how disputed" a statement is? If two thirds of the references are for it and the rest are against it, do you assign a score of 0.66? It's very tricky.
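Spelling out that "two thirds" example as a tiny (hypothetical) heuristic makes the arbitrariness visible -- the function below is one way to compute such a score, but nothing says it is the right one:

```python
# Hypothetical "support score": the share of references supporting a claim.
# This is exactly the kind of ad-hoc choice questioned above -- counting
# references says nothing about their quality or independence.

def support_score(refs_for, refs_against):
    total = refs_for + refs_against
    if total == 0:
        return None  # undefined when there are no references at all
    return refs_for / total

print(round(support_score(2, 1), 2))  # 0.67 -- but what does that *mean*?
```

Equally defensible variants (weighting by source reliability, ignoring duplicated sources, ...) would yield different numbers for the same claim.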

Fuzzy logic has its main use in fuzzy control (the famous "washing machine" example), which is completely different and largely unrelated to fuzzy knowledge representation. In knowledge representation, fuzzy approaches are also studied, but their application is usually in a closed system (e.g., if you have one system that extracts data from a text and assigns "certainties" to all extracted facts in the same way). It's still unclear how to choose the right logic, but at least it will give you a uniform treatment of your data according to some fixed principles (whether they make sense or not).

The situation is much clearer in probabilistic logics, where you define your assumptions first (e.g., you assume that events are independent or that dependencies are captured in some specific way). This makes it more rigorous, but also harder to apply, since in practice these assumptions rarely hold. This is somewhat tolerable if you have a rather uniform data set (e.g., a lot of sensor measurements that give you some probability for actual states of the underlying system). But if you have a huge, open, cross-domain system like Wikidata, it would be almost impossible to force it into a particular probability framework where "0.3" really means "in 30% of all cases".
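A minimal numeric illustration of why the assumptions matter (my own example, not from the thread): under the usual independence assumption, joint probabilities are products, but if the events are in fact correlated the product can be badly wrong.

```python
# Joint probability of two events under the independence assumption.

def joint_if_independent(p_a, p_b):
    return p_a * p_b

p_a, p_b = 0.5, 0.5
print(joint_if_independent(p_a, p_b))  # 0.25 -- only valid if independent

# If the two events are perfectly correlated (B happens exactly when A
# does), the true joint probability is min(p_a, p_b) = 0.5 -- double the
# independent estimate. The data alone does not tell you which case holds.
```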

Also note that scientific probability is always a limit of observed frequencies. It says: if you do something again and again, this is the rate you will get. Often-heard statements like "We have an 80% chance to succeed!" or "Chances are almost zero that the Earth will blow up tomorrow!" are scientifically pointless, since you cannot repeat the experiments they claim to make statements about. Many things we have in Wikidata are much more on the level of such general statements than on the level at which you normally use probability (a good example of a proper use of probability: "based on the tests we have done so far, this patient has a 35% chance of having cancer" -- these are not the things we normally have in Wikidata).

Markus



2014-05-29 13:43 GMT+02:00 Markus Krötzsch
<mar...@semantic-mediawiki.org>:

    On 29/05/14 12:41, Thomas Douillard wrote:

        @David:
        I think you should have a look to fuzzy logic
        <https://www.wikidata.org/wiki/Q224821> :)


    Or at probabilistic logic, possibilistic logic, epistemic logic, ...
    it's endless. Let's first complete the data we are sure of before we
    start to discuss whether Pluto is a planet with fuzzy degree 0.6 or
    0.7 ;-)

    (The problem with quantitative logics is that there is usually no
    reference for the numbers you need there, so they are not well
    suited for a secondary data collection like Wikidata that relies on
    other sources. The closest concept that still might work is
    probabilistic logic, since you can really get some probabilities
    from published data; but even there it is hard to use the
    probability as a raw value without specifying very clearly what the
    experiment looked like.)

    Markus



_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
