On Thu, May 10, 2012 at 9:09 PM, Marco Amadori <[email protected]> wrote:
> On Thursday 10 May 2012 21:06:30 Jona Christopher Sahnwaldt wrote:
>> I think what Marco meant was: the mapping says it's an object
>> property, so we should extract a URI, even if the property value is
>> just a string.
>
> Right.

At first glance, that may look like a nice idea, but it would (very
likely) mean that DBpedia would extract many additional URIs that are
wrong and only a few additional URIs that are correct. Slightly better
recall, much worse precision. I should add that that's a (strong)
hunch based on my experience with DBpedia extractions and a few clicks
in Wikipedia. I don't have actual data to back this claim.

>
>> In the case of the musician infoboxes on it wiki, that would work, but
>> in many other cases, it wouldn't. For example:
>> http://en.wikipedia.org/wiki/Glenn_Danzig contains "label = Plan 9,
>> Evilive". Just that string, no links. We could use a heuristic to
>> split the string into multiple links etc, but I don't think there's a
>> good, clean solution. With a naive approach we would extract
>> <http://en.dbpedia.org/resource/Plan_9,_Evilive>, which would be
>> wrong.
>
> We should use the same heuristic used by the wikimedia template engine, that
> way it would match with the proper wikipedia page.

The MediaWiki template engine in general does not use heuristics. The
specific template
http://it.wikipedia.org/wiki/Template:Artista_musicale also does not
use heuristics, but pretty simple rules: whatever value Wikipedia
users enter for one of the 'genere' properties is wrapped in '[[' and
']]', and thus rendered as a link.

That's why it would make sense to allow users to add a special flag to
a property mapping. With that flag set, our framework would always
extract the wikitext string value of one of the 'genere' properties as
an RDF URI, not as an RDF literal. For most other properties, such
behavior would lead to wrong URIs, even if the template property maps
to an object property.

>> There is a simple rule though: If the Wikipedia template renders
>> strings as links, then we should extract strings as URIs. Otherwise we
>> shouldn't.
>>
>> The problem is that our code can't find out what the template does
>> (well, it could, but that would be almost as hard as rendering
>> templates). But humans can. So to implement that rule, we have to add
>> a feature to the mappings wiki, as I described in my previous mail, so
>> users can add a flag saying "yes, plain string values in this property
>> should be extracted as URIs".
>
> But that isn't implicit if the mapping creator maps it to ObjectProperty ?

No, see above.

>> It seems that Italian Wikipedia templates often work like this, while
>> English templates rarely do. To make that behavior possible, the
>> Italian templates use multiple properties like genere, genere2,
>> genere3 etc, while the English templates use one property which the
>> editors can fill with links or strings as they like.
>
> This again means to me that we should trust mappings.

But only if the mapping explicitly states that this property should
always be extracted as a URI. The editor of the mapping should check
the source code of the Wikipedia template. If the template ALWAYS
renders a property value as a link, then we should ALWAYS extract URIs
for the property values. If not, then we should ONLY extract a URI for
the property value if we can find a matching link somewhere on the
page (that's the heuristic we use now).
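
A rough sketch of that rule in Python, in case it helps. None of this is
framework code: extract_value, to_resource_uri, the always_uri flag name
and the page_links list are made up for illustration, the "matching link"
check is reduced to a case-insensitive title comparison, and the URI
encoding is simplified.

```python
from urllib.parse import quote

IT_BASE = "http://it.dbpedia.org/resource/"


def to_resource_uri(title, base=IT_BASE):
    """Turn a page title into a resource URI (simplified encoding)."""
    title = title.strip()
    # Wiki titles start with an upper-case letter; spaces become underscores.
    title = title[:1].upper() + title[1:]
    return base + quote(title.replace(" ", "_"), safe="_,()'-.:")


def extract_value(raw, page_links, always_uri=False, base=IT_BASE):
    """Return ("uri", value) or ("literal", value) for one property value.

    always_uri -- hypothetical per-property flag set by the mapping editor
                  after checking that the template ALWAYS renders a link.
    page_links -- titles of the links found on the page (assumed input).
    """
    text = raw.strip()
    if always_uri:
        # Flag set: trust the mapping, the plain string becomes a URI.
        return ("uri", to_resource_uri(text, base))
    # No flag: only extract a URI if a matching link exists on the page.
    wanted = text.lower()
    for target in page_links:
        if target.strip().lower() == wanted:
            return ("uri", to_resource_uri(target, base))
    return ("literal", text)
```

With the flag, "genere=acidjazz" yields a URI even though the page has no
such link. Without it, "label = Plan 9, Evilive" stays a literal unless a
link with exactly that title is found on the page.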


One more thing that may be relevant for this discussion:

In the case of Template:Artista_musicale, things are even more
intricate. Template:Artista_musicale calls
http://it.wikipedia.org/wiki/Template:Autocat_musica for each 'genere'
property. Template:Autocat_musica contains a long list of musical
genres and slightly different names that may be used for them, for
example:

|acidjazz
|acid-jazz
|acid jazz=[[Acid jazz]][[Categoria:{{{tipo}}} acid jazz]]

This means that if a Wikipedia page contains "genere=acidjazz", the
rendered HTML will contain a link to "Acid jazz". DBpedia won't be so
smart. Even if we extend the framework with that special "always
extract this property as URI" flag, DBpedia would extract the URI
http://it.dbpedia.org/resource/Acidjazz for this property - which is
pretty useless, since there is no redirect from
http://it.wikipedia.org/wiki/Acidjazz to
http://it.wikipedia.org/wiki/Acid_jazz.

But that's a minor problem. I still think that adding that flag would
be a good thing.
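
To make the Autocat_musica point concrete, here is a small Python sketch.
The ALIASES table only contains the three spellings quoted above (the
real template has a long list), and naive_title / resolved_title are
hypothetical helpers, not something the framework has. I also assume the
lookup is an exact string match; I haven't checked how the template
handles case.

```python
# The three spellings quoted from Template:Autocat_musica, all pointing
# to the same page title. The real list is much longer.
ALIASES = {
    "acidjazz": "Acid jazz",
    "acid-jazz": "Acid jazz",
    "acid jazz": "Acid jazz",
}


def naive_title(raw):
    """What an 'always extract as URI' flag alone would produce."""
    text = raw.strip()
    return (text[:1].upper() + text[1:]).replace(" ", "_")


def resolved_title(raw, aliases=ALIASES):
    """Same, but first map the value through the alias table."""
    text = raw.strip()
    return naive_title(aliases.get(text, text))


print(naive_title("acidjazz"))     # Acidjazz
print(resolved_title("acidjazz"))  # Acid_jazz
```

The first title has no page or redirect behind it; the second is the
page the rendered link actually points to.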

Cheers,
JC



>
> --
> ESC:wq

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
