Andrea is right in his observation, but I think we should concentrate our
efforts on the mappings extractor.
If you need a specific property from that infobox, just add a mapping and
both you and everyone else will benefit from it.

http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_company
http://mappings.dbpedia.org/server/extraction/en/extract?title=International_Speedway_Corporation&revid=&format=trix

Maybe in the future we should delete the InfoboxExtractor completely, or
generate an extra triple for every property linking to the mappings wiki.
People might start creating more mappings then :)

Cheers,
Dimitris



On Thu, Dec 19, 2013 at 5:24 PM, Andrea Di Menna <[email protected]> wrote:

> There are different pages with the same problem:
>
>
> http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fres+%3Ftopic+where+%7B+%0D%0A%3Fres+dbpedia-owl%3Aindustry+%3Font_ind+.+%0D%0AOPTIONAL+%7B+%3Fres+dbpprop%3Aindustry+%3Fprop_ind+%7D+.+%0D%0AFILTER+%28+%21bound%28%3Fprop_ind%29+%29+.%0D%0A%3Ftopic+foaf%3AprimaryTopic+%3Fres+.+%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on
>
> select distinct ?res ?topic where {
> ?res dbpedia-owl:industry ?ont_ind .
> OPTIONAL { ?res dbpprop:industry ?prop_ind } .
> FILTER ( !bound(?prop_ind) ) .
>  ?topic foaf:primaryTopic ?res .
> }
>
> It would be good if we could identify and fix faulty pages or try to find
> another heuristic rule for the Infobox extractor.
>
> WDYT?
>
> Cheers
> Andrea
>
>
>
>
> 2013/12/19 Andrea Di Menna <[email protected]>
>
>> Hi Amit,
>>
>> thanks for posting your question :)
>> The rule you mention treats a key as valid when it is not a plain number
>> (i.e. it does not consist only of digits), that is, when it is explicit.
>> This is because templates can have either explicit or implicit parameters:
>> - an explicit parameter has a name
>> - implicit parameters are identified by their position, so they have no
>> name but only an index
>>
>> E.g.
>> {{Template|name=...|surname=...}} => properties are { "name" => ...;
>> "surname" => ...}
>> and
>> {{Template|...|...}} => properties are { "1" => ...; "2" => ...}
>>
>> MinPercentageOfExplicitPropertyKeys is used to skip templates that are
>> unlikely to be useful, i.e. those whose keys are mostly implicit.
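
To make the explicit/implicit distinction concrete, here is a minimal Scala sketch of how keys could be assigned and checked against the threshold. It is only an illustration under stated assumptions: `parseParams` is a hypothetical helper that handles flat template calls only (no nested templates or links), it is not the DBpedia wiki parser, and the `MinPropertyCount` check from the extractor is left out because its value is not given in this thread.

```scala
// Illustrative sketch only; not the actual DBpedia parser or extractor code.
object ExplicitKeyCheck {
  // Threshold value quoted in this thread.
  val MinPercentageOfExplicitPropertyKeys = 0.75

  // Hypothetical helper: split a flat template call such as
  // "Template|name=a|b" into (key, value) pairs. Named parameters keep
  // their name; positional ones get their 1-based position as key.
  def parseParams(call: String): List[(String, String)] = {
    val parts = call.split("\\|", -1).toList.drop(1) // drop the template name
    val positions = Iterator.from(1)
    parts.map { part =>
      part.split("=", 2) match {
        case Array(key, value) => (key.trim, value.trim)
        case _                 => (positions.next().toString, part.trim)
      }
    }
  }

  // Same test as in the quoted extractor code: a key is explicit when it
  // does not consist only of digits.
  def isExplicit(key: String): Boolean = !key.forall(_.isDigit)

  // Fraction of explicit keys among all parsed parameters.
  def explicitRatio(params: List[(String, String)]): Double =
    if (params.isEmpty) 0.0
    else params.count(p => isExplicit(p._1)).toDouble / params.size

  def passesThreshold(params: List[(String, String)]): Boolean =
    explicitRatio(params) > MinPercentageOfExplicitPropertyKeys
}
```

With this sketch, `ExplicitKeyCheck.parseParams("Template|name=a|surname=b")` yields the keys `name` and `surname` and passes the check, while `"Template|a|b"` yields the keys `1` and `2` and does not.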
>>
>> The real problem with the page you mention is not the percentage we are
>> using, but how the template is filled in with data:
>>
>> {{Infobox company
>> | name      =  International Speedway Corporation|
>> | logo      =  [[Image:Iscmotorsportslogo.png]]
>> | type      =  [[Public company|Public]]  |
>> | traded_as  = {{NASDAQ|ISCA}}<br />{{OTCQB|ISCB}}
>> | foundation        =  1953 (as Bill France Racing, Inc.)|
>> | location          =  1 Daytona Boulevard<br />[[Daytona Beach,
>> Florida]]  32114-1243|
>> | key_people        =  [[Bill France, Sr.]], founder<br/>[[Jim France]],
>> CEO<br/>[[Lesa Kennedy]], president|
>> | industry          =  [[Auto racing|Motorsports]]|
>> | products          =  Sporting events|
>> | revenue           =  {{decrease}} $633.91 million [[United States
>> dollar|USD]] (2010, November)|
>> | operating_income  =  {{decrease}} $115.64 million [[United States
>> dollar|USD]] (2010, November)|
>> | net_income        =  {{decrease}} $54.53 million [[United States
>> dollar|USD]] (2010, November)|
>> | num_employees     =   1,000 (full time) |
>> | homepage          =  [http://www.iscmotorsports.com/
>> www.iscmotorsports.com]|
>> }}
>>
>> There are a number of superfluous, misleading trailing pipes ("|") in the
>> template properties.
>> The effect is that the parser sees a number of implicit parameters, which
>> are counted in the list of params (hence the template falls below the
>> 75% threshold).
>> Can you fix the Wikipedia article?
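
As a rough worked example of that effect: the infobox quoted above has 14 named parameters, and 12 of them end with a trailing pipe. Assuming each trailing pipe is counted as exactly one implicit parameter (an assumption, the real parser may count them differently), the fraction of explicit keys can be sketched in Scala as follows. `TrailingPipeEffect` and `explicitRatio` are names made up for this illustration.

```scala
// Illustrative arithmetic only; the counts come from the infobox quoted in
// this thread, and "one implicit parameter per trailing pipe" is an assumption.
object TrailingPipeEffect {
  // Threshold value quoted in this thread.
  val MinPercentageOfExplicitPropertyKeys = 0.75

  // Fraction of explicit keys when `implicitCount` positional parameters
  // are counted alongside `explicitCount` named ones.
  def explicitRatio(explicitCount: Int, implicitCount: Int): Double =
    explicitCount.toDouble / (explicitCount + implicitCount)

  def main(args: Array[String]): Unit = {
    val named = 14         // named parameters in the quoted infobox
    val trailingPipes = 12 // parameters of that infobox that end with "|"
    val withPipes = explicitRatio(named, trailingPipes)
    val cleaned = explicitRatio(named, 0)
    println(f"with trailing pipes: $withPipes%.3f")
    println(f"after cleanup:       $cleaned%.3f")
    println(s"passes 0.75 with pipes: ${withPipes > MinPercentageOfExplicitPropertyKeys}")
  }
}
```

Under that assumption the ratio is 14/26, roughly 0.54, which is below 0.75 but above 0.5. That would be consistent with the report that a 50% limit gives the desired output, and removing the trailing pipes brings the ratio back to 1.0.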
>>
>> A more general question: which chars are allowed in an implicit template
>> param?
>>
>> Cheers
>> Andrea
>>
>>
>> 2013/12/19 Amit Kumar <[email protected]>
>>
>>> Hi,
>>> Today, while looking at the extracted dataset, we found that we are not
>>> getting any infobox properties in the output for some pages.
>>> For example, try
>>> http://en.wikipedia.org/wiki/International_Speedway_Corporation
>>>
>>> Debugging told me that the problem lies in the Infobox Extractor
>>>
>>> val MinPercentageOfExplicitPropertyKeys = 0.75
>>> ..
>>>
>>> val countExplicitPropertyKeys = propertyList.count(property =>
>>> !property.key.forall(_.isDigit))
>>> if ((countExplicitPropertyKeys >= MinPropertyCount) &&
>>> (countExplicitPropertyKeys.toDouble / propertyList.size) >
>>> MinPercentageOfExplicitPropertyKeys)
>>> {
>>> ..
>>> ..
>>> }
>>>
>>> What I think it says is that we should only parse templates where more
>>> than 75% of the keys in the (key, value) pairs are valid keys. The
>>> above-mentioned wiki page doesn't make the cut. Can someone tell me about
>>> this 75% cut-off? I tried a 50% limit and it gives the desired output. I
>>> know lowering it will start producing more data, some of which might be of
>>> bad quality.
>>>
>>>
>>>
>>> Regards
>>> Amit
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>> organizations don't have a clear picture of how application performance
>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>> your
>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>> AppDynamics Pro!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>
>>
>>
>
>
>
>


-- 
Kontokostas Dimitris