Hi,
Today, while looking at the extracted dataset, we found that we are not getting
any infobox properties in the output for some pages.
For example, try this page:
http://en.wikipedia.org/wiki/International_Speedway_Corporation
Debugging showed me that the problem lies in the Infobox Extractor:
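val MinPercentageOfExplicitPropertyKeys = 0.75
...
// Count keys that are not purely numeric; numeric keys ("1", "2", ...)
// typically come from positional (unnamed) template arguments.
val countExplicitPropertyKeys = propertyList.count(property =>
  !property.key.forall(_.isDigit))
// Extract only if there are at least MinPropertyCount explicit keys AND
// strictly more than 75% of all keys are explicit.
if ((countExplicitPropertyKeys >= MinPropertyCount) &&
    (countExplicitPropertyKeys.toDouble / propertyList.size) >
    MinPercentageOfExplicitPropertyKeys)
{
  ...
}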
As I understand it, this means we only extract templates in which more than 75%
of the keys in the (key, value) pairs are explicit (non-numeric) keys. The page
mentioned above doesn't make the cut. Can someone explain the reasoning behind
this 75% cutoff? When I tried a 50% limit, it gave the desired output. I know
lowering it will produce more data, some of which might be of poor quality.
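To make the effect of the threshold concrete, here is a minimal, self-contained sketch of the same check with the cutoff passed in as a parameter. It is not the extractor's actual code: `Prop`, `isExplicit`, `passesFilter` and `sample` are hypothetical names for illustration, and the value 2 used for MinPropertyCount is an assumption (its real value isn't shown in the snippet above).

```scala
// Hypothetical, simplified stand-in for a parsed template property;
// only the key matters for the filter being illustrated.
final case class Prop(key: String, value: String)

// A key made only of digits ("1", "2", ...) is treated as positional;
// every other key counts as "explicit", as in the snippet above.
def isExplicit(p: Prop): Boolean = !p.key.forall(_.isDigit)

// Mirrors the quoted condition: at least `minPropertyCount` explicit keys
// AND a share of explicit keys strictly greater than `minPercentage`.
def passesFilter(props: Seq[Prop], minPropertyCount: Int, minPercentage: Double): Boolean = {
  val explicitCount = props.count(isExplicit)
  explicitCount >= minPropertyCount &&
    explicitCount.toDouble / props.size > minPercentage
}

// Hypothetical template: 3 explicit keys and 2 positional ones (3/5 = 0.6).
val sample = Seq(
  Prop("name", "v"),
  Prop("type", "v"),
  Prop("founded", "v"),
  Prop("1", "v"),
  Prop("2", "v")
)
```

With these definitions, `passesFilter(sample, 2, 0.75)` is false (0.6 is not above 0.75), while `passesFilter(sample, 2, 0.5)` is true, which matches the behaviour described above: any template whose explicit-key share falls between the two thresholds is dropped at 75% and kept at 50%.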
Regards
Amit
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers