> I would love to see extractor.TableMappings fixed, and more table mappings
> created. Jona has already stopped the likely infinite loop in that class. I
> quickly scanned through the code and had the impression that it lacks a
> termination criterion for its recursion. Shouldn't be too hard to fix if you
> have a couple of hours to throw at it.

Unless there was a cycle in the class relations (which might even
make sense with equivalent classes), the recursion did terminate.
The problem was that the triple list was appended to itself
(and thus doubled in size) each time the method writeType was
called, so in the end it grew by a factor of 2^(r*c) (or thereabouts),
where r is the number of resources extracted from one table
and c is the average number of related classes (transitive base
and equivalent classes) of each of these resources. For
pages like http://en.wikipedia.org/wiki/Ford_Crown_Victoria (see [1])
or http://en.wikipedia.org/wiki/Airbus_A300 (see [2]) the table
extractor generates around 20 variants with around 3 related classes
each, so r*c is about 60. 2^60 is a big number. :-) A server with a few
exabytes of RAM might have handled that, but remember
that we then try to send that page to a poor browser. :-)
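
To make the growth concrete, here is a minimal Scala sketch of the
doubling effect. It is not the actual TableMapping code: the names
DoublingSketch, writeTypeBuggy and sizeAfter are made up for
illustration, and the only assumption carried over is that each call
appends the list to itself.

```scala
import scala.collection.mutable.ArrayBuffer

object DoublingSketch {
  // Hypothetical stand-in for the faulty behaviour: every call appends
  // a copy of the current list to the list itself, doubling its size.
  def writeTypeBuggy(triples: ArrayBuffer[String]): Unit = {
    val snapshot = triples.toList // copy first, then append the copy
    triples ++= snapshot
  }

  // Size of a one-element list after the given number of calls.
  def sizeAfter(calls: Int): Int = {
    val triples = ArrayBuffer("triple")
    (1 to calls).foreach(_ => writeTypeBuggy(triples))
    triples.size
  }

  // Growth factor for r resources with c related classes each,
  // assuming one call per (resource, class) pair.
  def growthFactor(r: Int, c: Int): BigInt = BigInt(2).pow(r * c)
}
```

With these assumptions, DoublingSketch.sizeAfter(10) is already 1024,
and DoublingSketch.growthFactor(20, 3) is 2^60, which is more than
10^18.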

The thing is that there already is correct code for adding all
types for a resource to a triple set. We also do that in
TemplateMapping:

http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/3180c9d769fa/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala#l83

I'd like to move that code to OntologyClass, but I'd also like
to think about improving it. I think the code in TemplateMapping still
blows up if there are cycles in class relations. And there may be other
places where we re-implemented this process.
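
A cycle-safe version could look roughly like the following Scala
sketch. It does not use the real OntologyClass API: the Relations map
(class name to directly related class names, base and equivalent
alike) and the function relatedClasses are assumptions made for this
example. The point is only that a visited set makes the traversal
terminate even when class relations form a cycle.

```scala
import scala.annotation.tailrec

object RelatedClassesSketch {
  // Hypothetical model: class name -> directly related classes
  // (base classes and equivalent classes lumped together).
  type Relations = Map[String, Set[String]]

  // Collects the start class and everything reachable from it.
  // Each class is expanded at most once, so cycles cannot cause
  // endless recursion or duplicate entries.
  def relatedClasses(start: String, relations: Relations): Set[String] = {
    @tailrec
    def loop(pending: List[String], visited: Set[String]): Set[String] =
      pending match {
        case Nil => visited
        case head :: tail if visited.contains(head) => loop(tail, visited)
        case head :: tail =>
          val next = relations.getOrElse(head, Set.empty[String]).toList
          loop(next ::: tail, visited + head)
      }
    loop(List(start), Set.empty)
  }
}
```

For example, with relations Map("A" -> Set("B"), "B" -> Set("A", "C")),
where A and B refer to each other, relatedClasses("A", relations)
returns Set("A", "B", "C") and terminates. Because the result is a set,
adding one type triple per element cannot duplicate triples either.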

Cheers,
JC

[1] http://mappings.dbpedia.org/server/mappings/en/extractionSamples/Mapping:Infobox_Automobile_generation
[2] http://mappings.dbpedia.org/server/mappings/en/extractionSamples/Mapping:Infobox_aircraft_type

On Mon, Mar 12, 2012 at 23:24, Pablo Mendes <[email protected]> wrote:
>
> Hi emijrp,
> If by "underdeveloped" you mean where does DBpedia need more data, then you
> should take a look at the mappings statistics for the language of your
> interest:
> http://mappings.dbpedia.org/index.php/Mapping_Statistics
>
> If by "underdeveloped" you mean where does DBpedia need some coding, then I
> would say TableMappings. Some tables on Wikipedia look like they would
> easily produce good data:
> http://en.wikipedia.org/wiki/List_of_social_networking_websites
>
> While others would be harder (e.g. multiple links within a cell):
> http://en.wikipedia.org/wiki/List_of_sovereign_states
>
> But it would be great if it caught at least the easy cases. This would
> enable the extraction of about 73.6% [1] of the list pages on Wikipedia.
>
> I would love to see extractor.TableMappings fixed, and more table mappings
> created. Jona has already stopped the likely infinite loop in that class. I
> quickly scanned through the code and had the impression that it lacks a
> termination criterion for its recursion. Shouldn't be too hard to fix if you
> have a couple of hours to throw at it.
>
> Cheers,
> Pablo
>
> [1] http://articles.businessinsider.com/2010-02-17/strategy/30008803_1_market-forecasts-analysis-data-projections
>
> On Mon, Mar 12, 2012 at 10:50 PM, emijrp <[email protected]> wrote:
>>
>> Cool, thanks!
>>
>> What are the under-developed areas in dbpedia?
>>
>> 2012/3/12 Jona Christopher Sahnwaldt <[email protected]>
>>>
>>> Hi emijrp, Anja gave you editor rights on Feb 8.
>>> Maybe you already know, but there was no
>>> reply to your mail on the list. Regards, JC
>>>
>>>
>>> On Wed, Feb 8, 2012 at 22:58, emijrp <[email protected]> wrote:
>>> > Hi. My account "emijrp" on the mappings wiki has not been activated
>>> > yet,
>>> > after some weeks waiting. Regards.
>>
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>