Paul Houle wrote:
> Paul Houle wrote:
>
>> I'm looking at my sample some more. Here's the distribution of
>> toplevel types from the dbpedia ontology
>>
>> +-----------------------------------+----------+
>> | type | count(*) |
>> +-----------------------------------+----------+
>> | SupremeCourtOfTheUnitedStatesCase | 3 |
>> | Website | 4 |
>> | Event | 21 |
>> | Infrastructure | 47 |
>> | Work | 525 |
>> | Organisation | 649 |
>> | Place | 712 |
>> | Person | 2208 |
>> | NULL | 6961 |
>> +-----------------------------------+----------+
>>
>>
>>
> I used the new simplified dump from metaweb to do the same thing
> with freebase. Lacking a proper schema dump, I simply assumed that the
> toplevel type was the most prevalent type (other than /common/topic)
> that applies to a topic:
>
> +---------------------------------------------------+----------+
> | url | count(*) |
> +---------------------------------------------------+----------+
> | /people/person | 4066 |
> | NULL | 3756 |
> | /location/location | 1211 |
> | /business/employer | 827 |
> | /film/film | 427 |
> | /projects/project_focus | 268 |
> | /time/event | 46 |
> | /organization/organization | 46 |
> | /transportation/road | 44 |
> | /architecture/museum | 41 |
> | /broadcast/broadcast | 40 |
> | /music/artist | 33 |
> | /time/recurring_event | 30 |
> | /music/album | 27 |
> | /book/written_work | 25 |
> | /book/periodical | 22 |
> | /education/educational_institution | 14 |
> | /base/dance/topic | 12 |
> | /business/business_location | 11 |
> | /tv/tv_program | 11 |
> | /sports/sports_team | 9 |
> | /boats/ship | 9 |
> | /metropolitan_transit/transit_line | 7 |
> | /base/amusementparks/topic | 7 |
> | /business/company | 7 |
> | /book/author | 6 |
> | /visual_art/artwork | 5 |
> | /user/robert/area_codes/topic | 5 |
> | /book/book_subject | 5 |
> | /food/dish | 4 |
> | /architecture/structure | 4 |
> | /transportation/bridge | 4 |
> | /business/shopping_center | 4 |
> | /sports/sports_facility | 3 |
> | /film/film_location | 3 |
> | /medicine/hospital | 3 |
> | /music/genre | 3 |
> | /award/award | 3 |
> | /music/composition | 3 |
> | /award/award_winner | 3 |
> | /protected_sites/protected_site | 3 |
> | /award/award_category | 2 |
> | /government/government_agency | 2 |
> | /tv/tv_network | 2 |
> | /base/disaster2/topic | 2 |
> | /user/skud/legal/topic | 2 |
> | /education/school | 2 |
> | /internet/website | 2 |
> | /base/dance/dance_company | 2 |
> | /government/governmental_body | 2 |
> | /architecture/landscape_project | 2 |
> | /biology/organism | 2 |
> | /geography/body_of_water | 2 |
> | /theater/theater_company | 2 |
> | /book/school_or_movement | 2 |
> | /user/skud/names/namesake | 2 |
> | /military/armed_force | 1 |
> | /projects/project | 1 |
> | /user/iubookgirl/default_domain/academic_library | 1 |
> | /geography/island | 1 |
> | /influence/influence_node | 1 |
> | /base/fblinux/topic | 1 |
> | /film/writer | 1 |
> | /user/rcheramy/default_domain/nickname | 1 |
> | /award/award_presenting_organization | 1 |
> | /architecture/unrealized_design | 1 |
> | /base/americancomedy/comedy_venue | 1 |
> | /base/collectives/topic | 1 |
> | /games/game | 1 |
> | /broadcast/radio_station | 1 |
> | /cvg/cvg_developer | 1 |
> | /base/omgfun/festival_series | 1 |
> | /award/award_nominee | 1 |
> | /user/petroleumj/default_domain/subway_station | 1 |
> | /business/job_title | 1 |
> | /user/skud/flags/topic | 1 |
> | /visual_art/art_subject | 1 |
> | /user/tsegaran/random/topic | 1 |
> | /book/magazine | 1 |
> | /user/techgnostic/default_domain/periodical | 1 |
> | /food/brewery_brand_of_beer | 1 |
> | /geography/bay | 1 |
> | /metropolitan_transit/transit_system | 1 |
> | /internet/website_owner | 1 |
> | /visual_art/art_owner | 1 |
> | /computer/software_developer | 1 |
> | /fictional_universe/fictional_character_creator | 1 |
> | /venture_capital/venture_investor | 1 |
> | /base/omgfun/topic | 1 |
> | /award/hall_of_fame | 1 |
> | /base/exhibitions/topic | 1 |
> | /base/symbols/topic | 1 |
> | /architecture/architectural_structure_owner | 1 |
> | /aviation/airliner_accident | 1 |
> | /guid/9202a8c04000641f800000000af896ba | 1 |
> | /user/guidewire/default_domain/online_music_store | 1 |
> | /library/public_library_system | 1 |
> | /user/gogza/default_domain/recurring_event | 1 |
> | /base/americancomedy/topic | 1 |
> +---------------------------------------------------+----------+
>
>
> (Note that this is over a list of about 11k topics whose classification
> I'm working to improve before I feed it into the next stage of my
> production pipeline.)
>
> Freebase has types for about twice the number of people, and has about
> half the number of untypeds as dbpedia. The freebase "toplevels" I'm
> generating are completely uncontrolled, so you get some strange ones
> towards the bottom. The "prevalence" filter has gotten rid of a large
> number of references to certain common junk types, such as the "Jungle"
> type that you find all over the place in Freebase.
>
> Note that the URL structure of "commons" types on FB tends to be
>
> /{problem_domain}/{type}
>
> so you tend to see things like "/book/author" where there is no
> inheritance relation between book and author. You also see "/base/..."
> types and "/user/..." types, which represent namespaces inside FB.
>
> I'm going to look at the topics that are untyped in both datasets a bit
> more, and also merge the Freebase types into the dbpedia toplevels.
>
>
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now. http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
I assume your merge will take the form of an N3 or Turtle-based Linked
Data Set?
It is my hope that Yago/SUMO will provide an alternative view of
DBpedia that people will eventually comprehend and appreciate.
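
To make the Turtle suggestion concrete, here is a minimal sketch of how a
set of merged type assignments might be written out. It is an assumption
about the shape of the output, not a description of any existing dump: the
function name and the input dict are hypothetical, resource names are
assumed to be URI-safe already, and the example entry is purely
illustrative.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_ONTOLOGY = "http://dbpedia.org/ontology/"


def to_turtle(assignments):
    """Serialise {resource local name: ontology class name} as Turtle.

    Assumes local names need no further IRI escaping (sketch only).
    """
    lines = [f"@prefix dbpedia-owl: <{DBPEDIA_ONTOLOGY}> .", ""]
    # Sort for stable output; "a" is Turtle shorthand for rdf:type.
    for resource, cls in sorted(assignments.items()):
        lines.append(f"<{DBPEDIA_RESOURCE}{resource}> a dbpedia-owl:{cls} .")
    return "\n".join(lines) + "\n"


# Illustrative entry only, not taken from the sample discussed above.
print(to_turtle({"Albert_Einstein": "Person"}))
```

Subjects are written as full IRIs rather than prefixed names, so that local
names containing characters such as parentheses do not need Turtle escaping.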
Kingsley
--
Regards,
Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com