Paul Houle wrote:
> Paul Houle wrote:
>   
>>       I'm looking at my sample some more.  Here's the distribution of 
>> toplevel types from the dbpedia ontology
>>
>> +-----------------------------------+----------+
>> | type                              | count(*) |
>> +-----------------------------------+----------+
>> | SupremeCourtOfTheUnitedStatesCase |        3 |
>> | Website                           |        4 |
>> | Event                             |       21 |
>> | Infrastructure                    |       47 |
>> | Work                              |      525 |
>> | Organisation                      |      649 |
>> | Place                             |      712 |
>> | Person                            |     2208 |
>> | NULL                              |     6961 |
>> +-----------------------------------+----------+
>>
>>   
>>     
>     I used the new simplified dump from metaweb to do the same thing 
> with freebase.  Lacking a proper schema dump,  I simply assumed that the 
> toplevel type was the most prevalent type (other than /common/topic) 
> that applies to a topic:
>
> +---------------------------------------------------+----------+
> | url                                               | count(*) |
> +---------------------------------------------------+----------+
> | /people/person                                    |     4066 |
> | NULL                                              |     3756 |
> | /location/location                                |     1211 |
> | /business/employer                                |      827 |
> | /film/film                                        |      427 |
> | /projects/project_focus                           |      268 |
> | /time/event                                       |       46 |
> | /organization/organization                        |       46 |
> | /transportation/road                              |       44 |
> | /architecture/museum                              |       41 |
> | /broadcast/broadcast                              |       40 |
> | /music/artist                                     |       33 |
> | /time/recurring_event                             |       30 |
> | /music/album                                      |       27 |
> | /book/written_work                                |       25 |
> | /book/periodical                                  |       22 |
> | /education/educational_institution                |       14 |
> | /base/dance/topic                                 |       12 |
> | /business/business_location                       |       11 |
> | /tv/tv_program                                    |       11 |
> | /sports/sports_team                               |        9 |
> | /boats/ship                                       |        9 |
> | /metropolitan_transit/transit_line                |        7 |
> | /base/amusementparks/topic                        |        7 |
> | /business/company                                 |        7 |
> | /book/author                                      |        6 |
> | /visual_art/artwork                               |        5 |
> | /user/robert/area_codes/topic                     |        5 |
> | /book/book_subject                                |        5 |
> | /food/dish                                        |        4 |
> | /architecture/structure                           |        4 |
> | /transportation/bridge                            |        4 |
> | /business/shopping_center                         |        4 |
> | /sports/sports_facility                           |        3 |
> | /film/film_location                               |        3 |
> | /medicine/hospital                                |        3 |
> | /music/genre                                      |        3 |
> | /award/award                                      |        3 |
> | /music/composition                                |        3 |
> | /award/award_winner                               |        3 |
> | /protected_sites/protected_site                   |        3 |
> | /award/award_category                             |        2 |
> | /government/government_agency                     |        2 |
> | /tv/tv_network                                    |        2 |
> | /base/disaster2/topic                             |        2 |
> | /user/skud/legal/topic                            |        2 |
> | /education/school                                 |        2 |
> | /internet/website                                 |        2 |
> | /base/dance/dance_company                         |        2 |
> | /government/governmental_body                     |        2 |
> | /architecture/landscape_project                   |        2 |
> | /biology/organism                                 |        2 |
> | /geography/body_of_water                          |        2 |
> | /theater/theater_company                          |        2 |
> | /book/school_or_movement                          |        2 |
> | /user/skud/names/namesake                         |        2 |
> | /military/armed_force                             |        1 |
> | /projects/project                                 |        1 |
> | /user/iubookgirl/default_domain/academic_library  |        1 |
> | /geography/island                                 |        1 |
> | /influence/influence_node                         |        1 |
> | /base/fblinux/topic                               |        1 |
> | /film/writer                                      |        1 |
> | /user/rcheramy/default_domain/nickname            |        1 |
> | /award/award_presenting_organization              |        1 |
> | /architecture/unrealized_design                   |        1 |
> | /base/americancomedy/comedy_venue                 |        1 |
> | /base/collectives/topic                           |        1 |
> | /games/game                                       |        1 |
> | /broadcast/radio_station                          |        1 |
> | /cvg/cvg_developer                                |        1 |
> | /base/omgfun/festival_series                      |        1 |
> | /award/award_nominee                              |        1 |
> | /user/petroleumj/default_domain/subway_station    |        1 |
> | /business/job_title                               |        1 |
> | /user/skud/flags/topic                            |        1 |
> | /visual_art/art_subject                           |        1 |
> | /user/tsegaran/random/topic                       |        1 |
> | /book/magazine                                    |        1 |
> | /user/techgnostic/default_domain/periodical       |        1 |
> | /food/brewery_brand_of_beer                       |        1 |
> | /geography/bay                                    |        1 |
> | /metropolitan_transit/transit_system              |        1 |
> | /internet/website_owner                           |        1 |
> | /visual_art/art_owner                             |        1 |
> | /computer/software_developer                      |        1 |
> | /fictional_universe/fictional_character_creator   |        1 |
> | /venture_capital/venture_investor                 |        1 |
> | /base/omgfun/topic                                |        1 |
> | /award/hall_of_fame                               |        1 |
> | /base/exhibitions/topic                           |        1 |
> | /base/symbols/topic                               |        1 |
> | /architecture/architectural_structure_owner       |        1 |
> | /aviation/airliner_accident                       |        1 |
> | /guid/9202a8c04000641f800000000af896ba            |        1 |
> | /user/guidewire/default_domain/online_music_store |        1 |
> | /library/public_library_system                    |        1 |
> | /user/gogza/default_domain/recurring_event        |        1 |
> | /base/americancomedy/topic                        |        1 |
> +---------------------------------------------------+----------+
>
>
> (Note that this is over a list of about 11k topics whose classification 
> I'm working to improve before I feed it into the next stage of my 
> production pipeline.)
>
> Freebase has types for about twice the number of people,  and has about 
> half the number of untypeds as dbpedia.  The freebase "toplevels" I'm 
> generating are completely uncontrolled,  so you get some strange ones 
> towards the bottom.  Even so,  the "prevalence" filter has gotten rid of 
> a large number of references to certain common junk types such as the 
> "Jungle" type that you find all over the place in Freebase.
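
One possible reading of the prevalence heuristic quoted above, as a minimal Python sketch: count how often each type occurs across the whole sample, then give each topic whichever of its own types is most common overall, skipping /common/topic. The function names, the tie-breaking rule and the sample topics are illustrative assumptions, not the code or data actually used.

```python
from collections import Counter

# Types that are never allowed to become a "toplevel" (from the quoted text).
IGNORED = {"/common/topic"}


def toplevel_types(topic_types):
    """Map each topic to its globally most prevalent type, or None if untyped.

    topic_types: dict of topic id -> list of Freebase type ids (hypothetical shape).
    """
    # Global prevalence: in how many topics does each type appear?
    counts = Counter(
        t
        for types in topic_types.values()
        for t in set(types)
        if t not in IGNORED
    )
    result = {}
    for topic, types in topic_types.items():
        candidates = [t for t in set(types) if t not in IGNORED]
        # Ties are broken by type name (an assumption) so the result is deterministic.
        result[topic] = (
            max(candidates, key=lambda t: (counts[t], t)) if candidates else None
        )
    return result


def distribution(toplevels):
    """Count topics per toplevel type, most common first (None plays the role of NULL)."""
    return Counter(toplevels.values()).most_common()


# Made-up sample, for illustration only.
sample = {
    "topic_a": ["/common/topic", "/people/person", "/book/author"],
    "topic_b": ["/common/topic", "/people/person"],
    "topic_c": ["/common/topic", "/location/location"],
    "topic_d": ["/common/topic"],
}
toplevels = toplevel_types(sample)
```

Under this reading a rarely used type only wins when a topic has nothing more common attached to it, which would account for both the long tail of odd types and the disappearance of most junk types.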
>
> Note that the URL structure of "commons" types on FB tends to be
>
> {problem_domain}/{type}
>
> so you tend to see things like "book/author" where there is no 
> inheritance relation between book and author.  You also see "/base/..." 
> types and "/user/.." types which represent namespaces inside FB.
>
> I'm going to look at the double-untyped a bit more and also merge the fb 
> types into the dbpedia toplevels.
>
>
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
> trial. Simplify your report design, integration and deployment - and focus on 
> what you do best, core application coding. Discover what's new with 
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>   
I assume your merge will take the form of an N3 or Turtle based Linked 
Data Set?
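
To make the question concrete, here is a rough Python sketch of what such a merge might emit as Turtle. It assumes the DBpedia toplevel is kept when one exists and a mapped Freebase type is used otherwise. The FB_TO_DBPEDIA table, the function names and the sample rows are hypothetical illustrations, not an agreed mapping.

```python
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

# Hypothetical hand-made mapping from Freebase types to DBpedia toplevel classes.
FB_TO_DBPEDIA = {
    "/people/person": "Person",
    "/location/location": "Place",
    "/film/film": "Work",
    "/organization/organization": "Organisation",
    "/time/event": "Event",
}


def merged_type(dbpedia_type, freebase_type):
    """Prefer the DBpedia toplevel; fall back to the mapped Freebase type (assumption)."""
    if dbpedia_type is not None:
        return dbpedia_type
    return FB_TO_DBPEDIA.get(freebase_type)


def to_turtle(rows):
    """Serialize (resource_name, dbpedia_type, freebase_type) rows as Turtle.

    Resource names are assumed to be IRI-safe already; real data may need escaping.
    """
    lines = ["@prefix dbo: <%s> ." % DBO, ""]
    for name, dbpedia_type, freebase_type in rows:
        cls = merged_type(dbpedia_type, freebase_type)
        if cls is not None:  # topics untyped in both sources produce no triple
            lines.append("<%s%s> a dbo:%s ." % (DBR, name, cls))
    return "\n".join(lines)


# Made-up rows, for illustration only.
rows = [
    ("Albert_Einstein", None, "/people/person"),
    ("Ithaca,_New_York", "Place", "/location/location"),
    ("Some_Untyped_Topic", None, None),
]
turtle = to_turtle(rows)
```

Each typed topic becomes a single rdf:type triple (written with the Turtle shorthand "a"), so the result could be loaded next to the existing DBpedia ontology types.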

It is my hope that Yago/SUMO will provide an alternative view of 
DBpedia that people will eventually comprehend and appreciate.



Kingsley

-- 


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com




