
Re: [Dbpedia-discussion] How to build a meaningful Taxonomy from Wikipedia Categories?

2013-12-22 Thread Amir H. Jadidinejad

Dear Kasun,
It's a very useful project; congratulations.
I just want to know whether it's possible to apply your method to a local set of
documents (not all leaf categories).
Suppose that I have a set of text documents and I want to find the
relatedness/similarity between them using abstraction levels in the category
network.
In this situation, I think you need further criteria to rank parent categories
according to the initial leaf categories, or you need to modify the concept of
prominent nodes to encompass more leaf categories. Please take a look at the
following paper:
http://www.medelyan.com/files/medelyan-focused-taxonomies-eswc2013.pdf?attredirects=0
Also, you can see some examples of local taxonomies:
https://sites.google.com/site/focusedtaxonomies/home

It's directly related to my request. Please let me know whether it's possible to
apply your method to a local set of documents (not all leaf categories).
Kind regards,
Amir




From: kasun perera kkasunper...@gmail.com
To: Paul Houle ontolo...@gmail.com
Cc: Amir H. Jadidinejad amir.jad...@yahoo.com;
dbpedia-discussion@lists.sourceforge.net
Sent: Friday, December 20, 2013 12:19 PM
Subject: Re: [Dbpedia-discussion] How to build a meaningful Taxonomy from
Wikipedia Categories?


Hi Amir

We did some work on Wikipedia category processing as a GSoC 2013 project.

We used Wikipedia leaf categories as the starting point. A leaf category is a
Wikipedia category page that has no links to any other category pages.
Next, we defined a concept called “Prominent Node”.

We used the following 3 factors to define a prominent node:
1) The initial candidates for prominent nodes were the parents of leaf
categories. We used Wikipedia database dumps as our main data source,
specifically the tables “category”, “categorylinks”, “page” and
“Interlanguage”.

2) Then we kept the candidates whose category name has a plural head word (e.g.
Naturalized citizens of the United States: pre-modifier {Naturalized}, head
{citizens} and post-modifier {of the United States}).

3) Then we counted the interlanguage links of each prominent candidate
category and required a prominent node to have at least 3 interlanguage
links.

Then we clustered the identified prominent category names and determined the
concept to which each prominent node belongs.

So we produced the following type of Wikipedia hierarchy:
Concepts -> Prominent nodes -> Leaf nodes
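
The three factors above could be sketched roughly as follows. This is only an
illustrative sketch, not the GSoC code: the Category record, the
prominent_nodes function and the is_plural heuristic are hypothetical names,
the head word is assumed to come from some external parser, and the plural
test is a deliberately crude placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    head: str                 # head word of the name (assumed to be given by a parser)
    interlanguage_links: int  # number of interlanguage links for this category
    children: list = field(default_factory=list)  # names of child categories


def is_plural(word):
    # Crude placeholder: a real system would use a POS tagger or a lemmatizer.
    w = word.lower()
    return w.endswith("s") and not w.endswith("ss")


def prominent_nodes(categories, min_links=3):
    # Leaf categories: categories without any child category.
    leaves = {c.name for c in categories if not c.children}
    result = []
    for c in categories:
        # Factor 1: the candidate must be a parent of at least one leaf category.
        if not any(child in leaves for child in c.children):
            continue
        # Factor 2: the head word of the category name must be plural.
        if not is_plural(c.head):
            continue
        # Factor 3: at least `min_links` interlanguage links.
        if c.interlanguage_links >= min_links:
            result.append(c.name)
    return result


# Toy data, made up for illustration only.
cats = [
    Category("Naturalized citizens of the United States", "citizens", 5, ["A"]),
    Category("History of physics", "History", 10, ["B"]),
    Category("Rare birds", "birds", 1, ["C"]),
    Category("Top", "Top", 50, ["Naturalized citizens of the United States"]),
    Category("A", "A", 0), Category("B", "B", 0), Category("C", "C", 0),
]
print(prominent_nodes(cats))
```

With this toy input only the first category passes all three checks; the
threshold of 3 links is the one mentioned above.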

Please look at the following links [1], [2] for more details. If you are looking
for this kind of work, I'm happy to share my experience with you.
 
[1] https://github.com/dbpedia/extraction-framework/wiki/GSOC2013_Progress_Kasun
[2] http://blog.dbpedia.org/2013/11/29/making-sense-out-of-the-wikipedia-categories-gsoc2013/

Thanks




Re: [Dbpedia-discussion] How to build a meaningful Taxonomy from Wikipedia Categories?

2013-12-22 Thread kasun perera

Hi Amir

A few questions to get a better sense of your problem.

On Sun, Dec 22, 2013 at 9:12 PM, Amir H. Jadidinejad
amir.jad...@yahoo.com wrote:

> Dear Kasun,
> It's a very useful project; congratulations.
> I just want to know whether it's possible to apply your method to a local
> set of documents (not all leaf categories).

It's possible to apply this method to a selected list of Wikipedia leaf
categories.
How do you plan to match your local set of documents to the Wikipedia leaf
categories? How are you going to decide which document is related to which
Wikipedia leaf category?

> Suppose that I have a set of text documents and I want to find the
> relatedness/similarity between them using abstraction levels in the
> category network.
> In this situation, I think you need further criteria to rank parent
> categories according to the initial leaf categories, or you need to modify
> the concept of prominent nodes to encompass more leaf categories.

Yes, I agree; using more criteria to find the prominent nodes could give a
more accurate taxonomy.
Also, we filtered out the following Freebase named-entity types from the
concept list:
/people/person
/location/location
/organization/organization
/music/recordings
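
One possible extra criterion of the kind discussed above is to rank each
parent category by how many of the initial leaf categories it covers. The
sketch below is only an assumption about how that could look; the
rank_ancestors function, the parents mapping and the depth limit are
hypothetical and not part of the GSoC work.

```python
from collections import Counter, deque


def rank_ancestors(parents, seed_leaves, max_depth=3):
    """Rank ancestor categories by the number of seed leaf categories they cover.

    parents: dict mapping a category name to the list of its parent categories.
    seed_leaves: leaf categories matched to the local document set.
    """
    coverage = Counter()
    for leaf in seed_leaves:
        seen = {leaf}
        queue = deque([(leaf, 0)])
        # Breadth-first walk upwards, at most max_depth levels above the leaf.
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    coverage[parent] += 1  # counted once per seed leaf
                    queue.append((parent, depth + 1))
    # Highest coverage first; ties broken alphabetically for stable output.
    return sorted(coverage.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy category graph, made up for illustration only.
parents = {
    "Ski areas in Vermont": ["Ski areas"],
    "Ski areas in Utah": ["Ski areas"],
    "Ski areas": ["Skiing"],
    "Ski wax": ["Skiing"],
}
seeds = ["Ski areas in Vermont", "Ski areas in Utah", "Ski wax"]
print(rank_ancestors(parents, seeds))
```

A parent that covers many of the seed leaves within a few levels is a
candidate for a local prominent node; the depth limit keeps the walk from
reaching the very general top of the category network.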

Thanks

[Dbpedia-discussion] How to build a meaningful Taxonomy from Wikipedia Categories?

2013-12-19 Thread Amir H. Jadidinejad

Hi,

I’m trying to leverage the Wikipedia category network for a semantic processing
application. A set of Wikipedia articles is extracted from a document, and
I want to build a meaningful hierarchical taxonomy using Wikipedia
categories. In my experiments, I found that the original category network of
Wikipedia is really messy. For example, starting from the few articles
mentioned in a document leads to the whole category network!

I haven’t used DBpedia before; I'm just really interested to know whether, if I
leverage DBpedia, it is possible to get a meaningful taxonomy of categories
with hyponym relations.

Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

Re: [Dbpedia-discussion] How to build a meaningful Taxonomy from Wikipedia Categories?

2013-12-19 Thread Paul Houle

The strength of the Wikipedia categories is that there are a lot of
them and a lot of statements matching instances to categories.

The weakness of categories is that they are completely disorganized.

There are two good strategies for using the categories.

One of them is to treat them abstractly and use them as inputs for
numerical algorithms. For instance, you can use algorithms such as
Kleinberg's Hubs and Authorities, where categories are treated as hubs
and instances are treated as authorities. Similarly, you can create
similarity scores based on the categories shared between items.
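
As a rough illustration of the numerical approach above, here is a small
sketch of the Hubs and Authorities iteration on a category-to-instance graph,
plus a Jaccard score over shared categories. The function names and the toy
data are made up for this example, and Jaccard is just one possible choice of
similarity score.

```python
import math


def hits(memberships, iterations=50):
    """memberships: dict mapping a category to the set of its instances."""
    hub = {c: 1.0 for c in memberships}
    auth = {i: 1.0 for members in memberships.values() for i in members}
    for _ in range(iterations):
        # Authority of an instance: sum of the hub scores of its categories.
        auth = {i: 0.0 for i in auth}
        for c, members in memberships.items():
            for i in members:
                auth[i] += hub[c]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {i: v / norm for i, v in auth.items()}
        # Hub score of a category: sum of the authority scores of its instances.
        hub = {c: sum(auth[i] for i in members) for c, members in memberships.items()}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {c: v / norm for c, v in hub.items()}
    return hub, auth


def jaccard(cats_a, cats_b):
    # Share of categories two items have in common.
    a, b = set(cats_a), set(cats_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Toy data, for illustration only.
memberships = {
    "Ski areas": {"Stowe", "Alta", "Killington"},
    "Vermont resorts": {"Stowe", "Killington"},
    "Utah resorts": {"Alta"},
}
hub, auth = hits(memberships)
print(max(hub, key=hub.get))
print(jaccard({"Ski areas", "Vermont resorts"}, {"Ski areas", "Utah resorts"}))
```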

The other is to build your own well-defined categories from them. I've
used Wikipedia categories this way to create categories such as "things
related to New York City", "obscene things" or "things related to
skiing". In all of these categories you have things that are easy to
ontologize, such as ski areas, and other things such as

http://en.wikipedia.org/wiki/Ski_manufacturing_techniques

that are not easy to ontologize. Generally I've made these by doing
waves of expansion and contraction, traversing the graph and adding
inclusion and exclusion rules. In the past, with half-baked tools, I've
been able to create good categories of 10,000 or so members in a day
or so. With good tools it ought to be possible to work faster.
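
One way to picture a single wave of expansion and contraction is a bounded
traversal of the category graph that prunes excluded categories and always
keeps explicitly included articles. This is only a sketch under those
assumptions, not the tooling described above; expand_category, its parameters
and the toy data are hypothetical.

```python
from collections import deque


def expand_category(subcats, members, root, exclude=(), include=(), max_depth=4):
    """Collect articles under `root`.

    subcats: dict mapping a category to its list of subcategories.
    members: dict mapping a category to its list of articles.
    exclude: categories to prune, together with everything only reachable through them.
    include: articles that are always part of the result.
    """
    excluded = set(exclude)
    selected = set(include)
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        if cat in excluded:
            continue  # contraction: drop this branch
        selected.update(members.get(cat, []))
        if depth < max_depth:
            for sub in subcats.get(cat, []):  # expansion: follow subcategories
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return selected


# Toy graph with placeholder article names, for illustration only.
subcats = {"Skiing": ["Ski areas", "Skiers"], "Skiers": ["People from Vermont"]}
members = {
    "Skiing": ["Ski wax"],
    "Ski areas": ["Stowe"],
    "Skiers": ["Example skier"],
    "People from Vermont": ["Example politician"],
}
first_wave = expand_category(subcats, members, "Skiing")
second_wave = expand_category(subcats, members, "Skiing", exclude=["People from Vermont"])
print(sorted(first_wave))
print(sorted(second_wave))
```

The first call over-expands and pulls in an unrelated article; after
inspecting the result, an exclusion rule is added and the traversal is run
again.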





-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    ontol...@gmail.com

