On Fri, Sep 25, 2009 at 1:30 PM, jakobitsch juergen <[email protected]> wrote:
> first thanks for taking the time!
>
> that is correct - i'm not trying to do POS (part-of-speech) tagging.
>
> actually i know that it must also work with mahout: kea uses
> a trained classifier (from weka) and tf-idf (= term frequency - inverse
> document frequency) to identify keyphrase candidates. it is then
> possible to check these candidates against a controlled vocabulary
> (i.e. a skos thesaurus).
>

I might be off from what you are looking for, but after identifying the
(noun/verb) phrases in the text with a POS tagger, you could run the
TF-IDF analysis on those phrases.
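
As a rough illustration of that second step, here is a minimal sketch of
TF-IDF scoring over phrases that have already been extracted. It assumes
each document is given as a plain list of phrase strings; the function name
`tfidf_scores` and the unsmoothed log(N / df) weighting are choices made for
this example, not what kea or Mahout actually do.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """docs: list of phrase lists, one list per document (hypothetical input format).
    Returns one dict per document mapping phrase -> TF-IDF score."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each phrase occur?
    df = Counter()
    for phrases in docs:
        df.update(set(phrases))

    scores = []
    for phrases in docs:
        tf = Counter(phrases)
        total = len(phrases)
        # Relative term frequency times unsmoothed inverse document frequency.
        scores.append({
            phrase: (count / total) * math.log(n_docs / df[phrase])
            for phrase, count in tf.items()
        })
    return scores

docs = [
    ["key phrase", "key phrase", "data set"],
    ["data set", "neural network"],
]
scores = tfidf_scores(docs)
```

With this weighting, a phrase that occurs in every document ("data set"
here) scores zero, so only phrases that set a document apart from the rest
of the collection rank highly.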

If that's not the case, how are the phrases identified in the first
place? Is it based on shingles? I am curious about ways to get all
the meaningful phrases from the text other than POS tagging.
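
To make the shingle question concrete, here is a minimal sketch of word
shingles (overlapping word n-grams) used as phrase candidates. The tokenizer
regex, the function name `word_shingles`, and the 2-3 word range are
assumptions made for this example; whether kea generates its candidates this
way is exactly the open question.

```python
import re

def word_shingles(text, min_n=2, max_n=3):
    """Return all overlapping word n-grams of length min_n..max_n."""
    # Naive tokenizer (an assumption): lowercase alphanumeric runs only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    shingles = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            shingles.append(" ".join(tokens[i:i + n]))
    return shingles

candidates = word_shingles("Key phrase extraction works")
```

Unlike POS-based chunking, this produces many meaningless candidates, so a
later filtering or scoring step would have to weed those out.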

--shashi

>
> anyway thanks!
>
> ;) i'm smelling an opportunity to get famous!
>
> wkr www.turnguard.com
>
>
>
>
> ----- Original Message ----
> From: Isabel Drost <[email protected]>
> To: [email protected]
> Sent: Friday, September 25, 2009 9:37:36 AM
> Subject: Re: newbie intro
>
> On Fri, 25 Sep 2009 10:04:10 +0530
> Shashikant Kore <[email protected]> wrote:
>
>> On Wed, Sep 23, 2009 at 8:18 PM, Ted Dunning <[email protected]>
>> wrote:
>> > One of the clustering algorithms has a patch that should have some
>> > at-least-ok key phrase extraction.  Shashi was digging into that.
>
>> Extracting phrases (noun/verb) could be done with OpenNLP, Gate,
>> LingPipe, and many other similar tools.
>
> I think the phrases extracted by OpenNLP are different from what kea
> does in that kea sort of tries to find phrases that best represent the
> topic of the text. Some sort of automatic tagging of texts with topics.
>
> Isabel
>
>
>
>
>
