Thanks for a quick reply, Ted. Here is the kind of data I have as an example:

20% sales for watches, Ann
50% sales for watches, Peter
70% sales for watches, Tom
... etc.

Using ngram I am able to find a 'stem' of the phrase, which is 'sales<>for<>watches<>,' and is a 4-gram. This is what I call a cluster. Additionally, I add two extra parameters, the date and the size of the email, which I later use to evaluate the cluster.

Next, I take a line similar to the example and want to recognize, from the text alone, which cluster in my list it belongs to, or to know that it belongs to none.

I tried SenseClusters on my data some time ago; to be honest, I was lost in its options and the results I got were not good. Do you think there might be an option for such a task?

Cheers,
Jelena

--- In ngram@yahoogroups.com, Ted Pedersen <tpederse@...> wrote:
>
> Hi Jelena,
>
> Could you describe in a little more detail how you are using Text::NSP as a
> part of the clustering work you describe?
>
> http://ngram.sourceforge.net
>
> Text::NSP doesn't have native support for clustering, although it's easy to
> imagine using it as a part of a larger clustering process (and in fact
> that's what we do in our SenseClusters package
> http://senseclusters.sourceforge.net ).
>
> In any case, if you can describe where Text::NSP fits into this picture I'm
> sure we'll be able to make some suggestions.
>
> Cordially,
> Ted
>
> On Fri, Jan 14, 2011 at 9:51 AM, jelena_isacenkova <jelena.info@...> wrote:
>
> > Hi guys,
> >
> > I am currently using the module to cluster the text lines (phrases) in
> > n-grams and it's working fine. I would like to also cluster the new incoming
> > text records based on the clustered data. I am sure it is a common thing to
> > do, however, it is not described in the package.
> >
> > I have an idea of using a token option for describing a list of already
> > existing cluster in order to check if one matches to any of existing
> > clusters. Would be very interested to know your opinion and comments.
> >
> > Thanks in advance,
> > Jelena
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
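
For reference, the matching step asked about above (deciding from the text alone whether a new line belongs to a known cluster) could be sketched roughly as follows. This is plain Python, not Text::NSP, and it is only an illustration under stated assumptions: the token pattern is chosen so that a comma becomes its own token, as it does in the stem 'sales<>for<>watches<>,'; the helper names (tokenize, parse_stem, match_cluster) and the test lines are made up for the example; and the date and size parameters are left out.

```python
import re

# Assumed token pattern: a run of word characters, or one punctuation mark.
# Other characters (such as '%') are simply skipped.
TOKEN_RE = re.compile(r"\w+|[.,;:?!]")


def tokenize(line):
    """Split a line into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


def parse_stem(stem):
    """Turn 'sales<>for<>watches<>,' into ('sales', 'for', 'watches', ',')."""
    return tuple(t.lower() for t in stem.split("<>") if t)


def contains(tokens, stem):
    """True if the stem occurs in tokens as a contiguous n-gram."""
    n = len(stem)
    if n == 0:
        return False
    return any(tuple(tokens[i:i + n]) == stem
               for i in range(len(tokens) - n + 1))


def match_cluster(line, stems):
    """Return the longest known stem found in the line, or None."""
    tokens = tokenize(line)
    hits = [s for s in stems if contains(tokens, parse_stem(s))]
    return max(hits, key=lambda s: len(parse_stem(s)), default=None)


# Hypothetical cluster list and incoming lines.
clusters = ["sales<>for<>watches<>,"]
print(match_cluster("35% sales for watches, Maria", clusters))
print(match_cluster("new prices for clocks, Maria", clusters))
```

Preferring the longest matching stem is one possible tie-break when a line contains several known stems; an exact contiguous match is also the strictest possible test, so near-misses (a changed word inside the stem) would fall into "none".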