Hi Erick, to run a phrase query you already have to know the phrase, right? But I want to discover the phrases: which words appear together so often that we would naturally treat them as one phrase.
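One common statistical way to "discover" phrases from a corpus is collocation scoring. You count how often each pair of adjacent words occurs and compare that with how often each word occurs on its own. The sketch below is a hypothetical, stdlib-only Java helper. The class `CollocationFinder` and its methods are illustrative names and are not part of Lucene. It assumes a naive regex tokenizer rather than a Lucene analyzer, and it ranks bigrams by pointwise mutual information (PMI).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): ranks adjacent word pairs by
 * pointwise mutual information (PMI) to surface candidate phrases.
 */
public class CollocationFinder {

    /** Naive tokenizer: lower-cases and splits on anything that is not a letter or digit. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * Maps each bigram "w1 w2" seen at least minCount times to
     * PMI = ln( P(w1 w2) / (P(w1) * P(w2)) ), with probabilities estimated from raw counts.
     */
    public static Map<String, Double> scoreBigrams(List<String> docs, int minCount) {
        Map<String, Integer> unigrams = new HashMap<>();
        Map<String, Integer> bigrams = new HashMap<>();
        long totalUnigrams = 0;
        long totalBigrams = 0;
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            for (int i = 0; i < tokens.size(); i++) {
                unigrams.merge(tokens.get(i), 1, Integer::sum);
                totalUnigrams++;
                // Pairs are counted only within a document, never across a document boundary.
                if (i + 1 < tokens.size()) {
                    bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
                    totalBigrams++;
                }
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : bigrams.entrySet()) {
            if (e.getValue() < minCount) {
                continue; // PMI is unreliable for rare pairs, so skip them
            }
            String[] pair = e.getKey().split(" ");
            double pPair = e.getValue() / (double) totalBigrams;
            double p1 = unigrams.get(pair[0]) / (double) totalUnigrams;
            double p2 = unigrams.get(pair[1]) / (double) totalUnigrams;
            scores.put(e.getKey(), Math.log(pPair / (p1 * p2)));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "The NBA announced its new social media guidelines",
            "Teams use social media to reach the fans",
            "The league and the players discussed social media",
            "The fans watched the game");
        // Print qualifying bigrams, highest PMI first.
        scoreBigrams(docs, 2).entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.printf(Locale.ROOT, "%-15s %.3f%n", e.getKey(), e.getValue()));
    }
}
```

On this toy corpus, "social media" ranks above "the fans" because "social" and "media" rarely appear apart, while "the" appears everywhere. The `minCount` cutoff matters: PMI rewards rare pairs, so two words seen together only once can outscore a genuine phrase. For millions of documents you would likely want a sturdier score, such as Dunning's log-likelihood ratio, and a real analyzer; treat this as a starting point rather than a tuned solution.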
On Tue, Oct 6, 2009 at 8:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Maybe I'm missing the problem entirely, but can you use phrase queries, or
> one of the Span* queries with a slop of 0, when searching?
>
> Best
> Erick
>
> On Tue, Oct 6, 2009 at 7:42 AM, Andrew Zhang <rooseve6...@gmail.com> wrote:
>
> > Hi guys,
> >
> > The requirement is very simple. For example, in the sentence 'The NBA
> > formally announced its new *social media* guidelines Wednesday', I want to
> > treat '*social media*' as a single phrase term. The default English
> > analyzers that come with Lucene all deal with single words, so if you want
> > the most frequent terms, *social* and *media* are separated, and neither
> > of them carries the meaning that *social media* does, right?
> >
> > I know there's an approach built on a phrase dictionary that matches the
> > phrases already in it, much like the way Chinese text is handled. But is
> > there an open-source solution for English? I don't want to build a phrase
> > dictionary myself, and I also want a smart way that can "discover" phrases
> > automatically. I have 2 million docs analyzed the normal way, all single
> > terms, which I can use as a base source, and it's possible to find that
> > *social media* occurs together frequently, but I really don't know how to
> > go the other way, from single terms to the phrases.
> >
> > I tried to find some phrase analyzers, but had no luck. Any advice?
> >
> > Regards,
> > Andrew
> > --
> > Simple is best