Has anyone developed code to extract SIPs (statistically improbable phrases) and CAPs (capitalized phrases) from a Lucene index, as Amazon does with its books, shown here?

<http://www.amazon.com/exec/obidos/tg/detail/-/0764526413/ref=sip_top_dp/102-8573693-0514548?%5Fencoding=UTF8&v=glance>

I'm curious because this is something I'd like to do in some of my own work. Of course, CAPs would be impossible to extract from an index built with a lowercasing analyzer, so that special case would require extra work at indexing time. SIPs, however, could be extracted from an existing index.
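
To make the SIP idea concrete, here is a rough sketch of the kind of scoring I have in mind. It is not Amazon's method (I don't know what they actually do) and it does not touch Lucene at all: it works on plain strings, and the names `bigrams` and `sip_scores` are made up for illustration. The assumption is that a phrase is "improbable" when it occurs much more often in one document than in the rest of the collection, measured here as a smoothed log ratio of two-word phrase frequencies. With a real index, the counts would have to come from whatever per-document and collection-wide statistics the index stores.

```python
import math
from collections import Counter


def bigrams(text):
    """Split on whitespace, lowercase, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def sip_scores(doc, background, min_count=2):
    """Rank the two-word phrases of `doc` by how unusual they are.

    `background` is a list of the other documents in the collection.
    The score is log(p_doc / p_background), with add-one smoothing on
    the background so unseen phrases do not divide by zero.
    Phrases seen fewer than `min_count` times in `doc` are ignored.
    """
    doc_counts = Counter(bigrams(doc))
    bg_counts = Counter()
    for other in background:
        bg_counts.update(bigrams(other))

    doc_total = sum(doc_counts.values()) or 1
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts)) or 1

    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue
        p_doc = count / doc_total
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        scores[" ".join(phrase)] = math.log(p_doc / p_bg)

    # Highest score first: most over-represented phrases in this document.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    doc = "term vector data in the index and term vector offsets in the index"
    background = [
        "the index is small and the index is fast",
        "we store documents in the index",
    ]
    for phrase, score in sip_scores(doc, background):
        print(f"{score:6.3f}  {phrase}")
```

A phrase that is common everywhere (such as "the index" in the toy data) scores low even though it repeats in the document, while one that never appears elsewhere floats to the top. Longer phrases and a better statistic would be obvious next steps.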

Thanks,
    Erik

