Has anyone developed code to extract SIPs (statistically improbable
phrases) and CAPs (capitalized phrases) from a Lucene index, the way
Amazon does with its books, as shown here?
<http://www.amazon.com/exec/obidos/tg/detail/-/0764526413/ref=sip_top_dp/102-8573693-0514548?%5Fencoding=UTF8&v=glance>
I'm curious because it is something I'd like to do in some of my own work.
Of course, CAPs would be impossible to extract from an index that used
a lowercasing analyzer, since the case information is discarded before
the terms are stored; they are a special case that would require extra
work at indexing time. SIPs, however, could be extracted from an
existing index.
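
To make the SIP idea concrete, here is a rough sketch of the kind of
scoring I have in mind. It is not Amazon's method (which I don't know)
and it does not use the Lucene API at all: it assumes the tokens for
each document are already available as plain lists, treats a "phrase"
as a two-word sequence, and ranks phrases by how much more frequent
they are in one document than in the whole collection. The names
sip_scores, bigrams and min_count are made up for illustration.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def sip_scores(doc_tokens, corpus_docs, min_count=2):
    """Rank a document's bigrams by log(p_doc / p_corpus), highest first.

    Assumption: corpus_docs is an iterable of token lists standing in for
    the indexed collection. Add-one smoothing keeps phrases that are
    missing from the corpus counts from causing a division by zero.
    """
    doc_counts = Counter(bigrams(doc_tokens))
    corpus_counts = Counter()
    for tokens in corpus_docs:
        corpus_counts.update(bigrams(tokens))

    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())
    distinct = len(set(doc_counts) | set(corpus_counts))

    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # skip phrases too rare in the document to be meaningful
        p_doc = count / doc_total
        p_corpus = (corpus_counts[phrase] + 1) / (corpus_total + distinct)
        scores[phrase] = math.log(p_doc / p_corpus)

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


doc = "the quick zebra fish swims and the quick zebra fish eats".split()
corpus = [
    "the quick brown fox".split(),
    "the quick dog and the quick cat".split(),
    doc,
]
ranked = sip_scores(doc, corpus)
for phrase, score in ranked:
    print(" ".join(phrase), round(score, 3))
```

In this toy collection "the quick" occurs in every document, so it
scores lower than "quick zebra" and "zebra fish", which occur only in
the one document. Against a real index the counts would have to come
from the index's term and position data rather than from in-memory
lists.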
Thanks,
Erik