> This KWIC concept sounds cool. Any place I could find more info?
>
I got curious, so I googled for KWIC -- one thing led to another.
Here is an overall demonstration of the concept:
http://coronet.iicm.edu/wbtmaster/courses/kwic_intro_start9.htm
Hit the right arrow (upper right corner) to start the demo.
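If you want to see the mechanics without the demo, here is a minimal Python sketch that builds a KWIC index of a few document titles. It uses one common layout, the rotated or "permuted index" style: each significant word is rotated to the front of its title, and the rotated lines are sorted alphabetically. The stop-word list and the "/" marker showing where the title wraps around are my own assumptions, not a reconstruction of any IBM program.

```python
# Hypothetical stop-word list: these words never become index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_titles(titles):
    """Return (rotated_title, title_number) pairs, sorted alphabetically.

    Each significant word is rotated to the front of its title; the words
    that preceded it wrap around to the end after a "/" marker.
    """
    entries = []
    for number, title in enumerate(titles):
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue  # noise words get no entry of their own
            before, after = words[:i], words[i:]
            rotated = after + (["/"] + before if before else [])
            entries.append((" ".join(rotated), number))
    # Case-insensitive alphabetical order on the rotated line.
    entries.sort(key=lambda entry: (entry[0].lower(), entry[1]))
    return entries


if __name__ == "__main__":
    for line, number in kwic_titles(["Information Retrieval", "The Art of Indexing"]):
        print(f"{line:<30} [title {number}]")
```

Scanning down the I's, "Indexing / The Art of" lands right next to "Information Retrieval", so every title containing an I-word can be found without knowing its first word.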
1) I forgot that the early KWIC indexes published at IBM were produced
before hard disks were in widespread use (a 10 meg drive cost about
$13,000 per month), so KWIC (or any) indexing of the full text of
documents was impractical. Instead, they prepared KWIC indexes of
document titles.
2) KWIC indexing was developed by Hans Peter Luhn, who went to work
for IBM.
http://web.utk.edu/~jgantt/hanspeterluhn.html
3) As the computing industry advanced, a more general form of the KWIC
index, called a concordance, became something of a CS class exercise.
http://www.cs.wm.edu/~noonan/cs312/homework/concordance/
4) Further advances made it practical to provide KWIC/concordance
indexing of the full text of documents.
http://www.georgetown.edu/faculty/ballc/corpora/tutorial3.html
5) Today, several institutions, including Stanford University and
Amazon.com, use KWIC indexing to augment full-text searches. What
appears to happen is this:
a) A keyword search is performed on the title, author, bio, and
content of the documents (using boolean logic, stemming, synonyms,
whatever).
b) Any hits found in the full-text content are extracted along with
a given number of leading and trailing words. A quick index is then
generated dynamically on the extracted lines.
c) The extracted text snippets are presented with the keywords
highlighted (bold color) as a more detailed subindex of the particular
document.
http://www.infotoday.com/newsbreaks/nb031103-1.shtml
http://highwire.stanford.edu/inthepress/asbmb/asbmb_2003feb.dtl
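Here is my guess at steps b) and c) for a single document, as a minimal Python sketch. The word-based window, the context size, and the <b> markup are assumptions; neither site publishes how it builds its snippets, and this leaves out the "quick index" part of step b) as well as the stemming and synonyms of step a).

```python
# Characters stripped from a word before comparing it with the keywords.
PUNCTUATION = ".,;:!?\"'()"


def normalize(word):
    """Lowercase a word and strip surrounding punctuation for matching."""
    return word.strip(PUNCTUATION).lower()


def snippets(text, keywords, context=5):
    """Return one snippet per keyword hit in `text`, hits wrapped in <b>...</b>.

    `context` is the (assumed) number of leading and trailing words kept
    around each hit. Nearby hits give overlapping snippets; no merging is done.
    """
    targets = {k.lower() for k in keywords}
    words = text.split()
    result = []
    for i, word in enumerate(words):
        if normalize(word) not in targets:
            continue
        lo, hi = max(0, i - context), min(len(words), i + context + 1)
        # Highlight every keyword inside the window, not just the centre hit.
        window = [f"<b>{w}</b>" if normalize(w) in targets else w
                  for w in words[lo:hi]]
        result.append(" ".join(window))
    return result


if __name__ == "__main__":
    page = "Readers can search inside the book. The search hits appear as snippets."
    for s in snippets(page, ["search"], context=2):
        print("...", s, "...")
```

Note that each snippet belongs to one document, which is exactly the limitation I complain about next.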
First of all, this really isn't a KWIC index; it is just a text
snippet with the hit words highlighted.
Second, the "KWIC" index only appears if the keywords do not all appear
in the title/author/bio.
Third, the "KWIC" index is subordinate to, and relative to, a single
document, so you do not get the advantage of seeing the results of all
the documents "In Context".
What I think would be much more useful is a composite KWIC index of
all the hits, with each hit linked to its source document.
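A minimal Python sketch of that composite index: every hit in every matching document becomes one aligned line, tagged with a link back to its source. The function names, the 25-character context width, and sorting on the text that follows the keyword (so similar phrases from different documents cluster together) are my own assumptions, not a description of any existing product.

```python
import re


def composite_kwic(docs, keyword, width=25):
    """Build one KWIC index over every hit in every document.

    `docs` maps a link (URL or ID) to that document's text. Each hit becomes a
    (left, keyword, right, link) row; rows are sorted on the right context.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for link, text in docs.items():
        flat = " ".join(text.split())  # collapse newlines and repeated spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            rows.append((left, m.group(0), right, link))
    rows.sort(key=lambda row: (row[2].lower(), row[3]))
    return rows


def format_rows(rows, width=25):
    """Render rows as text with the keyword in a fixed column, link at the end."""
    return [f"{left:>{width}} {kw} {right:<{width}}  {link}"
            for left, kw, right, link in rows]


if __name__ == "__main__":
    docs = {
        "doc1": "KWIC indexes were printed on paper.",
        "doc2": "A composite KWIC display links each hit.",
    }
    for line in format_rows(composite_kwic(docs, "kwic")):
        print(line)
```

Ugly, but the keyword sits in the same column on every line, so the eye can run straight down the list.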
Apple's search technology (kind of) uses KWIC-type indexing in iTunes;
they just don't rearrange the text or highlight the hit words. They
just display the text "as-is".
Based on my experience of finding things with KWIC, I think modern
search techniques are missing something by not fully exploiting the
KWIC way of presenting results. It is ugly, but a human can very
quickly scan the (KWIC-formatted) context of all the matches to find
what he seeks.
HTH
Dick

