This is not specific advice, but an idea that I think Google leverages to build up search corrections. If a user searches for "100AW" and it doesn't match, but a moment later they try something different and immediately get to a product page, the system can make a loose connection between their original search and the product they soon thereafter found. Over time, the connections get stronger because others will do the same thing.

I think term vectors could factor into making latent connections somehow also.

Just postulating...

        Erik



On May 21, 2004, at 12:09 PM, David Spencer wrote:

Haven't seen this discussed here.
See 7a at the link below:

http://www.asktog.com/columns/062top10ReasonsToNotShop.html

7a talks about searching on a camera site for the "Lowepro 100 AW".

He says this query works:        "Lowepro 100 AW"
and this query does not work: "Lowepro 100AW"

Cross checking with google indeed shows that the 1st form is much more popular, however the 2nd form is used, and if you're a commerce site or a site that wants to make it easier for users to find things you should help them out.

So the discussion question is what's the best way to handle this.

I guess the somewhat general form of this is that in a query, and term might be split into 2 terms that are individually indexed (so "100AW" is not indexed, but "100" and "AW" is).
In a way the flip side of this is that any 2 terms could be concatenated to form another term that was indexed (so in another universe it might be that passing "100 AW" is not as precise as passing "100AW" but how's the user to know).


In the context of Lucene ways to handle this seem to be:
- automagically run a fuzzy query (so if a query doesn't work, transform "Lowepro 100AW" to "Lowepro~ 100AW~"
- write a query parser that breaks apart unindexed tokens into ones that are indexed (so "100AW" becomes "100 AW")
- write a tokenizer that inserts dummy tokens for every pair of tokens, so the stream "Lowepro 100 AW" would also have "Lowepro100" and "100AW" inserted, presumably via magic w/ TokenStream.next()


Comments on best way to handle this?







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to