I've inherited a CLucene-based system, and there's one idea I'm kicking
around.  The current system indexes html docs by stripping all tags and
tokenizing the remaining page text.  I'd like to index link targets, so in
addition to searching for docs talking about a topic, I can search for docs
that link to a particular site.
I can easily add that as metadata to the text, but I lose the ability to do
proximity searches involving the link target.  I looked at inserting the
link target into the token stream, but then I break phrase searches that
span the link.
What I'd like would be to have the link target stored at the same token
position as the link text, so one could search for a token near a link
target, or a phrase that spans a link.

Has anyone else indexed on inline metadata this way?  Anything I should
beware of as I get going?

--Rob
_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers