Re: [CLucene-dev] Options for href targets as tokens

Ben van Klinken Sat, 31 Oct 2009 03:04:43 -0700

I think what you're talking about can be achieved using the analyzer
position increment. Here is the documentation for that function:


        /** Set the position increment.  This determines the position of this
        * token relative to the previous Token in a TokenStream, used in
        * phrase searching.
        *
        * The default value is 1.
        *
        * Some common uses for this are:
        *
        * - Set it to zero to put multiple terms in the same position.  This is
        * useful if, e.g., a word has multiple stems.  Searches for phrases
        * including either stem will match.  In this case, all but the first
        * stem's increment should be set to zero: the increment of the first
        * instance should be one.  Repeating a token with an increment of zero
        * can also be used to boost the scores of matches on that token.
        *
        * - Set it to values greater than one to inhibit exact phrase matches.
        * If, for example, one does not want phrases to match across removed
        * stop words, then one could build a stop word filter that removes stop
        * words and also sets the increment to the number of stop words removed
        * before each non-stop word.  Then exact phrase queries will only match
        * when the terms occur with no intervening stop words.
        */
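
For the href case, here is a rough, untested-against-CLucene sketch of the
idea in plain C++. It deliberately does not use the CLucene classes:
SketchToken, Word, tokenizeWithLinks and absolutePositions are all
hypothetical names made up for illustration. In a real analyzer you would
presumably do the same thing inside a token filter and call the position
increment setter documented above on the link-target token. The sketch also
assumes the link target is emitted once, on the first word of the link text.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an analyzer token: only the term text and its
// position increment. This is NOT the CLucene Token class.
struct SketchToken {
    std::string text;
    int positionIncrement;  // 1 = next position, 0 = same position as previous
};

// One word of page text, plus the href of the link it sits inside
// (empty string when the word is not part of a link).
struct Word {
    std::string text;
    std::string href;
};

// Emit every word with increment 1. On the first word of each link, also
// emit the link target with increment 0 so it shares that word's position.
// Simplification: two adjacent links with the same href count as one link.
std::vector<SketchToken> tokenizeWithLinks(const std::vector<Word>& words) {
    std::vector<SketchToken> out;
    std::string previousHref;
    for (const Word& w : words) {
        out.push_back({w.text, 1});
        if (!w.href.empty() && w.href != previousHref) {
            out.push_back({w.href, 0});
        }
        previousHref = w.href;
    }
    return out;
}

// Absolute positions are the running sum of the increments, arranged so
// that the first token (increment 1) lands on position 0.
std::vector<int> absolutePositions(const std::vector<SketchToken>& tokens) {
    std::vector<int> positions;
    int position = -1;
    for (const SketchToken& t : tokens) {
        position += t.positionIncrement;
        positions.push_back(position);
    }
    return positions;
}
```

With the words "see the project page for", where "project page" links to a
made-up target such as http://example.org/, the target token ends up at the
same position as "project", and "page" still directly follows "project".
So a phrase that spans the link keeps matching, and a proximity search
against the target has a real position to work with.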

Does that sound right?


ben

2009/10/28 Rob Cuthbertson <[email protected]>:
> I've inherited a CLucene-based system, and there's one idea I'm kicking
> around.  The current system indexes html docs by stripping all tags and
> tokenizing the remaining page text.  I'd like to index link targets, so in
> addition to searching for docs talking about a topic, I can search for docs
> that link to a particular site.
> I can easily add that as metadata to the text, but I lose the ability to do
> proximity searches involving the link target.  I looked at inserting the
> link target into the token stream, but then I break phrase searches that
> span the link.
> What I'd like would be to have the link target stored at the same token
> position as the link text, so one could search for a token near a link
> target, or a phrase that spans a link.
>
> Has anyone else indexed on inline metadata this way?  Anything I should
> beware of as I get going?
>
> --Rob
>
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> _______________________________________________
> CLucene-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>
>
