Re: index enforcing query terms to appear within the same sentence

Ian Lea Fri, 11 Mar 2011 02:04:57 -0800

The example code in
http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
reads


custom standard analyzer:

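import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// IndexFields is presumably the original poster's own interface that
// defines the IFIELD_TEXT constant.
public class MyStandardAnalyzer extends StandardAnalyzer implements
                IndexFields {
        public MyStandardAnalyzer(Version matchVersion) {
                super(matchVersion);
        }

        // Gap inserted between successive values of a multi-valued field.
        @Override
        public int getPositionIncrementGap(String fieldName) {
                int incrementGap = super.getPositionIncrementGap(fieldName);
                // Widen the gap only for the sentence field.
                if (fieldName.equals(IFIELD_TEXT)) {
                        incrementGap += 10;
                }
                return incrementGap;
        }
}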

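So if you index with this analyzer and add

new Field(IFIELD_TEXT, value, ...) and

new Field("someothername", value, ...)

the first field gets the enlarged gap and the second one doesn't. You
don't need to touch a TokenStream or PositionIncrementAttribute yourself:
add one Field per sentence under the same field name, and the gap is
inserted between consecutive values of that field at indexing time. A
phrase or span query whose slop is smaller than the gap then cannot match
across two sentences.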
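
To make the effect of the gap concrete, here is a small stand-alone sketch.
It is not Lucene code: PositionGapSketch, positions() and within() are
made-up names, tokenizing is just a whitespace split, and within() is only a
rough stand-in for a sloppy phrase or span query. The exact position
arithmetic inside Lucene may differ between versions, so treat this as a
model of the idea rather than of the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: not Lucene code, all names are hypothetical.
class PositionGapSketch {

    // Assign a position to every token of a multi-valued field, adding
    // `gap` extra positions between consecutive values (sentences).
    static Map<String, List<Integer>> positions(List<String> sentences, int gap) {
        Map<String, List<Integer>> result = new HashMap<>();
        int next = 0;
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) {
                next += gap; // extra distance between two values
            }
            for (String token : sentences.get(i).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                result.computeIfAbsent(token, k -> new ArrayList<>()).add(next++);
            }
        }
        return result;
    }

    // Simplified proximity check: true if some occurrence of a and some
    // occurrence of b are at most maxDistance positions apart.
    static boolean within(Map<String, List<Integer>> pos, String a, String b,
                          int maxDistance) {
        List<Integer> none = Collections.emptyList();
        for (int pa : pos.getOrDefault(a, none)) {
            for (int pb : pos.getOrDefault(b, none)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the cat sat", "a dog barked");
        Map<String, List<Integer>> noGap = positions(sentences, 0);
        Map<String, List<Integer>> withGap = positions(sentences, 10);
        System.out.println(within(noGap, "sat", "dog", 5));   // true: sentences run together
        System.out.println(within(withGap, "sat", "dog", 5)); // false: the gap separates them
        System.out.println(within(withGap, "cat", "sat", 5)); // true: same sentence
    }
}
```

With a gap of 0 the two sentences behave like one long text, so "sat" and
"dog" count as close. With a gap of 10 they are pushed apart, while terms
inside one sentence stay close. In practice you would probably pick a gap
larger than your longest sentence and a slop just below the gap.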


Hope that helps.


--
Ian.

On Thu, Mar 10, 2011 at 4:34 PM, Michael Wiegand
<michael.wieg...@lsv.uni-saarland.de> wrote:
> Conceptually, I think I know what to do. Unfortunately, with the given
> interfaces of Lucene I have some difficulty.
>
> If I add the content of a document sentence by sentence, i.e. line by line,
> (using a multi-valued field), there are only two constructors possible:
> Field(String name, String value, Field.Store store, Field.Index index)
> or
> Field(String name, String value, Field.Store store, Field.Index index,
> Field.TermVector termVector)
> The sentence comes as a String which I get from a BufferedReader by
> calling its readLine() method.
>
> But as far as I understand, I need access to a TokenStream in
> order to set the PositionIncrementAttribute. So how should that work?
>
> Thank you in advance.
>
> Ian Lea wrote:
>>>
>>> You can use multi valued fields if you play with the position
>>> increment gap.  See e.g.
>>>
>>> http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
>>>
>>> A google search for "lucene indexing sentences" or similar finds that,
>>> and more.
>>>
>>>
>>> Different docs can have different fields/different numbers of fields,
>>> but the position gap approach is probably better.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Fri, Mar 4, 2011 at 7:06 AM, Michael Wiegand
>>> <michael.wieg...@lsv.uni-saarland.de> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I would like to create an index with Lucene for a document collection of
>>>> text files.
>>>> The index should be created in such a way that, at search time, I can
>>>> enforce that query term A and query term B are contained within the same
>>>> sentence.
>>>>
>>>> How should I implement the index? Should I have a different field for
>>>> every sentence (making sure that it is not a multi-valued field, because
>>>> those would get merged, which is exactly what I do not want)?
>>>> Would it be problematic that different documents would then end up
>>>> having different numbers of fields?
>>>>
>>>> Thank you in advance!
>>>>
>>>> Best,
>>>> Michael
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>>
