[ 
https://issues.apache.org/jira/browse/LUCENE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761907#action_12761907
 ] 

Patrick Jungermann commented on LUCENE-1939:
--------------------------------------------

Karl, you're right, sorry. I used the current release of Solr, version 1.3.0, 
which uses Lucene 2.4.1. Solr 1.4, which will be released soon, uses Lucene 
2.9. As far as I can see, the filter did not change at the lines that cause 
the exception, but I don't know whether this is the real root cause.

I have now also tested this with the current Solr trunk, which already uses 
Lucene 2.9. At first I tried a simple example with an analysis chain based on 
the WhitespaceTokenizer followed by the ShingleMatrixFilter, and no problem 
occurred.

Then I tried the other field type configuration, which I had used in the 
earlier test, and the exception was thrown.

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at org.apache.lucene.analysis.shingle.ShingleMatrixFilter$Matrix$1.hasNext(ShingleMatrixFilter.java:841)
        at org.apache.lucene.analysis.shingle.ShingleMatrixFilter.produceNextToken(ShingleMatrixFilter.java:485)
        at org.apache.lucene.analysis.shingle.ShingleMatrixFilter.incrementToken(ShingleMatrixFilter.java:372)
        at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:401)
        at org.apache.lucene.analysis.shingle.ShingleMatrixFilter.next(ShingleMatrixFilter.java:405)
        ...
{code}

To find the cause, I removed the filters one by one. After a lot of tests, I 
found out that the problem was caused by the use of
# WhitespaceTokenizer
# ShingleMatrixFilter
# RemoveDuplicatesTokenFilter
in that order. When I swapped the positions of the two filters, everything 
seemed to work.

This time, I tested this only in the field analysis view, with different data.

Also, it was really strange that the exception occurred only on the first 
analysis request and, extremely rarely, a second time. But it was thrown on 
every first request.

> IndexOutOfBoundsException at ShingleMatrixFilter's Iterator#hasNext method
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-1939
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1939
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Patrick Jungermann
>            Assignee: Karl Wettin
>         Attachments: ShingleMatrixFilter_IndexOutOfBoundsException.patch
>
>
> I tried to use the ShingleMatrixFilter within Solr. To test the functionality 
> etc., I first used the built-in field analysis view. The filter was configured 
> to be used only for query-time analysis, with "_" as the spacer character and 
> a min. and max. shingle size of 2. Generating the shingles for query strings 
> with this filter seems to work in this view, but when I turned on the 
> highlighting of indexed terms that match the query terms, the exception was 
> thrown. Also, each time I tried to query the index, the exception was 
> immediately thrown.
> Stacktrace:
> {code}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>       at java.util.ArrayList.RangeCheck(Unknown Source)
>       at java.util.ArrayList.get(Unknown Source)
>       at org.apache.lucene.analysis.shingle.ShingleMatrixFilter$Matrix$1.hasNext(ShingleMatrixFilter.java:729)
>       at org.apache.lucene.analysis.shingle.ShingleMatrixFilter.next(ShingleMatrixFilter.java:380)
>       at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:120)
>       at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:47)
>       ...
> {code}
> Within the hasNext method, the {{s-1}}-th Column is requested from the 
> ArrayList {{columns}}, but {{columns}} does not contain this entry.
> I created a patch that checks whether {{columns}} contains enough entries.
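
As a rough illustration of the kind of check described in the issue (this is 
not the attached patch itself), the following self-contained Java sketch 
contrasts an unguarded lookup of entry s-1 with a guarded one. The class 
ColumnGuardSketch, its method names, and the use of List<List<String>> in 
place of the filter's Column type are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lookup described in the issue; it does not
// reproduce the real ShingleMatrixFilter code.
class ColumnGuardSketch {

    // Unguarded lookup: requests entry s - 1 without checking the list size.
    // Throws IndexOutOfBoundsException when columns has fewer than s entries.
    public static boolean hasRowsUnchecked(List<List<String>> columns, int s) {
        return !columns.get(s - 1).isEmpty();
    }

    // Guarded lookup: verifies that entry s - 1 exists before requesting it.
    public static boolean hasRowsChecked(List<List<String>> columns, int s) {
        if (s < 1 || s > columns.size()) {
            return false; // no such column, so there is nothing to iterate
        }
        return !columns.get(s - 1).isEmpty();
    }

    public static void main(String[] args) {
        List<List<String>> columns = new ArrayList<>(); // empty, as in "Size: 0"

        // The guarded variant reports "nothing left" instead of throwing.
        System.out.println("checked: " + hasRowsChecked(columns, 1));

        // The unguarded variant fails the same way the stack trace does.
        try {
            hasRowsUnchecked(columns, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked access failed: " + e.getMessage());
        }
    }
}
```

Whether returning false is the right behaviour for the real filter depends on 
why {{columns}} is empty at that point, which the sketch does not address.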

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.