[ 
https://issues.apache.org/jira/browse/SOLR-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-4275:
--------------------------------

    Description: 
When you use the admin interface, select a trie field (e.g. tint), and enter 
nothing into the field, the tokenizer should return no tokens. TrieTokenizer 
instead hits a StringIndexOutOfBoundsException (SIOOBE), because read() into 
the char buffer returns -1 (end of stream) and that value is then used as the 
string's length.
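
The failure mode can be reproduced outside Solr. The following is a minimal 
sketch, not the TrieTokenizer code itself: the class and method names are 
hypothetical, and it only assumes that the return value of read() is passed 
on unchecked as a string length.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EmptyReadDemo {
    // Hypothetical helper: uses the return value of read() directly as a length.
    static String naiveRead(Reader in) throws IOException {
        char[] buf = new char[32];
        int len = in.read(buf);          // -1 at end of stream, e.g. for empty input
        return new String(buf, 0, len);  // a negative length is rejected with an SIOOBE
    }

    public static void main(String[] args) throws IOException {
        System.out.println(naiveRead(new StringReader("42")));
        try {
            naiveRead(new StringReader(""));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty input: " + e.getClass().getSimpleName());
        }
    }
}
```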

The problem mainly affects the analysis request handler and query parsing; 
when indexing values, Solr uses NumericField rather than the tokenizer 
directly. The Solr admin UI has an additional problem: you get a strange 
exception if you fill in the number on the left but leave the query (on the 
right) empty.

The fix is to modify the tokenizer to behave like a real tokenizer:
- correct the read loop to look like the one from KeywordTokenizer. The current 
loop is not guaranteed to work with unbuffered readers (Solr always uses 
StringReaders, so this is not an issue in practice, but other callers might 
not)
- if the resulting string is empty (total len == 0), set a boolean flag to 
false so that the incrementToken/close/end methods skip delegation and 
incrementToken returns false.
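
The two points above can be sketched without any Lucene classes. This is an 
illustration only, not the attached patch: ReadLoopSketch, OneCharReader, 
readFully, and hasValue are hypothetical names, and the one-char-per-call 
reader merely stands in for an unbuffered reader.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

class ReadLoopSketch {
    /** Hypothetical stand-in for an unbuffered reader: hands out one char per call. */
    static class OneCharReader extends FilterReader {
        OneCharReader(Reader in) { super(in); }
        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return super.read(cbuf, off, Math.min(len, 1));
        }
    }

    /** Reads until end of stream, tolerating short reads; never uses -1 as a length. */
    static String readFully(Reader in) throws IOException {
        char[] buf = new char[8];
        int upto = 0;
        while (true) {
            int n = in.read(buf, upto, buf.length - upto);
            if (n == -1) break;                 // end of stream
            upto += n;
            if (upto == buf.length) {           // buffer full: grow before the next read
                buf = Arrays.copyOf(buf, buf.length * 2);
            }
        }
        return new String(buf, 0, upto);
    }

    public static void main(String[] args) throws IOException {
        String value = readFully(new OneCharReader(new StringReader("1234567890")));
        String empty = readFully(new StringReader(""));
        // The flag a tokenizer could consult before delegating to its inner stream.
        boolean hasValue = !empty.isEmpty();
        System.out.println(value);
        System.out.println("empty input has token: " + hasValue);
    }
}
```

With the total length tracked separately from the return value of read(), an 
empty input simply yields an empty string, and the flag tells the tokenizer 
to emit no token.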

  was:
When you use the admin interface and select a trie field (e.g. tint) and enter 
nothing into the field, the tokenizer should normally return no tokens. 
TrieTokenizer instead gets an SIOOBE because read() into the charbuffer 
returns -1 (end of stream). This is used to initialize the string's length...

The fix is to modify the tokenizer to behave like a real tokenizer:
- after reading the input, check if empty (read < 0) and then use 0 as length
- if the resulting string is empty (total len == 0), set a boolean to false and 
make the incrementToken/end methods not delegate and return false.

    
> TrieTokenizer causes StringIOOBE when input is empty instead of returning no 
> token
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-4275
>                 URL: https://issues.apache.org/jira/browse/SOLR-4275
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-4275.patch
>
>
