[ https://issues.apache.org/jira/browse/LUCENE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll resolved LUCENE-417.
------------------------------------
Resolution: Incomplete
Assignee: (was: Lucene Developers)
No patch and no tests, and this one has languished for a while. Please reopen
if/when tests are available.
> StandardTokenizer has problems with comma-separated values
> ----------------------------------------------------------
>
> Key: LUCENE-417
> URL: https://issues.apache.org/jira/browse/LUCENE-417
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.4
> Environment: Operating System: other
> Platform: Other
> Reporter: André Wolf
> Priority: Minor
>
> The StandardTokenizer assumes that any phrase containing a comma and at least
> one digit must be a number. We are trying to index comma-separated values of
> SAP R/3 transaction codes along with standard text. Many of these codes
> contain digits, e.g. "VA01" or "SE80". When tokenizing text that contains
> these codes, Lucene treats a comma-separated list of them, e.g.
> "VA01,VA02,VA03", as a single number. The grammar should be modified to
> recognize numbers correctly (e.g. only tokens consisting solely of digits).
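
The behavior described above can be illustrated with a small, self-contained
sketch. This is not the StandardTokenizer grammar and it uses no Lucene
classes. It is a simplified model of the rule as the report describes it
("comma plus at least one digit means number"), next to the stricter
digits-only rule the reporter suggests. The class name CommaTokenSketch and
all of its methods are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: a simplified model of the reported
// behavior, not the actual Lucene StandardTokenizer grammar.
class CommaTokenSketch {
    // Alphanumeric segments joined by single commas, e.g. "VA01,VA02" or "1,000".
    private static final Pattern COMMA_JOINED =
            Pattern.compile("[A-Za-z0-9]+(,[A-Za-z0-9]+)+");
    // Digit-only segments joined by single commas, e.g. "1,000".
    private static final Pattern DIGITS_ONLY =
            Pattern.compile("[0-9]+(,[0-9]+)+");

    // Loose rule (as described in the report): comma-joined text with
    // at least one digit anywhere is classified as a number.
    static boolean looseIsNumber(String s) {
        return COMMA_JOINED.matcher(s).matches()
                && s.chars().anyMatch(Character::isDigit);
    }

    // Strict rule (the reporter's suggestion): every segment must
    // consist solely of digits.
    static boolean strictIsNumber(String s) {
        return DIGITS_ONLY.matcher(s).matches();
    }

    // Keep the input as one token if it is classified as a number,
    // otherwise split it at the commas.
    static List<String> tokenize(String s, boolean strict) {
        boolean number = strict ? strictIsNumber(s) : looseIsNumber(s);
        List<String> tokens = new ArrayList<>();
        if (number) {
            tokens.add(s);
            return tokens;
        }
        for (String part : s.split(",")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String codes = "VA01,VA02,VA03";
        System.out.println("loose:  " + tokenize(codes, false));
        System.out.println("strict: " + tokenize(codes, true));
        System.out.println("strict: " + tokenize("1,000", true));
    }
}
```

Under the loose rule the three transaction codes come back as one token, so a
search for "VA02" alone would not match. Under the strict rule they are split
into three tokens, while a genuinely numeric value such as "1,000" still stays
whole.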