[jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"

Grant Ingersoll (JIRA) Fri, 12 Jun 2009 12:09:29 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718943#action_12718943
 ]


Grant Ingersoll commented on LUCENE-1676:
-----------------------------------------

I grabbed Apache Harmony's Integer.parseInt() code and converted it to take a 
char array directly (no intermediate String), which should speed up the 
IntegerEncoder.  However, the Float.parseFloat implementation relies on some 
constructs that are not available in JDK 1.4, so that one is going to have to 
stay as it is.
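To illustrate what parsing straight from a char array looks like, here is a minimal sketch. It is not the Harmony-derived code from the patch; the class name `CharArrayInts` and the `parseInt(char[], int, int, int)` signature are hypothetical. It accumulates the value as a negative number so that `Integer.MIN_VALUE`, whose magnitude has no positive int counterpart, parses without overflow.

```java
/**
 * Hypothetical helper (not the ArrayUtils code from LUCENE-1676):
 * parses a signed int from chars[offset, offset + len) without
 * allocating a String. A leading '+' is not accepted.
 */
final class CharArrayInts {

    private CharArrayInts() {}

    static int parseInt(char[] chars, int offset, int len, int radix) {
        if (chars == null || len <= 0
                || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new NumberFormatException("invalid input");
        }
        int i = offset;
        int end = offset + len;
        boolean negative = false;
        if (chars[i] == '-') {
            negative = true;
            i++;
            if (i == end) {
                throw new NumberFormatException("lone minus sign");
            }
        }
        // Accumulate as a negative number; the limit is the most negative
        // value allowed for the requested sign.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / radix;
        int result = 0;
        while (i < end) {
            int digit = Character.digit(chars[i++], radix);
            if (digit < 0) {
                throw new NumberFormatException("bad digit");
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow");
            }
            result *= radix;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        char[] buf = "dogs|42".toCharArray();
        // Parse only the "42" after the delimiter, in place.
        System.out.println(parseInt(buf, 5, 2, 10));
    }
}
```

The caller passes the offset and length of the payload slice, so a filter that already holds the token's char buffer never has to copy it into a String.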

The main problem lies in its reliance on the HexStringParser 
(https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/org/apache/harmony/luni/util/HexStringParser.java),
which needs some Long-specific attributes that are either newer than JDK 1.4 
or Harmony-specific additions to Long (I didn't take the time to investigate).
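Since the Float path stays as it is, a float encoder presumably still builds a temporary String before parsing. A rough sketch of such an encoder, assuming it parses with `Float.parseFloat` and stores the result of `Float.floatToIntBits` as four big-endian bytes (both choices are assumptions; `FloatPayloadSketch` is a hypothetical name, not the patch's encoder):

```java
/**
 * Hypothetical float payload encoder that keeps the String-based
 * parsing path instead of a char[] port of FloatingPointParser.
 */
final class FloatPayloadSketch {

    private FloatPayloadSketch() {}

    /** Parses chars[offset, offset + len) as a float via a temporary String. */
    static byte[] encodeFloat(char[] chars, int offset, int len) {
        float value = Float.parseFloat(new String(chars, offset, len));
        int bits = Float.floatToIntBits(value);
        // Big-endian layout: most significant byte first.
        return new byte[] {
            (byte) (bits >>> 24),
            (byte) (bits >>> 16),
            (byte) (bits >>> 8),
            (byte) bits
        };
    }

    /** Inverse of encodeFloat, useful for checking round trips. */
    static float decodeFloat(byte[] bytes) {
        int bits = ((bytes[0] & 0xFF) << 24)
                 | ((bytes[1] & 0xFF) << 16)
                 | ((bytes[2] & 0xFF) << 8)
                 |  (bytes[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        char[] buf = "fox|0.75".toCharArray();
        System.out.println(decodeFloat(encodeFloat(buf, 4, 4)));
    }
}
```

The `new String(...)` allocation per token is exactly the cost that the char-array Integer parser avoids.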

At any rate, I added the Integer parsing code to ArrayUtils, along with some tests.

For reference, see: 
https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/org/apache/harmony/luni/util/FloatingPointParser.java

https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/java/lang/Integer.java



> New Token filter for adding payloads "in-stream"
> ------------------------------------------------
>
>                 Key: LUCENE-1676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1676
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1676.patch
>
>
> This TokenFilter is able to split a token based on a delimiter and use one 
> part as the token and the other part as a payload.  This allows someone to 
> include payloads inline with tokens (presumably set up by a pipeline ahead of 
> time).  An example is apropos.  Given a | delimiter, we could have a stream 
> that looks like:
> {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ 
> dogs|NN{quote}
> In this case, this would produce tokens and payloads (assuming whitespace 
> tokenization):
> Token: the
> Payload: null
> Token: quick
> Payload: JJ
> Token: red
> Payload: JJ
> and so on.
> This patch will also support pluggable encoders for the payloads, so it can 
> convert from the character array to byte arrays as appropriate.
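A minimal, Lucene-independent sketch of the splitting step described in the issue, assuming the token is cut at the first delimiter and words without a delimiter get a null payload. `DelimitedPayloadSketch`, `PayloadEncoder`, and `UTF8` are hypothetical names, not the classes in LUCENE-1676.patch, and the sketch does no lowercasing, so the first token comes out as "The".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split "term|payload" words and encode the payload part. */
final class DelimitedPayloadSketch {

    /** Hypothetical pluggable encoder: converts payload chars to bytes. */
    interface PayloadEncoder {
        byte[] encode(char[] chars, int offset, int len);
    }

    /** Encodes the payload text as UTF-8 bytes. */
    static final PayloadEncoder UTF8 = new PayloadEncoder() {
        public byte[] encode(char[] chars, int offset, int len) {
            return new String(chars, offset, len).getBytes(StandardCharsets.UTF_8);
        }
    };

    /** A term together with its (possibly null) payload. */
    static final class TermWithPayload {
        final String term;
        final byte[] payload;

        TermWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Whitespace-tokenizes text and splits each word at the first delimiter. */
    static List<TermWithPayload> split(String text, char delimiter, PayloadEncoder encoder) {
        List<TermWithPayload> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            int cut = word.indexOf(delimiter);
            if (cut < 0) {
                out.add(new TermWithPayload(word, null)); // no delimiter: no payload
            } else {
                char[] buf = word.toCharArray();
                byte[] payload = encoder.encode(buf, cut + 1, buf.length - cut - 1);
                out.add(new TermWithPayload(word.substring(0, cut), payload));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
        for (TermWithPayload t : split(text, '|', UTF8)) {
            String p = t.payload == null ? "null" : new String(t.payload, StandardCharsets.UTF_8);
            System.out.println("Token: " + t.term + "  Payload: " + p);
        }
    }
}
```

Swapping in a different `PayloadEncoder` (for example one that parses the chars as an int or a float) changes only the payload bytes; the splitting logic stays the same.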


