[ 
https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805909#action_12805909
 ] 

Simon Willnauer edited comment on LUCENE-2183 at 1/28/10 1:16 PM:
------------------------------------------------------------------

I ran the following benchmark .alg file against the latest patch (specialized 
old and new methods), the patch with the proxy methods, and the old 3.0 code. 
The results show that the specialized code is about 8% faster than the 
proxy-class-based code, so I would rather keep the specialized code, as this 
class is performance sensitive.

.alg file
{code}
analyzer=org.apache.lucene.analysis.WhitespaceAnalyzer
content.source=org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource
content.source.forever=false
{ "Rounds" { "ReadTokens" ReadTokens > : *  NewRound ResetSystemErase} : 10
RepAll
{code}

10 rounds with the latest patch
{code}
     [java] ------------> Report All (11 out of 12)
     [java] Operation          round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_10              0        1            0         0.00       14.83     5,049,432     66,453,504
     [java] ReadTokens_Exhaust -   0 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   2.07 -  34,558,000 -   55,705,600
     [java] ReadTokens_Exhaust     1        1            0         0.00        1.40    41,865,312     60,555,264
     [java] ReadTokens_Exhaust -   2 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.22 -  34,393,904 -   63,176,704
     [java] ReadTokens_Exhaust     3        1            0         0.00        1.24    15,440,624     64,487,424
     [java] ReadTokens_Exhaust -   4 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.22 -   7,540,512 -   65,601,536
     [java] ReadTokens_Exhaust     5        1            0         0.00        1.21    50,174,760     67,239,936
     [java] ReadTokens_Exhaust -   6 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.19 -  22,202,768 -   67,174,400
     [java] ReadTokens_Exhaust     7        1            0         0.00        1.19    20,591,672     68,812,800
     [java] ReadTokens_Exhaust -   8 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.18 -  63,749,984 -   69,009,408
     [java] ReadTokens_Exhaust     9        1            0         0.00        1.19    22,331,600     68,943,872
{code}

10 rounds with the proxy class
{code}
     [java] ------------> Report All (11 out of 12)
     [java] Operation          round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_10              0        1            0         0.00       16.33     5,021,144     67,436,544
     [java] ReadTokens_Exhaust -   0 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   2.34 -  44,649,496 -   59,244,544
     [java] ReadTokens_Exhaust     1        1            0         0.00        1.53    36,681,952     61,472,768
     [java] ReadTokens_Exhaust -   2 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.37 -  13,863,688 -   64,094,208
     [java] ReadTokens_Exhaust     3        1            0         0.00        1.34    50,247,864     65,470,464
     [java] ReadTokens_Exhaust -   4 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.36 -  14,922,888 -   66,322,432
     [java] ReadTokens_Exhaust     5        1            0         0.00        1.36     5,718,296     67,371,008
     [java] ReadTokens_Exhaust -   6 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.32 -  54,583,776 -   67,502,080
     [java] ReadTokens_Exhaust     7        1            0         0.00        1.33    35,739,800     68,943,872
     [java] ReadTokens_Exhaust -   8 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.32 -  24,985,688 -   69,861,376
     [java] ReadTokens_Exhaust     9        1            0         0.00        1.29    64,138,112     69,730,304
{code}

10 rounds with current trunk
{code}
     [java] ------------> Report All (11 out of 12)
     [java] Operation          round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] Rounds_10              0        1            0         0.00       15.19     5,040,928     66,256,896
     [java] ReadTokens_Exhaust -   0 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   2.15 -  39,548,440 -   55,443,456
     [java] ReadTokens_Exhaust     1        1            0         0.00        1.43    28,088,544     60,096,512
     [java] ReadTokens_Exhaust -   2 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.27 -  16,004,088 -   61,800,448
     [java] ReadTokens_Exhaust     3        1            0         0.00        1.25    51,034,016     63,045,632
     [java] ReadTokens_Exhaust -   4 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.24 -  23,371,056 -   63,504,384
     [java] ReadTokens_Exhaust     5        1            0         0.00        1.24    12,964,368     65,208,320
     [java] ReadTokens_Exhaust -   6 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.25 -   6,598,128 -   65,601,536
     [java] ReadTokens_Exhaust     7        1            0         0.00        1.23    50,932,464     67,239,936
     [java] ReadTokens_Exhaust -   8 -  -   1 -  -  -  - 0 -  -  - 0.00 -  -   1.24 -  20,433,136 -   67,305,472
     [java] ReadTokens_Exhaust     9        1            0         0.00        1.23    63,638,552     68,812,800
{code}
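
For anyone who wants to re-derive the relative difference from the elapsedSec columns, here is a minimal sketch. The class and method names are hypothetical and not part of Lucene or its benchmark module; the inputs are copied from the reports above. Which rows were used for the quoted figure is not stated, so this is only one plausible reading: comparing the Rounds_10 totals gives roughly 9%, while comparing the final round (round 9) gives just under 8%.

```java
// Hypothetical helper for illustration only; not part of Lucene or its benchmark module.
final class ElapsedComparison {

    /** Returns how much less time the candidate took, as a percentage of the baseline time. */
    static double percentFaster(double baselineSec, double candidateSec) {
        return (baselineSec - candidateSec) / baselineSec * 100.0;
    }

    public static void main(String[] args) {
        // elapsedSec values copied from the reports above.
        double proxyTotal = 16.33, patchTotal = 14.83;   // Rounds_10 rows
        double proxyRound9 = 1.29, patchRound9 = 1.19;   // ReadTokens_Exhaust, round 9

        System.out.printf("total:   %.2f%%%n", percentFaster(proxyTotal, patchTotal));
        System.out.printf("round 9: %.2f%%%n", percentFaster(proxyRound9, patchRound9));
    }
}
```

The first round of each run is noticeably slower than the rest (likely JVM warm-up), so the later rounds are probably the fairer basis for comparison.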

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch, LUCENE-2183.patch, LUCENE-2183.patch, 
> LUCENE-2183.patch, LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a 
> character level. Yet those tokenizers still use char primitives instead of 
> int code points. CharTokenizer should operate on code points and preserve 
> backwards compatibility. 
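
To illustrate the char versus code point distinction the issue describes, here is a minimal sketch using only standard java.lang APIs. It is not CharTokenizer's code; the class and method names are hypothetical. It counts letters in a string once by looking at each UTF-16 char in isolation and once by decoding full code points, using U+10400 (a supplementary letter encoded as a surrogate pair) as the test character.

```java
// Hypothetical illustration; this is not how CharTokenizer is implemented.
final class CodePointScan {

    /** Counts letters by testing each UTF-16 char on its own; surrogate halves are never letters. */
    static int countLettersByChar(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /** Counts letters by decoding whole code points, advancing one or two chars at a time. */
    static int countLettersByCodePoint(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", then U+10400 (stored as two chars), then "b".
        String text = "a" + new String(Character.toChars(0x10400)) + "b";
        System.out.println(countLettersByChar(text));      // surrogate pair is missed
        System.out.println(countLettersByCodePoint(text)); // all three letters are seen
    }
}
```

A char-based tokenizer would treat the two surrogate halves as non-letters and split the token there, which is the behaviour the code point based approach avoids.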
