[ https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2181:
--------------------------------

    Attachment: LUCENE-2181.patch

OK, I think we might be close to something committable now:
* wrote tests for NewLocaleTask and NewCollationAnalyzerTask
* set doc.stored=false, doc.tokenized=false, doc.body.tokenized=true in the
collation.alg file
* moved the two scripts into a 'scripts' directory, which I thought made more
sense
* renamed the bm2jira.pl script to collation.bm2jira.pl
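For context on what the java.text column measures: a collation analyzer roughly turns each term into locale-sensitive collation-key bytes, so that the index order of the encoded terms follows the locale's sort order. The sketch below shows only that per-term work, using the JDK's java.text.Collator. The class and method names are hypothetical; this is not the code in LUCENE-2181.patch or Lucene's own collation analyzer. The ICU4J column would do the equivalent with com.ibm.icu.text.Collator, which is not shown because it needs the ICU4J jar.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: the per-term work of a java.text-based collation
// analyzer, reduced to "term -> collation-key bytes". Not Lucene's code.
public class CollationKeySketch {

    // Collation-key bytes for one term in the collator's locale.
    static byte[] keyBytes(Collator collator, String term) {
        CollationKey key = collator.getCollationKey(term);
        return key.toByteArray();
    }

    // Unsigned lexicographic comparison of two key byte arrays; a shorter
    // array that is a prefix of the longer one sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // One Collator per locale, reused for every term, as an analyzer would.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String[] terms = {"cherry", "Banana", "apple"};
        // Order terms by their key bytes, i.e. the order key-encoded terms
        // would have in the index.
        Arrays.sort(terms, (x, y) -> compareKeys(keyBytes(collator, x), keyBytes(collator, y)));
        System.out.println(Arrays.toString(terms));
    }
}
```

Reusing one Collator instance matters for a benchmark like this one: Collator.getInstance is relatively expensive, so creating it per term would dominate the timings.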

Here is the output of 'ant collation' in the benchmark package:

||Language||java.text||ICU4J||KeywordAnalyzer||ICU4J Improvement||
|English|10.78s|7.32s|1.58s|60%|
|French|11.48s|7.52s|1.59s|67%|
|German|11.19s|7.52s|1.61s|62%|
|Ukrainian|13.03s|8.68s|1.66s|62%|
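The "ICU4J Improvement" column is not simply java.text time divided by ICU4J time (for English that would be about 47%). The numbers in all four rows are instead consistent with first subtracting the KeywordAnalyzer baseline from both timings and then comparing the remaining collation cost, e.g. English: (10.78 - 1.58) / (7.32 - 1.58) - 1 ≈ 60%. This formula is inferred from the table, not stated in the comment; the sketch below (hypothetical class and method names) just reproduces the column under that assumption.

```java
// Hypothetical reconstruction of the "ICU4J Improvement" column, assuming
// the KeywordAnalyzer run is the non-collation baseline for both timings.
public class CollationImprovement {

    // Percentage by which java.text's collation cost exceeds ICU4J's,
    // after removing the baseline (KeywordAnalyzer) time from both.
    static long improvementPercent(double jdkSecs, double icuSecs, double keywordSecs) {
        double jdkCollation = jdkSecs - keywordSecs;
        double icuCollation = icuSecs - keywordSecs;
        return Math.round((jdkCollation / icuCollation - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        String[] languages = {"English", "French", "German", "Ukrainian"};
        // {java.text, ICU4J, KeywordAnalyzer} seconds, copied from the table.
        double[][] secs = {
            {10.78, 7.32, 1.58},
            {11.48, 7.52, 1.59},
            {11.19, 7.52, 1.61},
            {13.03, 8.68, 1.66},
        };
        for (int i = 0; i < languages.length; i++) {
            long pct = improvementPercent(secs[i][0], secs[i][1], secs[i][2]);
            System.out.printf("%s: %d%%%n", languages[i], pct);
        }
    }
}
```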

I think it's more accurate relative to KeywordAnalyzer now that we no longer
store fields (including the body text) or tokenize anything except the body.
Of course, you can turn these settings back on in the .alg file to see whether
the differences matter in the context of overall indexing.


> benchmark for collation
> -----------------------
>
>                 Key: LUCENE-2181
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2181
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2181.patch, LUCENE-2181.patch, LUCENE-2181.patch, 
> top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2
>
>
> Steven Rowe attached a contrib/benchmark-based benchmark for collation (both
> JDK and ICU) under LUCENE-2084, along with some instructions to run it...
> I think it would be nice if we could turn this into a committable patch and
> add it to benchmark.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
