[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521361 ]
Dawid Weiss commented on LUCENE-871:
------------------------------------

I was a bit curious about it, so I decided to write a table-lookup version. It does come out faster, but only by a small margin (especially on "server", hotspot JVMs).

Timings are in milliseconds; each round consisted of 100000 repetitions of parsing the test string

"Des mot clés À LA CHAÎNE À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Œ Þ Ù Ú Û Ü Ý Ÿ à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø œ ß þ ù ú û ü ý ÿ".

Note that the test is biased, since most of its characters have accents, which will not be the case in real life, I guess... but still:

// SUN JVM build 1.6.0-b105, -server mode
Round (old): 1922 1688 1656 1687 1641 1703 1672 1672 1687 1719
Round (new): 1719 1609 1609 1594 1625 1578 1625 1594 1625 1656

// SUN JVM, 1.6.0, interpreted (-client)
Round (old): 2391 2453 2359 2375 2391 2359 2156 2532 2422 2359
Round (new): 1969 1906 1922 1937 1985 1922 1906 1937 1985 1922

// IBM JVM 1.5.0 (don't know why it's so sluggish, really).
Round (old): 7906 7188 7625 7312 7266 7141 7015 5641 5578 5672
Round (new): 4656 4406 4516 4516 4375 4375 4343 4297 4344 4266

// IBM 1.5.0, -server (note the speed improvement when the old version is hotspot-optimized).
Round (old): 5922 5078 5078 5062 4985 4875 4953 4641 3640 3735
Round (new): 3750 3781 3656 3516 3515 3594 3547 3562 3532 3531

So... it does come out a bit faster. Whether it makes sense to waste 130 KB of memory for this improvement, I don't know, really. I'll upload the table-lookup version for your reference.

> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
> Key: LUCENE-871
> URL: https://issues.apache.org/jira/browse/LUCENE-871
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
> Reporter: Ian Boston
> Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: fasterisoremove1.patch, fasterisoremove2.patch, ISOLatin1AccentFilter.java.patch, LUCENE-871.take4.patch
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow
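
For readers without the attachment at hand, the table-lookup idea from the comment above might look roughly like the sketch below. It is not the uploaded patch: the class name AccentTableSketch, the helper mapRange, and the partial mapping are all hypothetical. It assumes one table entry per UTF-16 code unit, which comes to 65536 * 2 bytes = 128 KB and is at least consistent with the memory figure quoted in the comment, though the comment does not say how the real table is laid out. Letters that expand to two characters (Æ, Œ, Þ, ß and their lower-case forms) are left untouched to keep the sketch short.

```java
// Illustrative sketch only; this is not the patch attached to LUCENE-871.
final class AccentTableSketch {

    // Assumption: one entry per UTF-16 code unit, 65536 chars * 2 bytes = 128 KB.
    private static final char[] TABLE = new char[Character.MAX_VALUE + 1];

    static {
        // Start from the identity mapping so unlisted characters pass through.
        for (int c = 0; c < TABLE.length; c++) {
            TABLE[c] = (char) c;
        }
        // Hypothetical, partial one-to-one mapping for Latin-1 letters.
        mapRange('\u00C0', '\u00C5', 'A');
        mapRange('\u00C7', '\u00C7', 'C');
        mapRange('\u00C8', '\u00CB', 'E');
        mapRange('\u00CC', '\u00CF', 'I');
        mapRange('\u00D1', '\u00D1', 'N');
        mapRange('\u00D2', '\u00D6', 'O');
        mapRange('\u00D8', '\u00D8', 'O');
        mapRange('\u00D9', '\u00DC', 'U');
        mapRange('\u00DD', '\u00DD', 'Y');
        mapRange('\u00E0', '\u00E5', 'a');
        mapRange('\u00E7', '\u00E7', 'c');
        mapRange('\u00E8', '\u00EB', 'e');
        mapRange('\u00EC', '\u00EF', 'i');
        mapRange('\u00F1', '\u00F1', 'n');
        mapRange('\u00F2', '\u00F6', 'o');
        mapRange('\u00F8', '\u00F8', 'o');
        mapRange('\u00F9', '\u00FC', 'u');
        mapRange('\u00FD', '\u00FD', 'y');
        mapRange('\u00FF', '\u00FF', 'y');
        mapRange('\u0178', '\u0178', 'Y');
    }

    // Maps every character in [first, last] to the same replacement.
    private static void mapRange(char first, char last, char replacement) {
        for (int c = first; c <= last; c++) {
            TABLE[c] = replacement;
        }
    }

    // Replaces each character by its table entry: one array read per char,
    // with no branching on the character's value.
    static String removeAccents(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(TABLE[input.charAt(i)]);
        }
        return out.toString();
    }
}
```

The trade-off is the one raised in the comment: the lookup replaces a chain of comparisons with a single array read, at the cost of keeping the whole table resident. A table covering only the code points that actually need mapping would be far smaller, but would need a bounds check on every character.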