[ https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3725:
---------------------------------------

    Attachment: LUCENE-3725.patch

OK, I turned packing back on for Kuromoji's TokenInfoDict... this
reduces its size from 1954846 to 1498215 bytes (23% smaller, i.e. 445.9 KB saved).

And... now the JAR is a bit smaller: 4533833 vs 4570053 bytes (~35
KB saved).  What changed was... I tweaked the two int parameters
passed to pack and got better packing than before.
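For anyone double-checking the numbers, here is a small hypothetical helper (not part of the patch) that re-derives the savings from the raw byte counts quoted above; `savedBytes` and `reduction` are illustrative names only.

```java
import java.util.Locale;

// Hypothetical helper, not part of LUCENE-3725: recompute the size
// savings quoted in the comment from the raw byte counts.
public class SizeSavings {

    /** Bytes saved going from {@code before} to {@code after}. */
    static long savedBytes(long before, long after) {
        return before - after;
    }

    /** Fractional reduction, e.g. 0.23 means 23% smaller. */
    static double reduction(long before, long after) {
        return (double) (before - after) / before;
    }

    public static void main(String[] args) {
        long dictBefore = 1954846L, dictAfter = 1498215L; // TokenInfoDict, unpacked vs packed
        long jarBefore = 4570053L, jarAfter = 4533833L;   // JAR, before vs after

        // Locale.ROOT keeps '.' as the decimal separator regardless of platform locale.
        System.out.printf(Locale.ROOT, "TokenInfoDict: saved %d bytes (%.1f KB, %.1f%%)%n",
            savedBytes(dictBefore, dictAfter),
            savedBytes(dictBefore, dictAfter) / 1024.0,
            100 * reduction(dictBefore, dictAfter));
        System.out.printf(Locale.ROOT, "JAR: saved %d bytes (%.1f KB, %.2f%%)%n",
            savedBytes(jarBefore, jarAfter),
            savedBytes(jarBefore, jarAfter) / 1024.0,
            100 * reduction(jarBefore, jarAfter));
    }
}
```

It reports 456631 bytes (445.9 KB, 23.4%) for the dictionary and 36220 bytes (35.4 KB, 0.79%) for the JAR, consistent with the figures above.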

I wrote a simple perf test (Perf.java attached)... and as far as I can
tell the perf change is within HotSpot noise... with trunk I get this:
{noformat}
1366 msec; 688.1405563689605 tokens/msec
1006 msec; 934.3936381709741 tokens/msec
1020 msec; 921.5686274509804 tokens/msec
938 msec; 1002.1321961620469 tokens/msec
937 msec; 1003.2017075773746 tokens/msec
942 msec; 997.8768577494692 tokens/msec
938 msec; 1002.1321961620469 tokens/msec
940 msec; 1000.0 tokens/msec
939 msec; 1001.0649627263045 tokens/msec
939 msec; 1001.0649627263045 tokens/msec
{noformat}
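Perf.java itself isn't reproduced here, but every line above is consistent with a fixed workload of 940,000 tokens per iteration (940 msec at 1000.0 tokens/msec, 1366 msec at 688.14... tokens/msec, and so on), so the rate column is just that count divided by elapsed time. A minimal timing loop in that spirit might look like the following; the `IntSupplier` workload is a hypothetical stand-in, not Kuromoji's tokenizer.

```java
import java.util.function.IntSupplier;

// Hypothetical harness in the spirit of Perf.java (whose source is not shown):
// run a workload repeatedly and report msec and tokens/msec per iteration, so
// later iterations reflect HotSpot-compiled code rather than warmup.
public class PerfSketch {

    /** Formats one result line in the same shape as the output above. */
    static String formatResult(long msec, int tokens) {
        return msec + " msec; " + ((double) tokens / msec) + " tokens/msec";
    }

    /** Runs {@code workload} {@code iters} times; the workload returns its token count. */
    static void run(IntSupplier workload, int iters) {
        for (int i = 0; i < iters; i++) {
            long t0 = System.nanoTime();
            int tokens = workload.getAsInt();
            // Clamp to 1 msec so a very fast iteration can't divide by zero.
            long msec = Math.max(1, (System.nanoTime() - t0) / 1_000_000);
            System.out.println(formatResult(msec, tokens));
        }
    }

    public static void main(String[] args) {
        // Placeholder workload: NOT a tokenizer, just something that returns a count.
        IntSupplier fakeWorkload = () -> {
            int count = 0;
            for (int j = 0; j < 940_000; j++) {
                count += Integer.toString(j).isEmpty() ? 0 : 1;
            }
            return count;
        };
        run(fakeWorkload, 10);
    }
}
```

Java's default `double` printing is why a whole-number rate shows up as `1000.0` rather than `1000`.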

And with the packed FST I get this:

{noformat}
1366 msec; 688.1405563689605 tokens/msec
1003 msec; 937.1884346959123 tokens/msec
1014 msec; 927.0216962524655 tokens/msec
934 msec; 1006.423982869379 tokens/msec
935 msec; 1005.3475935828877 tokens/msec
936 msec; 1004.2735042735043 tokens/msec
935 msec; 1005.3475935828877 tokens/msec
938 msec; 1002.1321961620469 tokens/msec
936 msec; 1004.2735042735043 tokens/msec
935 msec; 1005.3475935828877 tokens/msec
{noformat}

But (annoyingly, as usual!) the results can differ quite a bit
depending on how HotSpot flips coins on startup...
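To put a rough number on "within noise", this sketch averages the msec columns posted above. Treating the first three iterations as HotSpot warmup is an assumption of the sketch, not something Perf.java is known to do.

```java
import java.util.Arrays;
import java.util.Locale;

// Summarizes the msec columns from the two runs above. Dropping the first
// three iterations as warmup is this sketch's assumption.
public class SteadyState {

    static final long[] TRUNK  = {1366, 1006, 1020, 938, 937, 942, 938, 940, 939, 939};
    static final long[] PACKED = {1366, 1003, 1014, 934, 935, 936, 935, 938, 936, 935};

    /** Mean of the iterations after the first {@code warmup} ones. */
    static double steadyMean(long[] msec, int warmup) {
        return Arrays.stream(msec, warmup, msec.length).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double trunk = steadyMean(TRUNK, 3);
        double packed = steadyMean(PACKED, 3);
        double pctFaster = 100 * (trunk - packed) / trunk;
        System.out.printf(Locale.ROOT,
            "trunk %.1f msec, packed %.1f msec, packed faster by %.2f%%%n",
            trunk, packed, pctFaster);
    }
}
```

With these numbers that works out to 939.0 vs 935.6 msec, a gap of under 0.4%, small enough that a different HotSpot compilation order on another run could easily erase or reverse it.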

                
> Add optional packing to FST building
> ------------------------------------
>
>                 Key: LUCENE-3725
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3725
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, Perf.java
>
>
> The FSTs produced by Builder can be further shrunk if you are willing
> to spend highish transient RAM to do so... our Builder today tries
> hard not to use much RAM (and has options to tweak down the RAM usage,
> in exchange for a somewhat larger FST), even when building immense FSTs.
> But for apps that can afford highish transient RAM to get a smaller
> net FST, I think we should offer packing.
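As a toy illustration of why spending transient RAM can buy a smaller structure (this is NOT the LUCENE-3725 packing algorithm, whose details aren't described here): assume each arc stores its target node id as a variable-length int. Counting in-references per node costs an extra int per node, but lets you renumber the most-referenced nodes to the smallest ids so those references encode in fewer bytes. The node count and arc list below are made up.

```java
import java.util.Arrays;

// Toy model, NOT Lucene's FST format: each arc stores its target node id as a
// 7-bits-per-byte vInt. Renumbering nodes so the most-referenced ones get the
// smallest ids shrinks the total encoded size, at the cost of transient RAM.
public class ToyPacking {

    /** Bytes needed to write non-negative v as a 7-bits-per-byte vInt. */
    static int vIntSize(int v) {
        int bytes = 1;
        while ((v >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    /** Total bytes to encode all arc targets under the given id mapping. */
    static int encodedSize(int[] arcTargets, int[] newId) {
        int total = 0;
        for (int t : arcTargets) {
            total += vIntSize(newId[t]);
        }
        return total;
    }

    /** Renumber nodes by descending in-count; inCount and order are the transient RAM. */
    static int[] renumberByInCount(int[] arcTargets, int numNodes) {
        int[] inCount = new int[numNodes];
        for (int t : arcTargets) {
            inCount[t]++;
        }
        Integer[] order = new Integer[numNodes];
        for (int i = 0; i < numNodes; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(inCount[b], inCount[a]));
        int[] newId = new int[numNodes];
        for (int rank = 0; rank < numNodes; rank++) {
            newId[order[rank]] = rank;
        }
        return newId;
    }

    public static void main(String[] args) {
        int numNodes = 100_000;
        // Made-up arcs: most point at five "popular" nodes with large ids.
        int[] arcTargets = new int[50_000];
        for (int i = 0; i < arcTargets.length; i++) {
            arcTargets[i] = (i % 10 == 0) ? i : numNodes - 1 - (i % 5);
        }
        int[] identity = new int[numNodes];
        for (int i = 0; i < numNodes; i++) {
            identity[i] = i;
        }
        System.out.println("before: " + encodedSize(arcTargets, identity) + " bytes");
        System.out.println("after:  "
            + encodedSize(arcTargets, renumberByInCount(arcTargets, numNodes)) + " bytes");
    }
}
```

The real trade-off in the issue is the same shape: the extra bookkeeping is only needed while building, and the saved bytes persist in the final FST.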
