[ https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195987#comment-13195987 ]

Dawid Weiss commented on LUCENE-3725:
-------------------------------------

I finally had the time to look at the patch. Yes, this is pretty much the
top-n node reordering that I did, albeit without any optimization of how many
nodes (n) to take (the hardcoded magic constants should probably be justified
somehow, or replaced by a default somewhere in FST?). Deciding how many nodes
to reorder is, I think, hard -- I failed to come up with any sensible fast
heuristic for it, so I simply run simulated annealing to find a local optimum.

One thing I was wondering about is why you decided to integrate the packer
into the FST -- wouldn't it be cleaner to separate packing from construction?
Granted, this requires a double pass over the FST nodes and more intermediate
memory, but it wouldn't add any more complexity to the builder, which is
already, ehm, a bit complex ;). You can compare with this design in Morfologik:

Builder:
http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-fsa/src/main/java/morfologik/fsa/FSABuilder.java?revision=343&view=markup

Serialization (optimized or not, takes any FSA as input; see method #linearize):
http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-fsa/src/main/java/morfologik/fsa/CFSA2Serializer.java?revision=343&view=markup
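A minimal sketch of the separated design, assuming a hypothetical read-only `Automaton` type handed over by the builder (this is neither Lucene's nor Morfologik's API). The serializer makes one pass to pick a node order (`linearize`, here plain breadth-first) and a second pass to write the arcs, using an intermediate `position` array: that is the double pass and extra memory mentioned above. A reordering strategy such as top-n packing would only replace `linearize`, leaving the builder untouched. The toy byte format omits final-state flags and outputs.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class SeparateSerializerSketch {

    /** Hypothetical read-only automaton handed over by a builder. */
    static final class Automaton {
        final int root;
        final int[][] labels;  // labels[node][i]: label of the node's i-th arc
        final int[][] targets; // targets[node][i]: target node of that arc

        Automaton(int root, int[][] labels, int[][] targets) {
            this.root = root;
            this.labels = labels;
            this.targets = targets;
        }
    }

    /** Pass 1: choose a linear node order (here breadth-first from the root). */
    static int[] linearize(Automaton a) {
        int[] order = new int[a.targets.length];
        boolean[] seen = new boolean[a.targets.length];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(a.root);
        seen[a.root] = true;
        int next = 0;
        while (!queue.isEmpty()) {
            int node = queue.poll();
            order[next++] = node;
            for (int t : a.targets[node]) {
                if (!seen[t]) {
                    seen[t] = true;
                    queue.add(t);
                }
            }
        }
        return Arrays.copyOf(order, next); // only nodes reachable from the root
    }

    /** Pass 2: write each node's arcs, referring to targets by their position in the order. */
    static byte[] serialize(Automaton a, int[] order) {
        int[] position = new int[a.targets.length]; // intermediate memory for the second pass
        for (int i = 0; i < order.length; i++) {
            position[order[i]] = i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int node : order) {
            writeVInt(out, a.targets[node].length); // arc count
            for (int i = 0; i < a.targets[node].length; i++) {
                writeVInt(out, a.labels[node][i]);
                writeVInt(out, position[a.targets[node][i]]);
            }
        }
        return out.toByteArray();
    }

    /** Variable-length int: 7 bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    public static void main(String[] args) {
        // Hypothetical automaton for {"ab", "b"}: 0 -a-> 1 -b-> 2 and 0 -b-> 2.
        Automaton a = new Automaton(0,
                new int[][] {{'a', 'b'}, {'b'}, {}},
                new int[][] {{1, 2}, {2}, {}});
        int[] order = linearize(a);
        byte[] bytes = serialize(a, order);
        System.out.println(Arrays.toString(order) + " " + bytes.length); // prints "[0, 1, 2] 9"
    }
}
```

The builder never needs to know which order or encoding will be used; swapping serializers (plain vs. packed) is then a matter of passing the same finished automaton to a different class.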

Anyway, the patch looks good to me.
                
> Add optional packing to FST building
> ------------------------------------
>
>                 Key: LUCENE-3725
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3725
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, 
> Perf.java
>
>
> The FSTs produced by Builder can be further shrunk if you are willing
> to spend highish transient RAM to do so... our Builder today tries
> hard not to use much RAM (and has options to tweak down the RAM usage,
> in exchange for somewhat larger FST), even when building immense FSTs.
> But for apps that can afford highish transient RAM to get a smaller
> net FST, I think we should offer packing.

--
This message is automatically generated by JIRA.