[
https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195987#comment-13195987
]
Dawid Weiss commented on LUCENE-3725:
-------------------------------------
I finally had the time to look at the patch. Yes, this is pretty much the
top-n nodes reordering that I did, albeit without any optimization of how many
nodes (n) to take (the hardcoded magic constants should probably be justified
somehow? Or replaced by a default in FST somewhere?). Deciding how many nodes to
reorder is, I think, hard -- I failed to come up with any sensible fast
heuristic for that, so I simply run simulated annealing to find a local optimum.
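To make the top-n idea concrete, here is a minimal sketch. It is not the code from the patch or from Morfologik, and every name in it (TopNReorder, reorder, pointerBytes, vIntBytes) is made up for illustration. It assumes arc targets are written as 7-bit variable-length node ordinals, so moving the nodes with the most incoming arcs to the front makes the most frequently written targets the shortest.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of top-n node reordering; not Lucene or Morfologik code. */
public class TopNReorder {

    /** Bytes needed to store a non-negative value as a 7-bit variable-length integer. */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    /**
     * Returns node ids in their new order: the topN nodes with the most
     * incoming arcs come first (ties keep the lower id first), followed by
     * all remaining nodes in their original order.
     */
    static int[] reorder(int[] inCounts, int topN) {
        int n = inCounts.length;
        Integer[] byCount = new Integer[n];
        for (int i = 0; i < n; i++) {
            byCount[i] = i;
        }
        // Sorting an Object[] is stable, so equal counts keep id order.
        Arrays.sort(byCount, Comparator.comparingInt((Integer id) -> -inCounts[id]));

        boolean[] promoted = new boolean[n];
        int[] order = new int[n];
        int pos = 0;
        int limit = Math.min(topN, n);
        for (int i = 0; i < limit; i++) {
            order[pos++] = byCount[i];
            promoted[byCount[i]] = true;
        }
        for (int id = 0; id < n; id++) {
            if (!promoted[id]) {
                order[pos++] = id;
            }
        }
        return order;
    }

    /** Total bytes spent on arc targets if each target is written as its node's ordinal. */
    static long pointerBytes(int[] inCounts, int[] order) {
        long total = 0;
        for (int ordinal = 0; ordinal < order.length; ordinal++) {
            total += (long) inCounts[order[ordinal]] * vIntBytes(ordinal);
        }
        return total;
    }
}
```

Comparing pointerBytes for a few values of topN gives the cost function that any search over n would minimize, whether that is simulated annealing or something simpler. A real packer would also have to account for the node bodies changing size as addresses shrink, which this sketch ignores.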
One thing I was wondering is why you decided to integrate the packer with the
fst -- wouldn't it be cleaner to separate packing from construction? Granted,
this requires a double pass over the fst nodes and more intermediate memory but
it wouldn't add any more complexity to the builder which is already, ehm, a bit
complex ;). You can compare this with the design in Morfologik:
Builder:
http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-fsa/src/main/java/morfologik/fsa/FSABuilder.java?revision=343&view=markup
Serialization (optimized or not, takes any FSA on input) (method #linearize):
http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-fsa/src/main/java/morfologik/fsa/CFSA2Serializer.java?revision=343&view=markup
In any case, the patch looks good to me.
> Add optional packing to FST building
> ------------------------------------
>
> Key: LUCENE-3725
> URL: https://issues.apache.org/jira/browse/LUCENE-3725
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch,
> Perf.java
>
>
> The FSTs produced by Builder can be further shrunk if you are willing
> to spend highish transient RAM to do so... our Builder today tries
> hard not to use much RAM (and has options to tweak down the RAM usage,
> in exchange for a somewhat larger FST), even when building immense FSTs.
> But for apps that can afford highish transient RAM to get a smaller
> net FST, I think we should offer packing.