I am building an FST. Here is an excerpt from my code:
  // Build the FST from the workingSet
  Builder<IntsRef> builder = new Builder<IntsRef>(FST.INPUT_TYPE.BYTE4, outputs);
  IntsRef[] sortedKeys = workingSet.keySet().toArray(new IntsRef[workingSet.size()]);
  Arrays.sort(sortedKeys);
  int maxPhraseLen = 0;
  int maxDocsLen = 0;
  for (IntsRef termIdsPhrase : sortedKeys) {
    IntsRef solrIds = workingSet.remove(termIdsPhrase); // remove to save memory
    assert termIdsPhrase.length > 0 && solrIds.length > 0;
    builder.add(termIdsPhrase, solrIds);
  }
  return builder.finish();
For what it's worth, the input side is at most 7 integers long, and the output
side is typically about the same length, but a small number of outputs are as
long as 48K integers. There are 10M entries.
After many calls to builder.add(), and with assertions enabled, I eventually
get this exception:
Exception in thread "main" java.lang.AssertionError: size must be positive (got -262796219): likely integer overflow?
  at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:336)
  at org.apache.lucene.util.fst.FST.addNode(FST.java:672)
  at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:122)
  at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:195)
  at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:287)
  at org.apache.lucene.util.fst.Builder.add(Builder.java:392)
  at org.mitre.opensextant.solr.TaggerFstCorpus.buildPhrases(TaggerFstCorpus.java:176)
  at org.mitre.opensextant.solr.TaggerFstCorpus.doBuild(TaggerFstCorpus.java:61)
  at org.mitre.opensextant.solr.BuildCorpusExperiment.main(BuildCorpusExperiment.java:31)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
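My unverified guess, based only on the assertion message, is that an int size computed while growing the FST's backing array wrapped past Integer.MAX_VALUE. The sketch below is not Lucene code; OverflowSketch, growInt and growLong are hypothetical names I made up to show how 32-bit addition can silently produce a negative size, while the same sum in 64-bit arithmetic stays positive.

```java
public class OverflowSketch {

    // Hypothetical helper (not a Lucene method): computes a new size using
    // 32-bit int arithmetic, which wraps silently past Integer.MAX_VALUE.
    static int growInt(int currentSize, int extra) {
        return currentSize + extra;
    }

    // Same computation widened to 64 bits, so the true size is preserved.
    static long growLong(int currentSize, int extra) {
        return (long) currentSize + extra;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; these are not taken from my run.
        int currentSize = Integer.MAX_VALUE - 10;
        int extra = 100;

        System.out.println("int sum:  " + growInt(currentSize, extra));
        System.out.println("long sum: " + growLong(currentSize, extra));
    }
}
```

If something like this is what is happening, it would suggest the FST's byte array is approaching the 2GB limit of a Java array, but I have not confirmed that against the Lucene source.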
This is on Lucene 4.0-ALPHA using JDK 7. I'm using 6GB of heap; my attempts to
use less resulted in Out-of-memory errors. What FST size limitation am I
bumping up against?
~ David