[ 
https://issues.apache.org/jira/browse/LUCENE-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679362#comment-13679362
 ] 

Shai Erera commented on LUCENE-5048:
------------------------------------

Good catch. The test doesn't always reproduce the failure, though; it does 
reproduce with this seed: {{-Dtests.seed=5229F4C07527089D}}.
I modified it to fail with even fewer categories, just by initializing a 
Cl2oTaxonomyWriterCache with a smaller initial capacity.

Anyway, the problem is the cast to short. In Java, short (and int) are signed. 
So what happens is:
* The test generates a random Unicode string of 32767 code points, but its 
length() is 65534 (0xFFFE), since length() counts UTF-16 chars.
* Cast that to short, you get -2.
* Cast it to char, you get 0xFFFE (since char is the only primitive that's 
unsigned)
* And of course casting to int is unnecessary.
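
The chain of casts above can be reproduced in isolation. This is only a minimal sketch, not the facet module's code; the class and method names (LengthCastDemo, readAsShort, readAsChar) are made up for illustration.

```java
public class LengthCastDemo {
    // Hypothetical helper: decodes a stored length the buggy way, via short.
    public static int readAsShort(char stored) {
        return (short) stored; // sign-extends: 0xFFFE becomes -2
    }

    // Hypothetical helper: a char widens to int without sign extension.
    public static int readAsChar(char stored) {
        return stored; // 0xFFFE stays 65534, no cast needed
    }

    public static void main(String[] args) {
        int length = 65534;          // 0xFFFE
        char stored = (char) length; // fits, char is an unsigned 16-bit type
        System.out.println(readAsShort(stored)); // prints -2
        System.out.println(readAsChar(stored));  // prints 65534
    }
}
```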

The code serializes the length of each component as a char in a char[]. During 
serialization it casts the length to char, since it appends it to a char[]. 
During deserialization it wrongly casts to short (it should cast to char, at 
which point you get an unnecessary-cast warning). I guess this never showed up 
because nobody has yet tried to index components of length 32K+ :).
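
To make the serialize/deserialize mismatch concrete, here is a rough sketch of a length-prefixed char[] layout. It is an assumption about the general shape only, not the actual CategoryPathUtils code; LengthPrefixDemo and its methods are hypothetical names.

```java
import java.util.Arrays;

public class LengthPrefixDemo {
    // Writes one component as [length][chars...] and returns the next free offset.
    // Assumes component.length() <= 65535, otherwise the prefix wraps around.
    public static int write(char[] buf, int offset, String component) {
        buf[offset++] = (char) component.length();
        component.getChars(0, component.length(), buf, offset);
        return offset + component.length();
    }

    // Buggy read: going through short turns lengths above 32767 negative.
    public static int readLengthBuggy(char[] buf, int offset) {
        return (short) buf[offset];
    }

    // Fixed read: a char widens to a non-negative int on its own.
    public static int readLengthFixed(char[] buf, int offset) {
        return buf[offset];
    }

    public static void main(String[] args) {
        char[] chars = new char[40000];
        Arrays.fill(chars, 'x');
        String component = new String(chars);

        char[] buf = new char[component.length() + 1];
        write(buf, 0, component);

        int buggy = readLengthBuggy(buf, 0);
        System.out.println(buggy);                   // prints -25536
        System.out.println(readLengthFixed(buf, 0)); // prints 40000

        try {
            new StringBuilder(buggy); // a negative capacity is rejected
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

The last step mirrors the top of the reported stack trace, where the exception comes out of the StringBuilder constructor.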

So to fix this we do need to remove the cast to short. But there's also a 
bigger bug here: since the length is assumed to be less than 65536, I think 
there will be an issue if you index a very large category (or a substring of 
it) twice. So I extended the test to add the same category again after a 
re-hash, and voila, it was added again, with a different ordinal. Therefore I 
added a check in CategoryPath that prevents the creation of CPs with very 
large components (> 65535 chars). I'll post a modified patch soon.
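
A length check of that kind could look roughly like the sketch below. The real patch is not shown in this message, so everything here (ComponentLengthCheck, MAX_COMPONENT_LENGTH, checkComponents, the exception type and message) is a hypothetical illustration rather than the CategoryPath change itself.

```java
public class ComponentLengthCheck {
    // Largest length that still round-trips through a single char.
    public static final int MAX_COMPONENT_LENGTH = Character.MAX_VALUE; // 65535

    // Hypothetical guard: rejects any component whose length cannot be
    // stored in one char without wrapping around.
    public static void checkComponents(String... components) {
        for (String c : components) {
            if (c.length() > MAX_COMPONENT_LENGTH) {
                throw new IllegalArgumentException("component too long: "
                        + c.length() + " chars (max " + MAX_COMPONENT_LENGTH + ")");
            }
        }
    }

    public static void main(String[] args) {
        // Why the limit matters: a length of 65537 stored as a char reads back as 1.
        System.out.println((int) (char) 65537); // prints 1

        checkComponents("Location", "nj"); // short components pass

        try {
            checkComponents("Location", new String(new char[65536]));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```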
                
> NegativeArraySizeException caused by  ff.addFields
> --------------------------------------------------
>
>                 Key: LUCENE-5048
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5048
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 4.2, 4.3
>         Environment: Not sure if this is applicable, but it's Gentoo Linux 
> with java 1.7 on a dual intel xeon lga2011 with supermicro server motherboard 
> and 256GB of ram
>            Reporter: Colton Jamieson
>         Attachments: LUCENE-5048.patch
>
>
> I have a Server/Client software that I have created which has a server 
> process that accepts connections from clients that transmit data about local 
> connection information. This data is then buffered and a ThreadPoolExecutor 
> runs to take the data and put it into a lucene index as well as a facet 
> index. This works perfectly for the lucene index, but the facet index randomly 
> generates a NegativeArraySizeException. I cannot find any reason why the 
> exception would be caused because lines with the same type of data do not 
> throw it, then all of a sudden the exception is thrown, typically 4 of them 
> in a row. I talked with mikemccand on IRC and he requested I submit this 
> issue.
> After some discussion, he seems to think it's because some of the values I am 
> using are rather large.
> Here is the exception...
> java.lang.NegativeArraySizeException
>         at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
>         at java.lang.StringBuilder.<init>(StringBuilder.java:97)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.CharBlockArray.subSequence(CharBlockArray.java:164)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.CategoryPathUtils.hashCodeOfSerialized(CategoryPathUtils.java:50)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.CompactLabelToOrdinal.stringHashCode(CompactLabelToOrdinal.java:294)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.CompactLabelToOrdinal.grow(CompactLabelToOrdinal.java:184)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.CompactLabelToOrdinal.addLabel(CompactLabelToOrdinal.java:116)
>         at org.apache.lucene.facet.taxonomy.writercache.cl2o.Cl2oTaxonomyWriterCache.put(Cl2oTaxonomyWriterCache.java:84)
>         at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.addToCache(DirectoryTaxonomyWriter.java:592)
>         at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.addCategoryDocument(DirectoryTaxonomyWriter.java:551)
>         at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.internalAddCategory(DirectoryTaxonomyWriter.java:501)
>         at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.internalAddCategory(DirectoryTaxonomyWriter.java:494)
>         at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.addCategory(DirectoryTaxonomyWriter.java:468)
>         at org.apache.lucene.facet.index.FacetFields.addFields(FacetFields.java:175)
>         at net.domain.NetstatIndexer.IndexJob.run(IndexJob.java:73)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> Here is an example data entry which appears when the exception occurs...
> Location: nj
> LocalIP: 10.1.200.187
> RemoteIP: 41.161.197.166
> LocalPorts: [443]
> Connections: 1
> Times: [120]
> Timestamp: 2013-06-09T12:51:00.000-07:00
> States: ["Established"]
> And here is the code, stripped down to provide an example of how I am 
> handling the facet/doc code.
>  
> doc.add(new TextField("Location", ehost[0], Field.Store.YES));
> cats.add(new CategoryPath("Location", doc.get("Location")));
> doc.add(new TextField("LocalIP", (String) stat.get("LocalIP"), Field.Store.YES));
> cats.add(new CategoryPath("LocalIP", doc.get("LocalIP")));
> doc.add(new TextField("RemoteIP", (String) stat.get("RemoteIP"), Field.Store.YES));
> cats.add(new CategoryPath("RemoteIP", doc.get("RemoteIP")));
> doc.add(new TextField("LocalPorts", StringUtils.join(stat.get("LocalPorts"), ","), Field.Store.YES));
> cats.add(new CategoryPath("LocalPorts", doc.get("LocalPorts")));
> doc.add(new TextField("RemotePorts", StringUtils.join(stat.get("RemotePorts"), ","), Field.Store.YES));
> cats.add(new CategoryPath("RemotePorts", doc.get("RemotePorts")));
> doc.add(new LongField("Connections", (Long) stat.get("Connections"), Field.Store.YES));
> cats.add(new CategoryPath("Connections", doc.get("Connections")));
> doc.add(new TextField("Times", StringUtils.join(stat.get("Times"), ","), Field.Store.YES));
> cats.add(new CategoryPath("Times", doc.get("Times")));
> doc.add(new TextField("Timestamp", (String) stat.get("Timestamp"), Field.Store.YES));
> cats.add(new CategoryPath("Timestamp", doc.get("Timestamp")));
> doc.add(new TextField("States", StringUtils.join(stat.get("States"), ","), Field.Store.YES));
> cats.add(new CategoryPath("States", doc.get("States")));
> System.out.println("Location: " + doc.get("Location")
>         + " LocalIP: " + doc.get("LocalIP")
>         + " RemoteIP: " + doc.get("RemoteIP")
>         + " LocalPorts: " + doc.get("LocalPorts")
>         + " Connections: " + doc.get("Connections")
>         + " Times: " + doc.get("Times")
>         + " Timestamp: " + doc.get("Timestamp")
>         + " States: " + doc.get("States"));
> if (cats.size()!=0) {
>         FacetFields ff = new FacetFields(Main.twriter);
>         ff.addFields(doc, cats); // <-- Exception occurs here randomly
> }
