I'm using StandardAnalyzer, so perhaps it's the payload field? Here is the code that creates the payload field:

private static class SinglePayloadTokenStream extends TokenStream {
    private Token token = new Token(UID_TERM.text(), 0, 0);
    private byte[] buffer = new byte[4];
    private boolean returnToken = false;

    // Pack the UID into the 4-byte payload, least-significant byte first.
    void setUID(int uid) {
        buffer[0] = (byte) (uid);
        buffer[1] = (byte) (uid >> 8);
        buffer[2] = (byte) (uid >> 16);
        buffer[3] = (byte) (uid >> 24);
        token.setPayload(new Payload(buffer));
        returnToken = true;
    }

    // Emit the single UID token once, then signal end of stream.
    public Token next() throws IOException {
        if (returnToken) {
            returnToken = false;
            return token;
        } else {
            return null;
        }
    }
}

public static void fillDocumentID(Document doc, int id) {
    SinglePayloadTokenStream singlePayloadTokenStream = new SinglePayloadTokenStream();
    singlePayloadTokenStream.setUID(id);
    Field f = doc.getField(UID_TERM.field());
    if (f == null) {
        f = new Field(UID_TERM.field(), singlePayloadTokenStream);
        doc.add(f);
    } else {
        f.setValue(singlePayloadTokenStream);
    }
    f = null;
    f = doc.getField(Indexable.DOCUMENT_ID_FIELD);
    if (f == null) {
        f = new Field(Indexable.DOCUMENT_ID_FIELD, String.valueOf(id), Store.NO, Index.NOT_ANALYZED);
        doc.add(f);
    } else {
        f.setValue(String.valueOf(id));
    }
}

On Tue, Mar 24, 2009 at 12:36 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> I was just able to index all of Wikipedia, using StandardAnalyzer,
> with assertions enabled, without hitting that exception. Which
> analyzer are you using (besides your payload field)?
>
> Mike
>
> Michael McCandless <luc...@mikemccandless.com> wrote:
> > Hmmmm.
> >
> > Jason, is this easily/compactly reproducible? E.g., try to index the N docs
> > before that one.
> >
> > If you remove the SinglePayloadTokenStream field, does the exception
> > still happen?
> >
> > Mike
> >
> > Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> >> This happens while indexing with
> >> contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. The
> >> assertion error comes from TermsHashPerField.comparePostings(RawPostingList p1,
> >> RawPostingList p2). A Payload representing a UID is added to the document.
> >> Only 1-2 out of 1 million indexed documents generate this error.
> >>
> >> java.lang.AssertionError
> >> problem adding doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> >> Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
> >> Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
> >> D.C.]] It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
> >> (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> >> (northwest)|Northwest]] near [[Dupont Circle]]. Previously the building had
> >> been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
> >> they left for larger quarters and sold the structure to Croatia in 1993.
> >> The purchase and renovation of the building was largely paid for by the
> >> [[Croatian-American]] community. In front of the embassy is a large
> >> sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Meštrović]].
> >> ==External link== *[http://www.croatiaemb.org/ Official site]
> >> [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
> >> Croatia]]>
> >> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia in Washington>
> >> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006 07:27:44.000>
> >> stored/uncompressed,indexed,omitNorms<docid:1703107>
> >> indexed,tokenized<_ID:proj.zoie.api.zoieindexreader$singlepayloadtokenstr...@e7b3cf>
> >> indexed<id:667162>> ex: java.lang.AssertionError
> >>     at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
> >>     at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
> >>     at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
> >>     at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
> >>     at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
> >>     at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
> >>     at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
> >>     at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
> >>     at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
> >>     at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
> >>     at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
> >>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
> >>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
> >>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
> >>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
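The `setUID` method in the code above stores the int UID in the payload as four bytes, least-significant byte first. When checking whether a suspect document carries the payload you expect, it can help to decode those bytes back to the UID. What follows is a minimal, Lucene-free sketch of that byte layout and its inverse; `UidPayloadCodec`, `encode`, and `decode` are hypothetical names for illustration, not part of Zoie or Lucene.

```java
// Hypothetical helper (not part of Zoie or Lucene): mirrors the byte layout
// written by setUID() and shows how the UID could be recovered from it.
class UidPayloadCodec {

    // Same layout as setUID(): least-significant byte first (little-endian).
    static byte[] encode(int uid) {
        byte[] buffer = new byte[4];
        buffer[0] = (byte) uid;
        buffer[1] = (byte) (uid >> 8);
        buffer[2] = (byte) (uid >> 16);
        buffer[3] = (byte) (uid >> 24);
        return buffer;
    }

    // Inverse of encode(). Each byte is masked with 0xFF because Java bytes
    // are signed; without the mask, a negative byte would sign-extend and
    // overwrite the higher-order bits of the result.
    static int decode(byte[] buffer, int offset) {
        return (buffer[offset] & 0xFF)
                | (buffer[offset + 1] & 0xFF) << 8
                | (buffer[offset + 2] & 0xFF) << 16
                | (buffer[offset + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        int[] uids = {0, 1, 667162, 1703107, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int uid : uids) {
            byte[] bytes = encode(uid);
            // Print each UID with its decoded round-trip value.
            System.out.println(uid + " -> " + decode(bytes, 0));
        }
    }
}
```

The round trip holds for every int, including negative values, thanks to the masking. As an example, assuming the `id:667162` field in the failing document's dump was written by the same `fillDocumentID` call, its UID payload would be the bytes `1A 2E 0A 00`.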