The internet is not the bottleneck ;-). It's the intranet here. Index is 14GB. Besides, it looks like Yonik found the problem.

Karl

-----Original Message-----
From: ext Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday, October 28, 2010 11:00 AM
To: dev@lucene.apache.org
Subject: Re: ArrayIndexOutOfBounds exception using FieldCache

How big is it? The Internet works pretty well for large files. You can send a USB drive by snail mail.

wunder

On Oct 28, 2010, at 6:11 AM, <karl.wri...@nokia.com> wrote:

> Talked with IT here - they don't recommend external transfers of this size.
> So I think we'd best try the "instrument and repeat" approach instead.
>
> Karl
>
> -----Original Message-----
> From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com]
> Sent: Thursday, October 28, 2010 8:16 AM
> To: dev@lucene.apache.org
> Subject: RE: ArrayIndexOutOfBounds exception using FieldCache
>
> It's on an internal Nokia machine, unfortunately, so the only way I can
> transfer it out is with my credentials, or by email, which is definitely not
> going to work ;-). But if you can provide me with an account on a machine
> I'd be transferring it to, I may be able to scp it from here.
>
> Karl
>
>
> -----Original Message-----
> From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, October 28, 2010 7:50 AM
> To: dev@lucene.apache.org
> Subject: Re: ArrayIndexOutOfBounds exception using FieldCache
>
> Fun fun :)
>
> Is there any way I can rsync/scp/ftp a copy of this index over....?
>
> Failing that I can make some patches that we can iterate on...
>
> Mike
>
> On Thu, Oct 28, 2010 at 6:15 AM, <karl.wri...@nokia.com> wrote:
>> Not good indeed.
>>
>> Synched to trunk, blew away old indexes, reindexed, same behavior. So I
>> think we've got a problem, Houston. ;-)
>>
>> Karl
>>
>> -----Original Message-----
>> From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Wednesday, October 27, 2010 11:08 AM
>> To: dev@lucene.apache.org
>> Subject: Re: ArrayIndexOutOfBounds exception using FieldCache
>>
>> Hmmm not good!
>>
>> It could be you are hitting
>> https://issues.apache.org/jira/browse/LUCENE-2633? That was fixed on
>> Sep 9, after your code. Maybe try syncing up?
>>
>> Mike
>>
>> On Wed, Oct 27, 2010 at 9:21 AM, <karl.wri...@nokia.com> wrote:
>>> Hi Folks,
>>>
>>> I just tried to index a data set that was probably 2x as large as the
>>> previous one I'd been using with the same code. The indexing completed
>>> fine, although it was slower than I would have liked. ;-) But the following
>>> problem occurs when I try to use FieldCache to look up an indexed and stored
>>> value:
>>>
>>> java.lang.ArrayIndexOutOfBoundsException: -65406
>>>     at org.apache.lucene.util.PagedBytes$Reader.fillUsingLengthPrefix(PagedBytes.java:98)
>>>     at org.apache.lucene.search.FieldCacheImpl$DocTermsImpl.getTerm(FieldCacheImpl.java:918)
>>>     at ...
>>>
>>> The code that does this has been working for quite some time and has been
>>> unmodified:
>>>
>>>   /** Find a string field value, given the lucene ID, field name, and
>>>   * value.
>>>   */
>>>   protected String getStringValue(int luceneID, String fieldName)
>>>     throws IOException
>>>   {
>>>     // Find the right reader
>>>     final int idx = readerIndex(luceneID, starts, readers.length);
>>>     final int docBase = starts[idx];
>>>     final IndexReader reader = readers[idx];
>>>
>>>     BytesRef ref = FieldCache.DEFAULT.getTerms(reader,fieldName).getTerm(luceneID-docBase,new BytesRef());
>>>     String rval = ref.utf8ToString();
>>>     //System.out.println(" Reading luceneID "+Integer.toString(luceneID)+" field "+fieldName+" with result '"+rval+"'");
>>>     return rval;
>>>   }
>>>
>>> }
>>>
>>> I added a try/catch to see what values were going into the key line:
>>>
>>>     catch (RuntimeException e)
>>>     {
>>>       System.out.println("LuceneID = "+luceneID+", fieldName='"+fieldName+"', idx="+idx+", docBase="+docBase);
>>>       System.out.println("Readers = "+readers.length);
>>>       int i = 0;
>>>       while (i < readers.length)
>>>       {
>>>         System.out.println(" Reader start "+i+" is "+starts[i]);
>>>         i++;
>>>       }
>>>       throw e;
>>>     }
>>>
>>> The resulting output was:
>>>
>>> LuceneID = 34466856, fieldName='id', idx=0, docBase=0
>>> Readers = 1
>>> Reader start 0 is 0
>>>
>>> ... which looks reasonable on the face of things. This is a version of trunk
>>> from approximately 8/12/2010, so it is fairly old. Was there a fix for a
>>> problem that could account for this behavior? Should I simply synch up? Or
>>> am I doing something wrong here?
>>> The schema for the id field is:
>>>
>>> <fieldType name="string_idx" class="solr.StrField" sortMissingLast="true"
>>> indexed="true" stored="true"/>
>>> <field name="id" type="string_idx" required="true"/>
>>>
>>> Karl
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>
--
Walter Underwood
Venture ASM, Troop 14, Palo Alto
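
The readerIndex helper called from getStringValue is not shown anywhere in the thread. As a rough sketch only, assuming it is a binary search over the starts array that returns the index of the sub-reader containing a given document ID, it might look like the following. The class name ReaderIndexSketch and the method body are hypothetical; this is neither Karl's code nor Lucene's.

```java
public final class ReaderIndexSketch {

    /**
     * Hypothetical stand-in for the readerIndex helper used in the thread.
     * starts[i] is assumed to hold the first global document ID of sub-reader i.
     * Returns the largest index i with starts[i] <= docID.
     */
    public static int readerIndex(int docID, int[] starts, int numReaders) {
        int lo = 0;
        int hi = numReaders - 1;
        while (lo < hi) {
            // Round the midpoint up so the loop always makes progress.
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docID) {
                lo = mid;       // docID is at or after the start of reader mid
            } else {
                hi = mid - 1;   // docID belongs to an earlier reader
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Single-reader case from the thread: starts = {0}, one reader.
        System.out.println(readerIndex(34466856, new int[] {0}, 1));
        // Made-up multi-reader layout, for illustration only.
        int[] starts = {0, 100, 250};
        System.out.println(readerIndex(99, starts, 3));
        System.out.println(readerIndex(100, starts, 3));
        System.out.println(readerIndex(250, starts, 3));
    }
}
```

With a single reader starting at 0, any document ID maps to index 0, which agrees with the idx=0, docBase=0 values in the diagnostic output above. That suggests the reader lookup itself is not where the negative array index comes from.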