Hello Jose,
Thank you for your response; I took a closer look. My responses are below:
> It seems that you want to force a max number of segments to 1.
// NOTE: if you want to maximize search performance, you can
// optionally call forceMerge here. This can be a terribly costly
// operation, so generally it's only worth it when your index is
// relatively static (i.e. you're done adding documents to it):
//
writer.forceMerge(1);
writer.close();
Yes, that line of code is uncommented because we want to understand how
it behaves when indexing big data sets. Should this be a concern?
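For context, the call sits at the very end of indexing in our copy of
the demo, roughly like this (a minimal sketch, not the demo's exact
code; docDir stands for the root directory being indexed, and the
try/finally is our paraphrase):

    try {
        indexDocs(writer, docDir, forceMerge); // add all documents first
        writer.forceMerge(1); // merge the whole index down to one segment
        writer.commit();      // persist the merged segment
    } finally {
        writer.close();       // close() also waits for running merges
    }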
> On a previous thread someone answered that the number of segments
> affects the index size and is not related to index integrity (i.e. the
> size of the index may vary according to the number of segments).
Okay, I am not sure what the above actually means, but could the code
we added below be the cause of this exception?
if (file.isDirectory()) {
    String[] files = file.list();
    // an IO error could occur
    if (files != null) {
        for (int i = 0; i < files.length; i++) {
            indexDocs(writer, new File(file, files[i]), forceMerge);
            if (forceMerge && writer.hasPendingMerges()) {
                if (i % 1000 == 0 && i != 0) {
                    logger.trace("forcing merge now.");
                    try {
                        writer.forceMerge(50);
                        writer.commit();
                    } catch (OutOfMemoryError e) {
                        logger.error("out of memory during merging ", e);
                        throw new OutOfMemoryError(e.toString());
                    }
                }
            }
        }
    }
} else {
    FileInputStream fis;
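(The fragment is truncated here; the else branch simply opens the file
for indexing, as in the demo.) Since the error below is a read past EOF
inside our Directory implementation, one thing we could try is verifying
the index with Lucene's CheckIndex. A minimal sketch, assuming a
Directory variable named dir that points at our CassandraDirectory:

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;

    // Open the same Directory the index was written to and check it.
    CheckIndex checker = new CheckIndex(dir);
    checker.setInfoStream(System.out); // print per-segment diagnostics
    CheckIndex.Status status = checker.checkIndex();
    if (!status.clean) {
        System.out.println("index is corrupt; see diagnostics above");
    }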
> Should be...
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
> analyzer);
Yes, we were already, and are still, referencing Version.LUCENE_46 in
our analyzer.
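For clarity, this is the setup we have (a minimal sketch; FSDirectory
and the index path below stand in for our CassandraDirectory):

    import java.io.File;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // Analyzer and IndexWriterConfig use the same, current version.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
    iwc.setOpenMode(OpenMode.CREATE); // create a fresh index, as in the demo
    Directory dir = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriter writer = new IndexWriter(dir, iwc);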
/Jason
On Sat, Apr 5, 2014 at 9:01 PM, Jose Carlos Canova <
[email protected]> wrote:
> It seems that you want to force a max number of segments to 1.
> On a previous thread someone answered that the number of segments
> affects the index size and is not related to index integrity (i.e. the
> size of the index may vary according to the number of segments).
>
> In version 4.6 there is a small issue in the sample, which is:
>
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
> analyzer);
>
>
> Should be...
>
>
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
> analyzer);
>
>
> With this, the line related to the codec will probably change too.
>
>
>
> On Fri, Apr 4, 2014 at 3:52 AM, Jason Wee <[email protected]> wrote:
>
> > Hello again,
> >
> > A little background on our experiment: we are storing a Lucene
> > (version 4.6.0) index on top of Cassandra. We are using the demo
> > IndexFiles.java from Lucene with a minor modification so that the
> > directory used is a reference to our CassandraDirectory.
> >
> > With a large dataset (that is, indexing more than 50000 files), after
> > indexing is done and forceMerge(1) is called, we get the following
> > exception:
> >
> >
> > BufferedIndexInput readBytes [ERROR] bufferStart = '0' bufferPosition = '1024' len = '9252' after = '10276'
> > BufferedIndexInput readBytes [ERROR] length = '8192'
> > caught a class java.io.IOException
> > with message: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> > java.io.IOException: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1755)
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1691)
> >     at org.apache.lucene.store.IndexFiles.main(IndexFiles.java:159)
> > Caused by: java.io.IOException: read past EOF: CassandraSimpleFSIndexInput(_1.nvd in path="_1.cfs" slice=5557885:5566077)
> >     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:186)
> >     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:125)
> >     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:230)
> >     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:186)
> >     at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:159)
> >     at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:516)
> >     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:232)
> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:127)
> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4057)
> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3654)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> >
> >
> > We do not know what is wrong, as our understanding of Lucene is
> > limited. Can someone explain what is happening, or what the possible
> > source of the error might be?
> >
> > Thank you; any advice is appreciated.
> >
> > /Jason
> >
>