Just to bring closure on this issue... after numerous iterations with
Marcelo, adding diagnostics to Lucene to isolate the cause of this
exception, it turns out that it's a bug in Oracle 11g's JRE.

Marcelo narrowed it down to a single document which, when added to a new
index, would hit the exception.  Oddly, the first time he created this
single-doc index it would run fine; only the 2nd time he created it
would it hit the exception.

A standalone test (static void main(..)) indexing that one doc runs
fine as well, on multiple OS's / JRE versions.

The bug seems to be in the JIT compiler; specifically, it seems to
cause quickSort to run only partially, so that the array of terms
about to be flushed is left improperly sorted, with duplicate entries.
Very weird.
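To picture the symptom, here's a small stdlib-only sketch (the class and method names below are mine for illustration, not Lucene's internals): after a correct sort the term array is strictly ascending, while a partially-run quickSort can leave it out of order with duplicates.

```java
import java.util.Arrays;

// Hypothetical sketch of the invariant the flushed term array must satisfy:
// strictly ascending, i.e. sorted with no duplicate entries. A miscompiled
// quickSort violates exactly this.
public class SortInvariant {
    static boolean strictlyAscending(String[] terms) {
        for (int i = 1; i < terms.length; i++)
            if (terms[i - 1].compareTo(terms[i]) >= 0)
                return false;
        return true;
    }

    public static void main(String[] args) {
        String[] terms = {"select", "begin", "loop", "end"};
        Arrays.sort(terms);                              // a correct sort
        System.out.println(strictlyAscending(terms));    // true
        String[] broken = {"begin", "loop", "loop", "end"}; // partial sort left dupes
        System.out.println(strictlyAscending(broken));   // false
    }
}
```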

Marcelo is trying to find a workaround, e.g. maybe tweaking the JIT
settings, to prevent this from happening.  He's also trying to modify
the standalone test to get it to fail inside Oracle so he can submit
an issue to Oracle.

This isn't the first JRE bug to affect Lucene.  Here's another:

  https://issues.apache.org/jira/browse/LUCENE-1282

I don't like JRE bugs!

Mike

Marcelo Ochoa wrote:
Hi Michael:
  First thanks a lot for your time.
  See comments below.
 Is there any way to capture & serialize the actual documents being
 added (this way I can "replay" those docs to reproduce it)?
  The documents come from a VARCHAR2 column of Oracle's ALL_SOURCE
system view; in fact, the table is created as:
create table test_source_big as (select * from all_source);

 Are you using threads?  Is autoCommit true or false?
  The Oracle JVM uses a single-thread model by default, except that
Lucene starts a parallel thread; the infoStream output shows only one
thread.
  AutoCommit is false.
  I am creating the IndexWriter with this code:
        IndexWriter writer = null;
        Parameters parameters = dir.getParameters();
        int mergeFactor =
            Integer.parseInt(parameters.getParameter("MergeFactor",
                "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
        int maxBufferedDocs =
            Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
                "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
        int maxMergeDocs =
            Integer.parseInt(parameters.getParameter("MaxMergeDocs",
                "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
        int maxBufferedDeleteTerms =
            Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
                "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
        Analyzer analyzer = getAnalyzer(parameters);
        boolean useCompoundFileName =
            "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile",
                "false"));
        boolean autoTuneMemory =
            "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory",
                "true"));
        writer = new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
        if (autoTuneMemory) {
            long memLimit =
                ((OracleRuntime.getJavaPoolSize() / 100) * 50) / (1024 * 1024);
            logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): "
                        + memLimit);
            writer.setRAMBufferSizeMB(memLimit);
        } else {
            writer.setMaxBufferedDocs(maxBufferedDocs);
        }
        writer.setMaxMergeDocs(maxMergeDocs);
        writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
        writer.setMergeFactor(mergeFactor);
        writer.setUseCompoundFile(useCompoundFileName);
        if (logger.isLoggable(Level.FINE))
            writer.setInfoStream(System.out);
   The example passes these relevant parameters:
AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500
   So, because AutoTuneMemory is true, instead of setting
MaxBufferedDocs I am setting RAMBufferSizeMB(53), which is calculated
from Oracle SGA free memory.
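For reference, the 50%-of-java-pool arithmetic can be sketched with plain longs (the pool size below is a made-up stand-in for OracleRuntime.getJavaPoolSize(), which only exists inside the Oracle JVM):

```java
// Stdlib-only sketch of the auto-tune calculation: take 50% of the java
// pool size (in bytes), expressed in megabytes. Note it is all integer
// arithmetic, so each division truncates.
public class MemLimitSketch {
    static long memLimitMb(long javaPoolSizeBytes) {
        return ((javaPoolSizeBytes / 100) * 50) / (1024 * 1024);
    }

    public static void main(String[] args) {
        long pool = 100L * 1024 * 1024; // pretend the java pool is 100 MB
        System.out.println(memLimitMb(pool)); // 50
    }
}
```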

 Are you using payloads?
  No.

Were there any previous exceptions in this IndexWriter before flushing
 this segment?  Could you post the full infoStream output?
  There is no previous exception. Attached is a .trc file generated by
Oracle 11g; it has the infoStream output plus logging information from
the Oracle-Lucene data cartridge.

<snip>
 Could you apply the patch below & re-run?  It will likely produce
 insane amounts of output, but we only need the last section to see
which term is hitting the bug. If that term consistently hits the bug
 then we can focus on how/when it gets indexed...
  I'll patch my lucene-2.3.1 source and send the .trc file again.
  Also, I am comparing the FSDirectory implementation (2.3.1) with my
OJVMDirectory implementation to see changes in how the
BufferedIndex[Input|Output].java API is used; maybe the problem is
here.
  For example, the latest implementation expects an IOException when
opening an IndexInput for a file that doesn't exist, but my code threw
a RuntimeException, which worked with Lucene 2.2.x and doesn't work
with 2.3.1; this was the first change needed to get the Lucene-Oracle
integration working.
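To illustrate that contract difference, here is a minimal stdlib-only sketch (the method name and message are hypothetical, not the actual OJVMDirectory code): a missing file should surface as a checked IOException, such as FileNotFoundException, which caller-side catch blocks can handle, whereas an unchecked RuntimeException bypasses them.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch of the expected contract: opening a missing file
// throws a checked IOException (here FileNotFoundException), which the
// caller catches; a RuntimeException would escape this catch block.
public class OpenInputContract {
    static void openInputChecked(boolean exists) throws IOException {
        if (!exists)
            throw new FileNotFoundException("segment file not found");
        // ...otherwise, open the file normally
    }

    public static void main(String[] args) {
        try {
            openInputChecked(false);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```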
  Best regards. Marcelo.
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

