*Summary:*

Lucene indexes appear to revert to some past state after an application
restart.

*Background:*

We're running an enterprise application written in Java/Spring/Hibernate,
deployed within Jetty, with a Postgres backend. See below for version info.

We use Lucene to index certain components of the database to enable
fast/complex searching.

The indexes are built by querying the relevant database tables,
transferring the data to Lucene documents and writing to disk.

An IndexWriter is used to add and commit the documents. A commit is
performed at the end of a batch of database reads (generally 5,000). The
reading and writing of batches is multi-threaded.

The writer is configured with the following TieredMergePolicy attributes:

segmentsPerTier=50.0
maxMergeAtOnce=5
maxMergedSegmentMB=100.0


No merge scheduler is set explicitly, so the writer uses Lucene's default.
The writer has its RAMBufferSizeMB set to 48.
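
For anyone who wants to reproduce the setup, the settings above correspond
roughly to the following Lucene 4.9 calls. This is a sketch rather than our
production code: the class name, the StandardAnalyzer and the way the index
directory is passed in are placeholders chosen for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper; only the numeric settings come from the post.
final class WriterSetupSketch {

    static IndexWriter open(File indexDir) throws IOException {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setSegmentsPerTier(50.0);
        mergePolicy.setMaxMergeAtOnce(5);
        mergePolicy.setMaxMergedSegmentMB(100.0);

        // The analyzer is an assumption; the post does not say which one is used.
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_4_9, new StandardAnalyzer(Version.LUCENE_4_9));
        config.setMergePolicy(mergePolicy);
        config.setRAMBufferSizeMB(48.0);
        // setMergeScheduler is deliberately not called, so the default applies.

        Directory directory = FSDirectory.open(indexDir);
        return new IndexWriter(directory, config);
    }
}
```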

There are 23 separate indexes used to represent different logical
components of the database.

The largest index on disk is 13.7G.

The largest index by number of documents contains around 32 million
documents.

Once the indexes are built they are maintained dynamically by the
application to reflect the current state of the database. Dynamic updates
are performed by a TrackingIndexWriter.

*Problem:*

After a reindex is run (a destructive rebuild of the indexes, as described
above) the application runs correctly and all Lucene queries return values
that reflect the current state of the database.

Subsequent usage of the system maintains the indexes in the correct state
as evidenced by search results.

In the last month we have found that after a restart of the application the
indexes appear to revert to some unknown past state. The indexes can be
queried okay (they're not corrupt, there are no logged errors or stack
traces) but the data is either out of date (reflecting a past state of the
database entries they represent) or missing.

We first assumed the "past state" was based on the last reindex time, but
have subsequently found that restarting the application immediately
following a reindex still puts the indexes in a state that pre-dates the
time of the last reindex.
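
One check that might help narrow this down is to record which commit point
each index directory holds before shutdown and again after startup. Every
Lucene commit writes a new segments_N file, where N is the commit generation
encoded in base 36, so a generation that has barely moved since the reindex
would suggest that recent updates were never committed to disk. The sketch
below uses only the JDK; the class name is hypothetical and the index paths
are supplied on the command line.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Reports the newest commit generation found in a Lucene index directory. */
final class SegmentsGeneration {

    private static final String PREFIX = "segments_";

    /**
     * Parses the generation from a commit file name such as "segments_1a".
     * Returns -1 for any other name, including "segments.gen".
     */
    static long parse(String fileName) {
        if (!fileName.startsWith(PREFIX) || fileName.length() == PREFIX.length()) {
            return -1L;
        }
        try {
            // Lucene encodes the generation in base 36 (Character.MAX_RADIX).
            long generation = Long.parseLong(
                    fileName.substring(PREFIX.length()), Character.MAX_RADIX);
            return generation < 0 ? -1L : generation;
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    /** Returns the highest generation in the directory, or -1 if none is found. */
    static long latest(Path indexDir) throws IOException {
        long max = -1L;
        try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                max = Math.max(max, parse(file.getFileName().toString()));
            }
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one index directory.
        for (String dir : args) {
            System.out.println(dir + " -> generation " + latest(Paths.get(dir)));
        }
    }
}
```

Running this against all 23 index directories just before a restart and
again just after it, and comparing the numbers with the file timestamps,
should show whether the indexes reopen at an older commit or whether the
newer commits were never written in the first place.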

This is only occurring on a single site (our largest production site), and
has only started in recent months. We have yet to reproduce the problem
using an identical process with an identical configuration on
near-identical data.

We are not sure whether the problem affects all of the indexes, but we know
the larger (and most important) indexes are affected.

*Question:*

We are inclined to think that the problem is somewhere in our code, but are
wondering if any of the described symptoms have been seen before by the
Lucene community. Suggestions on how to isolate the problem, or
configuration changes that may help are also most welcome.

*Version Info:*

Lucene:

lucene-analyzers-common-4.9.1.jar
lucene-core-4.9.1.jar
lucene-grouping-4.9.1.jar
lucene-join-4.9.1.jar
lucene-misc-4.9.1.jar
lucene-queries-4.9.1.jar
lucene-queryparser-4.9.1.jar
lucene-sandbox-4.9.1.jar
lucene-snowball-2.4.1.jar
lucene-suggest-4.9.1.jar

Postgres:

server: PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
client access: postgresql-9.1-901.jdbc4.jar

OS:

LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Red Hat Enterprise Linux Server release 6.5 (Santiago)

Java:

java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

Jetty:

jetty-6.1.22.jar

Hibernate:

hibernate-commons-annotations-4.0.2.Final.jar
hibernate-core-4.2.2.Final.jar
hibernate-ehcache-4.2.2.Final.jar
hibernate-jpa-2.0-api-1.0.1.Final.jar

Spring:

spring-aop-4.0.4.RELEASE.jar
spring-aspects-4.0.4.RELEASE.jar
spring-beans-4.0.4.RELEASE.jar
spring-context-4.0.4.RELEASE.jar
spring-context-support-4.0.4.RELEASE.jar
spring-core-4.0.4.RELEASE.jar
spring-expression-4.0.4.RELEASE.jar
spring-instrument-4.0.4.RELEASE.jar
spring-jdbc-4.0.4.RELEASE.jar
spring-jms-4.0.4.RELEASE.jar
spring-orm-4.0.4.RELEASE.jar
