Hi all,

OK, I really should have titled the post, "CheckIndex limit with large tvd
files?"

I started a new CheckIndex run at about 1:00 pm on Tuesday, and it seems to
be stuck again while checking term vectors.
I gave CheckIndex 32GB of memory, turned on GC logging, and redirected
STDERR and STDOUT to a file.

It seems stuck while testing term vectors, but maybe it just takes
several days to test a term vector file that is 343GB.

Yes, I know I said we had term vectors turned off.  I forgot that we were
using a slightly modified version of the schema we use when we index
individual books on a page level.  We are using the fast-vector
highlighter, so we have termvectors turned on:

 <fieldType name="FullText" class="solr.TextField"
            positionIncrementGap="100" autoGeneratePhraseQueries="false"
            stored="true" termVectors="true" termPositions="true"
            termOffsets="true" omitNorms="false">

I've appended a listing of the top memory users from pmap below.

It looks like the *tvd file is using about 300GB of virtual memory, followed
by the *doc, *fdt and *pos files.

Since we have never run CheckIndex on large indexes with term vectors
before, we have no idea how long we should expect it to take.

Our normal page-level book indexes generally hold about 1,000 books (about
300,000 documents/pages) and are 10-15GB total, with the *tvf files
totaling about 700MB and the *tvd files totaling a few hundred KB.


Tom

----
The top 10 memory mappings reported by pmap are:

 total             804,745,732K

 Address           Size          Mode   Mapping
 00002baaf526c000  300,897,888K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch.tvd
 00002b3b4bf1b000  155,250,472K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch_Lucene41_0.doc
 00002b88aa709000  143,788,268K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch.fdt
 00002b604fae5000  139,820,064K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch_Lucene41_0.pos
 00002b32e6c10000   33,554,476K  rw---  [ anon ]
 00002b81a59ed000   29,196,076K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch_Lucene41_0.tim
 00002b3aee31b000    1,315,184K  rw---  [ anon ]
 00002b889b9b8000      243,012K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch.nvd
 00002b3ae6c39000      109,276K  rw---  [ anon ]
 00002bf2b2804000       99,272K  r--s-  /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch.tvx
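
For anyone who wants to break a listing like this down, here is a rough
Python sketch (not something I ran as part of the CheckIndex test) that
separates file-backed mappings from anonymous ones. It assumes each entry
sits on one line as "address size mode mapping", with sizes in K and
optional commas; the names parse_pmap and summarize are made up for
illustration. The r--s- entries are read-only shared file mappings, so they
count toward virtual size rather than Java heap, and the largest [ anon ]
region (33,554,476K) is close to 32GB (33,554,432K), which is consistent
with the heap size I gave CheckIndex.

```python
import re
from collections import defaultdict

# A few entries copied from the listing above (sizes in K, commas kept).
SAMPLE = """\
00002baaf526c000 300,897,888K r--s- /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch.tvd
00002b3b4bf1b000 155,250,472K r--s- /htsolr/lss-dev/solrs/4.2/3/core/data/index/_bch_Lucene41_0.doc
00002b32e6c10000 33,554,476K rw---    [ anon ]
00002b3aee31b000 1,315,184K rw---    [ anon ]
"""

# Assumed layout: hex address, size ending in K, mode flags, mapping name.
LINE = re.compile(
    r"^(?P<addr>[0-9a-f]+)\s+(?P<size>[\d,]+)K\s+(?P<mode>\S+)\s+(?P<name>.+)$"
)


def parse_pmap(text):
    """Return a list of (mapping_name, size_in_k) for every line that matches."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip headers, totals, and blank lines
        size_k = int(match.group("size").replace(",", ""))
        rows.append((match.group("name").strip(), size_k))
    return rows


def summarize(rows):
    """Total the sizes (in K) for file-backed vs. anonymous mappings."""
    totals = defaultdict(int)
    for name, size_k in rows:
        kind = "anon" if name.startswith("[") else "file"
        totals[kind] += size_k
    return dict(totals)


if __name__ == "__main__":
    rows = parse_pmap(SAMPLE)
    # Largest mappings first, printed in GB (1GB = 1024 * 1024 K).
    for name, size_k in sorted(rows, key=lambda r: r[1], reverse=True):
        print(f"{size_k / 1024 / 1024:10.1f} GB  {name}")
    for kind, size_k in summarize(rows).items():
        print(f"{kind}: {size_k / 1024 / 1024:.1f} GB")
```

Feeding it the full pmap output instead of SAMPLE should give the same kind
of breakdown for the whole process, provided the lines follow the layout
assumed above.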




>
> On Tue, Jul 30, 2013 at 1:06 PM, Tom Burton-West <tburt...@umich.edu>
> wrote:
> > Thanks Mike, Robert and Adrien,
> >
> > Unfortunately, I killed the processes, so it's too late to get a stack
> > trace.  One thing that was suspicious was that top was reporting memory
> > use as 20GB res even though I invoked the JVM with java -Xmx10g -Xms10g.
> >
> > I'm going to double the memory, turn on GC logging, and remember to echo
> > STDERR to a log and run it again on one of the indexes.
> > I'll report back as soon as something interesting shows up.  (Probably
> > tomorrow sometime.)
> >
> > Tom
> >
> >
> > On Tue, Jul 30, 2013 at 11:22 AM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Can you get a stack trace so we can see where the thread is stuck?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
