We've had a similar situation ourselves: we index about a million records,
and each record can be fairly large.

What happened on our side was that the index files (very similar in
structure to what you have below) grew to the 2 GB limit and stopped
there, and the indexer started crashing each time it hit that limit.

On your side, your index files don't look anywhere near that large.  I think
compiling with large file support only really matters once you hit that
2 GB size limit.

A couple of thoughts that might help:
1.  On our side, to keep the size down, I optimize the index every
100,000 documents.  The optimize call also flushes the index.  (There's a
rough sketch of this after the list.)

2.  Make sure you close the index once you've finished indexing your data.
A small thing, but worth double-checking.

3.  With the index being this large, we actually keep two copies: one
already-optimized copy that is searched against, and a second copy that the
indexer writes to.  That way, nothing is being searched while the indexing
is taking place.

4.  One neat thing I learned while indexing large items is that I don't
actually have to store everything.  A field can be set to tokenize but not
store, so it can still be searched even though it won't appear in the
search results per se.  Since I don't store it, I was able to keep my
index size down.
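
To make points 1, 2 and 4 concrete, here's roughly what our indexing loop
looks like.  This is a sketch from memory against the 0.10 API; the path,
the field names and the records collection are stand-ins for your own
data, so double-check the FieldInfos options against the Ferret docs:

  require 'rubygems'
  require 'ferret'

  # Make :content searchable but unstored, to keep the index small (point 4).
  # Only stored fields (here just :id) come back in search results.
  field_infos = Ferret::Index::FieldInfos.new(:store => :yes)
  field_infos.add_field(:content, :store => :no, :index => :yes)

  index = Ferret::Index::Index.new(:path => '/data/my_index',
                                   :field_infos => field_infos)

  records.each_with_index do |record, i|  # records stands in for your data source
    index << {:id => record.id, :content => record.text}
    # Optimize (which also flushes) every 100,000 documents (point 1).
    index.optimize if (i + 1) % 100_000 == 0
  end

  index.close  # and close the index once indexing is done (point 2)

At display time you'd then fetch the full text by the stored :id from your
own data store rather than from the index.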



> From: "Ben Lee" <[EMAIL PROTECTED]>
> Reply-To: [email protected]
> Date: Tue, 10 Oct 2006 18:35:35 -0700
> To: [email protected]
> Subject: [Ferret-talk] Indexing problem 10.9/10.10
> 
> Sorry if this is a repost-  I wasn't sure if the www.ruby-forum.com
> list works for postings.
> I've been having trouble with indexing a large number of documents (2.4M).
> 
> Essentially, I have one process that is following the tutorial,
> dumping documents to an index stored on the file system.  If I open the
> index with another process and run the size() method, it is stuck at
> a number of documents much smaller than the number I've added to the index.
> 
> E.g. 290k, when the indexer process has already gone through 1M.
> 
> Additionally, if I search, I don't get results past an even smaller
> number of docs (22k).  I've tried the two latest Ferret releases.
> 
> 
> Does this listing of the index directory look right?
> 
> -rw-------  1 blee blee 3.8M Oct 10 17:06 _v.fdt
> -rw-------  1 blee blee  51K Oct 10 17:06 _v.fdx
> -rw-------  1 blee blee  12M Oct 10 16:49 _u.cfs
> -rw-------  1 blee blee   97 Oct 10 16:49 fields
> 
> -rw-------  1 blee blee   78 Oct 10 16:49 segments
> -rw-------  1 blee blee  11M Oct 10 16:23 _t.cfs
> -rw-------  1 blee blee  11M Oct 10 15:56 _s.cfs
> -rw-------  1 blee blee  15M Oct 10 15:11 _r.cfs
> -rw-------  1 blee blee  13M Oct 10 14:48 _q.cfs
> 
> -rw-------  1 blee blee  14M Oct 10 14:37 _p.cfs
> -rw-------  1 blee blee  13M Oct 10 14:28 _o.cfs
> -rw-------  1 blee blee  12M Oct 10 14:19 _n.cfs
> -rw-------  1 blee blee  12M Oct 10 14:16 _m.cfs
> -rw-------  1 blee blee 118M Oct 10 14:10 _l.cfs
> 
> -rw-------  1 blee blee 129M Oct 10 13:24 _a.cfs
> -rw-------  1 blee blee    0 Oct 10 13:00 ferret-write.lck
> 
> Thanks,
> Ben

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
