On 10/11/06, Ben Lee <[EMAIL PROTECTED]> wrote:
> Sorry if this is a repost- I wasn't sure if the www.ruby-forum.com
> list works for postings.
> I've been having trouble with indexing a large amount of documents (2.4M).
>
> Essentially, I have one process that is following the tutorial
> dumping documents to an index stored on the file system. If I open the
> index with another process, and run the size() method it is stuck at
> a number of documents much smaller than the number I've added to the index.
>
> Eg. 290k -- when the indexer process has already gone through 1 M.
>
> Additionally, if I search, I don't get results past an
> even smaller number of docs (22k). I've tried the two latest ferret releases.
>
> Does this listing of the index directory look right?
>
> -rw------- 1 blee blee 3.8M Oct 10 17:06 _v.fdt
> -rw------- 1 blee blee  51K Oct 10 17:06 _v.fdx
> -rw------- 1 blee blee  12M Oct 10 16:49 _u.cfs
> -rw------- 1 blee blee   97 Oct 10 16:49 fields
> -rw------- 1 blee blee   78 Oct 10 16:49 segments
> -rw------- 1 blee blee  11M Oct 10 16:23 _t.cfs
> -rw------- 1 blee blee  11M Oct 10 15:56 _s.cfs
> -rw------- 1 blee blee  15M Oct 10 15:11 _r.cfs
> -rw------- 1 blee blee  13M Oct 10 14:48 _q.cfs
> -rw------- 1 blee blee  14M Oct 10 14:37 _p.cfs
> -rw------- 1 blee blee  13M Oct 10 14:28 _o.cfs
> -rw------- 1 blee blee  12M Oct 10 14:19 _n.cfs
> -rw------- 1 blee blee  12M Oct 10 14:16 _m.cfs
> -rw------- 1 blee blee 118M Oct 10 14:10 _l.cfs
> -rw------- 1 blee blee 129M Oct 10 13:24 _a.cfs
> -rw------- 1 blee blee    0 Oct 10 13:00 ferret-write.lck
>
> Thanks,
> Ben

I thought this was possibly because you didn't have Ferret compiled
with large-file support, but by the looks of it you aren't getting near
that limit yet. From the directory listing you have here, there is no
way you could have added more than 290K documents unless you set
:max_buffered_docs to a different value (> 10,000). Perhaps the index is
getting overwritten at some stage. Could you show us the code you are
using for indexing?

As for search results only showing for the top 22k documents, I'm not
sure what the problem might be. You need to make sure you open the index
reader or searcher after committing the index writer, otherwise the
latest results won't show up. I don't think this is your problem though,
as I'm sure you would have opened the index reader well after the first
22k documents had been indexed.

Cheers,
Dave
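
For reference, here is a rough sketch of how a 290K ceiling can be read off the listing. It is only an estimate under assumptions the thread does not confirm: that segment names are base-36 counters (`_a` = 10, `_l` = 21, ...), that the merge factor is 10, and that each flushed segment holds at most :max_buffered_docs documents (10,000, the value mentioned above). The helper names `segment_number` and `estimate_max_docs` are made up for this illustration and are not part of Ferret, and the heuristic only handles a single level of merging.

```ruby
# Hypothetical estimate, not Ferret code. Assumptions: base-36 segment
# names, merge factor 10, at most MAX_BUFFERED_DOCS docs per flushed segment.
MAX_BUFFERED_DOCS = 10_000 # value mentioned in the reply
MERGE_FACTOR = 10          # assumed, not stated in the thread

# "_a" -> 10, "_l" -> 21, and so on (base-36 counter after the underscore).
def segment_number(name)
  name.sub(/\A_/, "").to_i(36)
end

# Upper bound on documents held by the given .cfs segments. A segment
# preceded by MERGE_FACTOR unused names is treated as the merge of those
# segments; any other segment is treated as a single flush.
def estimate_max_docs(cfs_segments)
  previous = -1
  cfs_segments.map { |n| segment_number(n) }.sort.sum do |number|
    gap = number - previous - 1
    previous = number
    flushes = gap >= MERGE_FACTOR ? MERGE_FACTOR : 1
    flushes * MAX_BUFFERED_DOCS
  end
end

# The .cfs files from the listing; _v is still being written (.fdt/.fdx only).
segments = %w[_a _l _m _n _o _p _q _r _s _t _u]
puts estimate_max_docs(segments) # prints 290000
```

Read this way, `_a` and `_l` each account for ten flushed segments (100K documents apiece) and `_m` through `_u` for nine more (90K), which lines up with the 290k that size() reports.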

