Hi, 

I need help understanding how rebuild_index works with multiple workers.
I am using Django 1.8.4, Whoosh 2.7.4, and django-haystack 2.6.0 to build
search functionality on a database of 5 million records. I run this on
both Ubuntu and Mac OS X.

When I use the multiple workers option, the index does not end up with
all 5M records. I tested this with a smaller subset of 1200 records and
found that all 1200 records reach the index only when I use a single
worker. I have tried several different batch sizes and numbers of
workers, and in every case only a subset of the records gets indexed.
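
To make the check above concrete, here is a minimal sketch in plain Python
of how a record range might be split into batches and how missing records
could be detected afterwards. The helper names (batch_ranges,
find_missing_ids) and the sample numbers are hypothetical and are not part
of django-haystack; in a real project the two ID collections would come
from the database and from the search index.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) half-open ranges that together cover 0..total."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def find_missing_ids(expected_ids, indexed_ids):
    """Return the sorted IDs that are expected but absent from the index."""
    return sorted(set(expected_ids) - set(indexed_ids))


# Hypothetical data: 1200 records, of which only the first 1000 were indexed.
expected = range(1, 1201)
indexed = range(1, 1001)

missing = find_missing_ids(expected, indexed)
ranges = list(batch_ranges(1200, 500))
```

Comparing the missing IDs against the batch boundaries may show whether
whole batches are being dropped or whether the gaps are scattered.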

Is this a known problem? I saw some issues reported on this topic in the
GitHub repository, but I am not sure whether they have been resolved. When
I run with multiple workers, the logs look fine: there are no errors about
locked or inaccessible files, which is what I would expect to see if
multiple workers were trying to write to the same file. I have allocated
150 GB of space to the volume where the index is stored, and my server
has 64 GB of memory, so I am sure that this is not due to a lack of
storage or memory.

I would really like to use the multiple workers option to cut the
indexing time down to a few hours instead of 12-14 hours.

Thank you, 
Purbasha 



