I'm rewriting most of the indexing code and have added support for rebuilding individual indexes instead of all of them at once. Commit here: https://gitlab.com/mayan-edms/mayan-edms/commit/ac6f748113932d91f23f15dffd9a2ba95b2a1b66

The rewrite needs fewer locks (just 2 now), so it is already much faster. It also opens up the possibility of indexing by workflow states and tags.

The code is in a separate branch off the master branch (2.2), so that it can go into the next stable release (2.2.1 or 2.3) instead of waiting for the next major version (3.0). If you have a development install of Mayan, please help test this branch so it can be merged sooner.
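
To make the idea concrete, here is a rough standalone sketch (Python 3) of rebuilding one index at a time under its own lock. It is not the code from the commit above and does not use Mayan's models: `INDEXES`, `index_nodes`, `_index_locks` and `rebuild_index` are made-up names for illustration, and documents are plain dicts.

```python
import threading
from collections import defaultdict

# Hypothetical stand-ins, not Mayan's models: an "index" is a name plus a
# function that maps a document to the node label it should be filed under.
INDEXES = {
    'by_type': lambda doc: doc['type'],
    'by_year': lambda doc: str(doc['year']),
}

# index name -> node label -> list of document ids
index_nodes = defaultdict(dict)

# One lock per index, so rebuilding one index never blocks another.
_index_locks = {name: threading.Lock() for name in INDEXES}


def rebuild_index(name, documents):
    """Rebuild a single index from scratch while holding only its own lock."""
    key_for = INDEXES[name]
    with _index_locks[name]:
        fresh = {}
        for doc in documents:
            fresh.setdefault(key_for(doc), []).append(doc['id'])
        # Replace only this index's nodes; other indexes are left untouched.
        index_nodes[name] = fresh
    return len(documents)


def rebuild_all_indexes(documents):
    """Outer loop over indexes, inner loop over documents."""
    for name in INDEXES:
        rebuild_index(name, documents)
```

Calling `rebuild_index('by_type', docs)` refreshes only that index, so the others stay usable while it runs; `rebuild_all_indexes(docs)` simply walks every index in turn.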

On Saturday, May 27, 2017 at 2:07:31 PM UTC-4, Roberto Rosario wrote:
>
> Doing some tests I've hit several regressions and a few race conditions
> (without the 'document_indexing_task_do_rebuild_all_indexes' lock, deleting
> a document would delete its index instance if it is empty even while an
> index is being rebuilt).
> The entire indexing locking workflow will need to be remade too. This
> refactor is bigger than initially expected.
>
> On Saturday, May 27, 2017 at 11:01:56 AM UTC-4, Roberto Rosario wrote:
>>
>> That's great! Going through your changes to see how much I can move
>> upstream.
>>
>> On Friday, April 28, 2017 at 5:26:03 PM UTC-4, MacRobb Simpson wrote:
>>>
>>> I'm currently in the process of implementing Mayan to replace our
>>> current Document Management System (FileBound).
>>> Our setup currently consists of 14 document types, 64 metadata types, 16
>>> indexes and over 66,000 files currently loaded.
>>>
>>> Reindexing this system is... somewhat slow to say the least.
>>> I let it crunch away for a good 16 hours, and got about halfway through.
>>>
>>> Obviously, this isn't good enough - Indexing might be slow, but it
>>> shouldn't be /this/ slow.
>>>
>>> With a few mods, I've sped this up by at least 8x (figure around 4 hours
>>> for a full rebuild... Acceptable).
>>> What I did was:
>>> 1. Instead of indexing by document, then index, I'm indexing by index,
>>> then document. This allows for a single index to be rebuilt at a time, vs
>>> multiple being 'filled in' at once.
>>> 2. Modify the delete section to only delete the current index as it's
>>> being worked on. This allows you to keep using the other indexes during the
>>> rebuild process.
>>> 3. Removed the 'with transaction.atomic():' line in the indexer. I'm
>>> sure this makes it 'less safe' if something were to fail, but I figure that
>>> if something fails a reindex is needed anyway.
>>> (By splitting the index rebuild from the single-file-indexer, I can
>>> leave that atomic transaction line for a single file, where it makes
>>> sense). This change easily doubled the speed, if not quadrupled it.
>>>
>>> My final code:
>>> mayan/apps/document_indexing/managers.py:
>>>
>>>> def rebuild_all_indexes(self):
>>>>     from .models import Index
>>>>
>>>>     for index in Index.objects.filter(enabled=True):
>>>>         print 'indexing',index
>>>>         #Delete nodes applicable to index
>>>>         print 'deleting nodes'
>>>>         for instance_node in self.filter(id=index.id):
>>>>             instance_node.delete()
>>>>         #Delete empty nodes
>>>>         self.delete_empty_index_nodes()
>>>>         print 'adding index node'
>>>>         #Add index node
>>>>         root_instance, created = self.get_or_create(
>>>>             index_template_node=index.template_root, parent=None
>>>>         )
>>>>         print 'indexing documents...'
>>>>         docsIndexed = 0
>>>>         #Reindex each document
>>>>         for document in Document.objects.filter(document_type=index.document_types.all()):
>>>>             #Add index nodes?
>>>>             for template_node in index.template_root.get_children():
>>>>                 self.cascade_eval(document, template_node, root_instance)
>>>>             docsIndexed += 1
>>>>             if docsIndexed % 10 == 0:
>>>>                 print 'indexing document',document,docsIndexed,'completed'
>>>>
>>> All of the 'print' lines could be removed, but are very handy when
>>> watching it run from run-server/devel mode.
>>>
>>>
>>> Anyone got any other improvement ideas or potential pitfalls that this
>>> could cause?
>>>
