Revision: 19942
          http://sourceforge.net/p/gate/code/19942
Author:   ian_roberts
Date:     2017-01-10 15:51:25 +0000 (Tue, 10 Jan 2017)
Log Message:
-----------
Documentation for the index repair tool

Modified Paths:
--------------
    mimir/trunk/doc/admin.tex
    mimir/trunk/doc/mimir-guide.pdf
    mimir/trunk/doc/mimir-guide.tex

Added Paths:
-----------
    mimir/trunk/doc/tools.tex

Modified: mimir/trunk/doc/admin.tex
===================================================================
--- mimir/trunk/doc/admin.tex   2017-01-10 14:51:51 UTC (rev 19941)
+++ mimir/trunk/doc/admin.tex   2017-01-10 15:51:25 UTC (rev 19942)
@@ -311,7 +311,9 @@
 different state, such as {\em closing} while the index is being shut down,
 typically because the \Mimir{} server is itself being shut down. Sometimes a
 local index is {\em failed}, indicating a problem with the index.  Typically a
-failed index will need to be deleted by the administrator.
+failed index will need to be deleted by the administrator, though it may be
+possible to recover most of the data using the index repair tool (see
+section~\ref{sec:tools:repair}).
 
 Remote indexes inherit their state from the remote server, and federated 
indexes
 inherit their state by combining the states of their component indexes.

Modified: mimir/trunk/doc/mimir-guide.pdf
===================================================================
(Binary files differ)

Modified: mimir/trunk/doc/mimir-guide.tex
===================================================================
--- mimir/trunk/doc/mimir-guide.tex     2017-01-10 14:51:51 UTC (rev 19941)
+++ mimir/trunk/doc/mimir-guide.tex     2017-01-10 15:51:25 UTC (rev 19942)
@@ -89,6 +89,8 @@
 \input{plugins}
 \chapter{Extending and Customising \Mimir}\label{sec:extend}
 \input{extending}
+\chapter{Additional Tools}\label{sec:tools}
+\input{tools}
 
 \appendix
 \chapter{Change Log}\label{sec:changes}

Added: mimir/trunk/doc/tools.tex
===================================================================
--- mimir/trunk/doc/tools.tex                           (rev 0)
+++ mimir/trunk/doc/tools.tex   2017-01-10 15:51:25 UTC (rev 19942)
@@ -0,0 +1,80 @@
+This chapter documents additional tools that are provided with \Mimir\ for
+special use cases, but which are not required for general day-to-day operation.
+
+\section{Recovering a failed index}\label{sec:tools:repair}
+
+\Mimir\ indexes are ordinarily quite robust, but there are certain
+circumstances in which an index can become corrupted and marked as ``failed''
+in the \Mimir\ web UI.  Typically this only happens when the index has not been
+shut down cleanly, for example if an out-of-memory condition occurs during
+indexing.  The majority of these failures fall into two categories, either a
+crash during a ``sync to disk'' operation which leaves a corrupted batch on
+disk, or a crash after all batches have been saved but before the index has
+been completely closed, which leaves the document metadata zip files corrupted.
+In both of these cases the vast majority of the indexed data can usually be
+recovered using the index repair tool.  The last documents to be indexed will
+likely be lost -- exactly how many are lost depends on a number of factors
+including exactly when the failure occurred, the length of the
+\verb!timeBetweenBatches! setting in the index template, etc. -- but the tool
+attempts to minimise the number of lost documents as far as possible.
+
+The repair tool is a command line application which operates directly on the
+index files on disk.  In order to run the tool the index must not be ``open''
+and in use by a running \Mimir{}, so you must either shut down your running
+\Mimir\ application or delete the local index from the web UI (\emph{without}
+deleting the underlying index files from disk!).
+
+Before attempting the repair process it is {\bf very strongly recommended} to
+make a backup copy of the index files.  If the repair process itself fails
+(e.g. with an out of memory error) it can leave the index in a completely
+unrecoverable state, and you will have to restore from your backup, correct the
+problem (e.g. allocate more memory) and try again.
+
+The simplest way to run the repair tool is via the \verb!truncate-index.sh!
+bash script at \verb!WEB-INF/utils! inside the compiled \verb!mimir-cloud!  WAR
+file.
+
+\begin{verbatim}
+bash truncate-index.sh /path/to/the/index-12345.mimir
+\end{verbatim}
+
+The final parameter is the full path to the top-level directory of the index
+you want to repair.  If the repair is successful you should then be able to
+re-start your \Mimir\ application and/or re-import the fixed index using the
+``import an existing index'' option.
+
+See the comment block at the top of the script for full details of the
+available parameters.
+
+\subsection*{The recovery process in detail}
+
+The repair process consists of a number of phases.
+
+\begin{enumerate}
+\item Ensure the document metadata zip files are all complete, repairing the
+  last one if necessary
+\item Examine all the index batches and determine the latest point at which all
+  the sub-indexes successfully dumped a batch in sync.  This is referred to as
+  the ``last good batch''.  Delete any batches beyond this point.
+\item If the (repaired) zip files contain at least as many document as the good
+  batches, then simply truncate the zip collection to match the last good batch
+  and the repair process is complete.
+\item Otherwise, the zip files are the limiting factor, as the zip collection
+  ends in the middle of a ``good'' batch.  Determine which batch this is,
+  delete all the subsequent batches, then truncate what is now the last batch
+  to match the length of the zip collection.
+\end{enumerate}
+
+The final step can require a lot of memory if the last batch is large (e.g. a
+recently compacted \verb!head! batch), it may be necessary to allocate more
+memory to the repair process by editing the shell script.
+
+In the best case (all batches successfully synced to disk, just the zip
+collection failed to close) this will recover all but the last one or two
+documents.  The worst case is when the index failed during the very first sync
+to disk, in which case nothing will be recoverable, but in this case there
+should be no more than one hour or so of documents that need to be re-indexed.
+Most cases will fall somewhere between these extremes, and the number of
+documents lost depends on the \verb!timeBetweenBatches! configured in the index
+template.  A shorter time between batches means less potential for data loss
+but more work for the indexer.

This was sent by the SourceForge.net collaborative development platform, the 
world's largest Open Source development site.


------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
GATE-cvs mailing list
GATE-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gate-cvs

Reply via email to