Revision: 18056
          http://sourceforge.net/p/gate/code/18056
Author:   ian_roberts
Date:     2014-06-10 15:42:30 +0000 (Tue, 10 Jun 2014)
Log Message:
-----------
Support in the index template DSL for overriding the delay between dumping 
batches and the number of batches before a compaction.

Modified Paths:
--------------
    mimir/trunk/doc/default-index-config.txt
    mimir/trunk/doc/indexing.tex
    mimir/trunk/doc/mimir-guide.pdf
    mimir/trunk/mimir-web/MimirWebGrailsPlugin.groovy.template
    mimir/trunk/mimir-web/src/groovy/gate/mimir/util/GroovyIndexConfigParser.groovy

Modified: mimir/trunk/doc/default-index-config.txt
===================================================================
--- mimir/trunk/doc/default-index-config.txt    2014-06-10 13:46:41 UTC (rev 18055)
+++ mimir/trunk/doc/default-index-config.txt    2014-06-10 15:42:30 UTC (rev 18056)
@@ -27,4 +27,8 @@
 /@    27  @/}
 /@    28  @/documentRenderer = new OriginalMarkupMetadataHelper()
 /@    29  @/documentFeaturesHelper = new DocumentFeaturesMetadataHelper("date","source","id", "type")
-/@    30  @/documentMetadataHelpers = [documentRenderer, documentFeaturesHelper]
\ No newline at end of file
+/@    30  @/documentMetadataHelpers = [documentRenderer, documentFeaturesHelper]
+/@    31  @/
+/@    32  @/// miscellaneous options - these are the defaults
+/@    33  @///timeBetweenBatches = 1.hour
+/@    34  @///maximumBatches = 20
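
For illustration only: the two options added above could govern an indexer roughly as in the minimal Java sketch below. It is not Mimir's implementation. The class BatchPolicy and its methods are hypothetical, and two details are assumptions rather than facts from this commit: that compaction is requested once the batch count exceeds (rather than reaches) maximumBatches, and that a compaction leaves a single batch.

```java
/**
 * Hypothetical sketch (not Mimir's actual classes) of how the two options
 * could govern an indexer:
 *   timeBetweenBatches - longest time (ms) a submitted document may sit in the
 *                        in-RAM batch before that batch is dumped to disk;
 *   maximumBatches     - number of on-disk batches tolerated before a
 *                        compaction is requested (assumed: count > maximum).
 */
class BatchPolicy {
    // Defaults taken from the template above: 1.hour and 20.
    static final long DEFAULT_TIME_BETWEEN_BATCHES = 60L * 60L * 1000L;
    static final int DEFAULT_MAXIMUM_BATCHES = 20;

    private final long timeBetweenBatches;
    private final int maximumBatches;
    private long lastDumpMillis;
    private int batchesOnDisk = 0;

    BatchPolicy(long timeBetweenBatches, int maximumBatches, long startMillis) {
        if (timeBetweenBatches <= 0 || maximumBatches <= 0) {
            throw new IllegalArgumentException("both options must be positive");
        }
        this.timeBetweenBatches = timeBetweenBatches;
        this.maximumBatches = maximumBatches;
        this.lastDumpMillis = startMillis;
    }

    /** True once the current batch has been held for timeBetweenBatches ms. */
    boolean dumpDue(long nowMillis) {
        return nowMillis - lastDumpMillis >= timeBetweenBatches;
    }

    /**
     * Records that the current batch was written to disk, whether on the timer
     * or forced from the admin page; returns true if compaction is now due.
     */
    boolean recordDump(long nowMillis) {
        lastDumpMillis = nowMillis;
        batchesOnDisk++;
        return batchesOnDisk > maximumBatches;
    }

    /** Records a finished compaction, assumed to merge all batches into one. */
    void recordCompaction() {
        batchesOnDisk = 1;
    }

    int batchesOnDisk() {
        return batchesOnDisk;
    }

    public static void main(String[] args) {
        BatchPolicy p = new BatchPolicy(
            DEFAULT_TIME_BETWEEN_BATCHES, DEFAULT_MAXIMUM_BATCHES, 0L);
        System.out.println(p.dumpDue(59L * 60L * 1000L)); // false
        System.out.println(p.dumpDue(60L * 60L * 1000L)); // true
        boolean compact = false;
        for (int i = 1; i <= 21; i++) {
            compact = p.recordDump(i * DEFAULT_TIME_BETWEEN_BATCHES);
        }
        System.out.println(compact);                      // true (21 > 20)
    }
}
```

Passing the clock in as nowMillis, rather than calling System.currentTimeMillis() inside the class, keeps the policy deterministic and easy to test.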

Modified: mimir/trunk/doc/indexing.tex
===================================================================
--- mimir/trunk/doc/indexing.tex        2014-06-10 13:46:41 UTC (rev 18055)
+++ mimir/trunk/doc/indexing.tex        2014-06-10 15:42:30 UTC (rev 18056)
@@ -171,7 +171,7 @@
 \subsection*{Document Rendering and Metadata}
 \defaultIndexTemplate[linerange={28-30},firstnumber=28]
 
-The final part of the template concerns how document-level metadata is indexed,
+The next part of the template concerns how document-level metadata is indexed,
 and how this can be combined with the document text to render the document
 content at search time, with matches of the query highlighted.  These tasks are
 performed by objects that implement the interfaces
@@ -203,6 +203,28 @@
 serialisable to be usable (i.e. they must implement the
 \lstinline!java.io.Serializable! interface).
 
+\subsection*{Miscellaneous options}
+\defaultIndexTemplate[linerange={32-34},firstnumber=32]
+Finally, additional miscellaneous options can be specified at the end of the
+template.  The supported options are:
+\bde
+\item[timeBetweenBatches] the maximum amount of time that the indexer should
+  wait between writing batches to disk.  Since only batches that have been
+  dumped to disk are searchable, this specifies the maximum time a document
+  should be held in RAM after having been submitted for indexing but before it
+  becomes available to be searched.  The value can be either a plain number of
+  milliseconds or a Groovy \lstinline!TimeCategory! duration expression such
+  as \lstinline!10.minutes!.  If unspecified, the default is one hour (3600000
+  milliseconds).  Note that it is always possible to force the system to dump
+  the current batch to disk immediately via the index administration page.
+\item[maximumBatches] the maximum number of constituent batches before a
+  compaction operation is triggered.  The default is 20, and it should rarely
+  be necessary to modify this as index compaction is transparent -- the index
+  behaves exactly the same whether or not it has recently been
+  compacted\footnote{The main difference is that a compacted index requires
+  fewer open file handles to operate.}.
+\ede
+
 \subsection*{Direct Indexes}
 \label{sec:direct-indexes}
 Starting with version $5.0$, \Mimir{} can build direct indexes as well as
@@ -246,12 +268,12 @@
 
 \section{Adding Documents to an Index}\label{sec:indexing:add-docs}
 
-Once an index has been created in {\em indexing} mode, the next stage is to add
-documents to the index.  \Mimir\ provides an HTTP API for this which accepts
-documents for indexing via HTTP POST requests that include the document in Java
-serialised format.  The easiest way to make use of this API is via GCP (the
-GATE Cloud Paralleliser batch processing tool) using a
-\lstinline!MimirOutputHandler!.  This GCP output handler makes use of the
+Once an index has been created, the next stage is to add documents to the
+index.  \Mimir\ provides an HTTP API for this which accepts documents for
+indexing via HTTP POST requests that include the document in Java serialised
+format.  The easiest way to make use of this API is via GCP (the GATE Cloud
+Paralleliser batch processing tool) using a \lstinline!MimirOutputHandler!.
+This GCP output handler makes use of the
 \lstinline!gate.mimir.index.MimirConnector! (in the {\tt mimir-client} module)
 to actually make the remote call, and you can use the same API in your own
 code.  To add a GATE document to an open index simply call:

Modified: mimir/trunk/doc/mimir-guide.pdf
===================================================================
(Binary files differ)

Modified: mimir/trunk/mimir-web/MimirWebGrailsPlugin.groovy.template
===================================================================
--- mimir/trunk/mimir-web/MimirWebGrailsPlugin.groovy.template  2014-06-10 13:46:41 UTC (rev 18055)
+++ mimir/trunk/mimir-web/MimirWebGrailsPlugin.groovy.template  2014-06-10 15:42:30 UTC (rev 18056)
@@ -169,6 +169,10 @@
 }
 documentRenderer = new OriginalMarkupMetadataHelper()
 documentMetadataHelpers = [documentRenderer]
+
+// miscellaneous options - these are the defaults
+//timeBetweenBatches = 1.hour
+//maximumBatches = 20
 """
         IndexTemplate.withTransaction {
           def defaultTemplate = new IndexTemplate(

Modified: mimir/trunk/mimir-web/src/groovy/gate/mimir/util/GroovyIndexConfigParser.groovy
===================================================================
--- mimir/trunk/mimir-web/src/groovy/gate/mimir/util/GroovyIndexConfigParser.groovy    2014-06-10 13:46:41 UTC (rev 18055)
+++ mimir/trunk/mimir-web/src/groovy/gate/mimir/util/GroovyIndexConfigParser.groovy    2014-06-10 15:42:30 UTC (rev 18056)
@@ -18,6 +18,8 @@
 import gate.mimir.IndexConfig.SemanticIndexerConfig
 import gate.mimir.IndexConfig.TokenIndexerConfig
 
+import groovy.time.TimeCategory
+
 /**
  * Helper class for parsing Groovy index configuration scripts into IndexConfig
  * objects.
@@ -43,7 +45,7 @@
     mc.annotation = semanticAnnotationsHandler.&annotation
     script.metaClass = mc
     
-    script.run()
+    use(TimeCategory, script.&run)
 
     // process the tokenFeatures section
     def tokenFeaturesClosure = scriptBinding.tokenFeatures
@@ -66,6 +68,21 @@
         scriptBinding.documentMetadataHelpers as DocumentMetadataHelper[],
         scriptBinding.documentRenderer)
 
+    if(scriptBinding.hasVariable('timeBetweenBatches')) {
+      // if timeBetweenBatches is a Duration like 5.minutes then convert
+      // it back to milliseconds, otherwise assume it is just a number of
+      // milliseconds in the first place
+      if(scriptBinding.timeBetweenBatches.respondsTo("toMilliseconds")) {
+        indexConfig.timeBetweenBatches = (int)scriptBinding.timeBetweenBatches.toMilliseconds()
+      } else {
+        indexConfig.timeBetweenBatches = scriptBinding.timeBetweenBatches as int
+      }
+    }
+
+    if(scriptBinding.hasVariable('maximumBatches')) {
+      indexConfig.maximumBatches = scriptBinding.maximumBatches as int
+    }
+
     semanticAnnotationsHandler.clear()
     tokenFeaturesHandler.clear()
     // clean up the metaclass to prevent memory leaks
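
The timeBetweenBatches handling in the parser hunk above accepts either a TimeCategory duration (detected by duck typing on toMilliseconds()) or a plain number of milliseconds. A rough Java analogue is sketched below. It is not part of Mimir: BatchDelay.toMillis is a hypothetical name, and java.time.Duration stands in for Groovy's TimeCategory durations.

```java
import java.time.Duration;

/**
 * Hypothetical Java analogue of the timeBetweenBatches conversion added to
 * GroovyIndexConfigParser. java.time.Duration stands in for Groovy's
 * TimeCategory durations (e.g. 10.minutes); any other Number is taken to
 * be a count of milliseconds already.
 */
class BatchDelay {
    static int toMillis(Object value) {
        long millis;
        if (value instanceof Duration) {
            // A duration such as Duration.ofMinutes(10): convert back to ms.
            millis = ((Duration) value).toMillis();
        } else if (value instanceof Number) {
            // A plain number: assumed to be milliseconds already.
            millis = ((Number) value).longValue();
        } else {
            throw new IllegalArgumentException(
                "timeBetweenBatches must be a duration or a number of ms: " + value);
        }
        // The parser stores the value as an int; fail loudly instead of
        // wrapping for delays above Integer.MAX_VALUE ms (about 24.8 days).
        return Math.toIntExact(millis);
    }

    public static void main(String[] args) {
        System.out.println(toMillis(Duration.ofHours(1)));    // 3600000
        System.out.println(toMillis(Duration.ofMinutes(10))); // 600000
        System.out.println(toMillis(5000));                   // 5000
    }
}
```

Using Math.toIntExact makes an out-of-range delay raise an ArithmeticException rather than being silently truncated to an unrelated int.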

_______________________________________________
GATE-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gate-cvs
