Revision: 17947
http://sourceforge.net/p/gate/code/17947
Author: ian_roberts
Date: 2014-05-10 22:35:37 +0000 (Sat, 10 May 2014)
Log Message:
-----------
Merged ORG changes from trunk
Modified Paths:
--------------
userguide/branches/release-8.0/gazetteers.tex
Property Changed:
----------------
userguide/branches/release-8.0/gazetteers.tex
Modified: userguide/branches/release-8.0/gazetteers.tex
===================================================================
--- userguide/branches/release-8.0/gazetteers.tex 2014-05-10 22:34:46 UTC (rev 17946)
+++ userguide/branches/release-8.0/gazetteers.tex 2014-05-10 22:35:37 UTC (rev 17947)
@@ -603,9 +603,19 @@
To initialise the gazetteer there are few mandatory parameters:
\begin{itemize}
\item \emph{Ontology} to be processed;
-\item \emph{Tokeniser}, \emph{POS Tagger} and \emph{GATE Morphological Analyser}
-to be used during processing (if these are also used in a pipeline, their input
-and output parameters must remain set to the default annotation set);
+\item \emph{CorpusController} to process the ontology terms\footnote{In
+ previous versions of GATE the gazetteer took three separate parameters for
+ the tokeniser, POS tagger and morphological analyser. Existing saved
+ applications that use these parameters will still work in GATE 8.0.}. This
+ application will be run on a document that contains a single \verb!Sentence!
+ annotation spanning the whole document, and is expected to produce
+ annotations of type \verb!Token! in the default annotation set, with features
+ \verb!category! (the POS tag) and \verb!root! (the morphological root).
+ Typically this pipeline would contain a tokeniser appropriate to the source
+ language, a POS tagger, and a GATE Morphological Analyser PR, but any
+ application that will produce the right annotation types and features will
+ work. For example, when processing non-English text you may need to use an
+ alternative POS tagger such as the Stanford tagger or TreeTagger.
\end{itemize}
and few optional ones:
@@ -643,17 +653,13 @@
The OntoRoot Gazetteer's initialization preprocesses strings from the ontology
-and runs the tokenizer, POS tagger, and morphological analyser over them. These
-PRs must remain set to use the default annotation set for input and output, or
-the OntoRoot Gazetteer will throw a ResourceInstantiationException. If you
-change the parameters of these PRs in a pipeline, you will not be able to create
-OntoRoot Gazetteers with them afterwards; in this case, you should create
-separate instances of the three PRs and use them only for instantiating OntoRoot
-Gazetteers without adding them to a pipeline. (As long as the PRs are not used
-in a pipeline, the runtime parameters for input and output remain set for the
-default annotation set, even though you cannot see or set them in the GUI.) It
-may be helpful to give the special PRs different names from the defaults so you
-can clearly distinguish them from the ones used in the pipeline.
+and runs the root finder application over them. It is possible to re-use the
+same tokeniser, POS tagger and morphological analyser PR instances in both the
+root finder application and the main pipeline that will contain the finished
+OntoRoot Gazetteer, but in this case the PRs \emph{must} use the default
+annotation set for output. If you need to use a different annotation set for
+your main pipeline's output then you will need to create separate PRs
+specifically for the root finder and configure those to use the default set.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect[sec:gazetteers:ontoRootGaz:steps]{Simple steps to run OntoRoot
@@ -725,13 +731,15 @@
\item RegEx Sentence Splitter (or ANNIE Sentence Splitter)
\end{itemize}
+Place the tokeniser, POS tagger and morphological analyser PRs into a new
+``corpus pipeline'' application, named ``Root finder''.
+
\item Create an \emph{Onto Root Gazetteer} and set the init parameters.
Mandatory ones are:
\begin{itemize}
- \item \emph{Ontology}: select previously created myOntology;
- \item \emph{Tokeniser}: select previously created Tokeniser;
- \item \emph{POS Tagger}: select previously created POS Tagger;
- \item \emph{Morpher}: select previously created Morpher.
+ \item \emph{ontology}: select previously created myOntology;
+ \item \emph{rootFinderApplication}: select the ``Root finder'' pipeline you
+ created above.
\end{itemize}
OntoRoot gazetteer is quite flexible in that it can be configured using the
optional parameters. List of all parameters is detailed in
@@ -748,9 +756,9 @@
and when prompt with a window, add 'Token.root' in the provided textbox, then
click Add button. Click OK, give name to the new PR (optional) and then click
OK.
-\item Create an application. Right click on Application, then New Pipeline (or
-Corpus Pipeline). Add the following PRs to the application in this particular
-order:
+\item Create an application. Right click on Application, then create a new
+Corpus Pipeline (or Conditional Corpus Pipeline). Add the following PRs to the
+application in this particular order:
\begin{itemize}
\item Document Reset PR
\item RegEx Sentence Splitter (or ANNIE Sentence Splitter)
@@ -760,6 +768,11 @@
\item Flexible Gazetteer
\end{itemize}
+The tokeniser, POS tagger and morphological analyser may be the same ones used
+in the root finder application, or they may be different (and must be different
+if you want to use an annotation set other than the default one for this
+pipeline's PRs).
+
\item Create a document to process with the new
application; for example, if the ontology was
\texttt{http://gate.ac.uk/ns/gate-kb}, then the document could be the GATE home
Property changes on: userguide/branches/release-8.0/gazetteers.tex
___________________________________________________________________
Modified: svn:mergeinfo
## -1 +1,2 ##
/userguide/branches/release-6.0/gazetteers.tex:13203-13218
+/userguide/trunk/gazetteers.tex:17945
\ No newline at end of property
_______________________________________________
GATE-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gate-cvs