Revision: 15903
http://gate.svn.sourceforge.net/gate/?rev=15903&view=rev
Author: adamfunk
Date: 2012-06-25 10:55:27 +0000 (Mon, 25 Jun 2012)
Log Message:
-----------
Document GenericTagger's new failOnMissingInputAnnotations parameter
(like the source code, the documentation is largely copied from another PR).
Modified Paths:
--------------
userguide/trunk/annie.tex
userguide/trunk/misc-creole.tex
Modified: userguide/trunk/annie.tex
===================================================================
--- userguide/trunk/annie.tex 2012-06-25 01:20:24 UTC (rev 15902)
+++ userguide/trunk/annie.tex 2012-06-25 10:55:27 UTC (rev 15903)
@@ -125,19 +125,23 @@
the RHS describes the annotations to be added to the AnnotationSet.
The LHS is separated from the RHS by `$>$'.
The following operators can be used on the LHS:
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
| (or)
* (0 or more occurrences)
? (0 or 1 occurrences)
+ (1 or more occurrences)
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\noindent
The RHS uses `;' as a separator, and has the following format:
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
{LHS} > {Annotation type};{attribute1}={value1};...;{attribute
n}={value n}
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\noindent
Details about the primitive constructs available are given in the
@@ -146,10 +150,12 @@
\noindent
The following tokeniser rule is for a word
beginning with a single capital letter:
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
`UPPERCASE_LETTER' `LOWERCASE_LETTER'* >
Token;orth=upperInitial;kind=word;
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\noindent
It states that the sequence must begin with an uppercase letter,
followed by zero or more lowercase letters. This sequence will then be
@@ -232,7 +238,8 @@
of cities, organisations, days of the week, etc.
Below is a small section of the list for units of currency:
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
Ecu
European Currency Units
FFr
@@ -243,7 +250,8 @@
New Taiwan dollars
NT dollar
NT dollars
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
An index file (lists.def) is used to access these lists; for each list, a
major type is specified and, optionally, a minor type. It is also
@@ -263,12 +271,14 @@
gazetteer list should reside in the same directory as the index
file.
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
currency_prefix.lst:currency_unit:pre_amount
currency_unit.lst:currency_unit:post_amount
date.lst:date:specific
day.lst:date:day
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
So, for example, if a specific day needs to be identified, the minor
type `day' should be specified in the grammar, in order to match
@@ -867,8 +877,8 @@
First of all, we give an example of a grammar rule (and corresponding
macros) for money, which would recognise this type of pattern.
-\begin{small}\begin{verbatim}
-
+\begin{small}
+\begin{verbatim}
Macro: MILLION_BILLION
({Token.string == "m"}|
{Token.string == "million"}|
@@ -894,7 +904,8 @@
)
:money -->
:money.Number = {kind = "money", rule = "Money1"}
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\subsect{Step 1 - Tokenisation}
@@ -906,7 +917,8 @@
separate token), and any number of consecutive spaces
and/or control characters are recognised as a single spacetoken.
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
Token, string = `800', kind = number, length = 3
Token, string = `,', kind = punctuation, length = 1
Token, string = `000', kind = number, length = 3
@@ -914,7 +926,8 @@
Token, string = `US', kind = word, length = 2, orth = allCaps
SpaceToken, string = ` ', kind = space, length = 1
Token, string = `dollars', kind = word, length = 7, orth = lowercase
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\subsect{Step 2 - List Lookup}
@@ -922,9 +935,11 @@
matching words in the text. It finds the following match for the
string `US dollars':
-\begin{small}\begin{verbatim}
+\begin{small}
+\begin{verbatim}
Lookup, minorType = post_amount, majorType = currency_unit
-\end{verbatim}\end{small}
+\end{verbatim}
+\end{small}
\subsect{Step 3 - Grammar Rules}
Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex 2012-06-25 01:20:24 UTC (rev 15902)
+++ userguide/trunk/misc-creole.tex 2012-06-25 10:55:27 UTC (rev 15903)
@@ -186,6 +186,11 @@
unmappable characters are replaced by question marks when the
document is passed to the tagger.
This is useful if your documents are largely OK but contain the odd
character from outside the
Latin-1 range.
+ \item \textbf{failOnMissingInputAnnotations}: if set to false, the PR
+ will not fail with an ExecutionException if no input Annotations are
+ found and instead only log a single warning message per session and a
+ debug message per document that has no input annotations (default =
+ true).
\item \textbf{inputTemplate}: template string describing how to build
the line of input for the tagger
corresponding to a single annotation. The template contains
placeholders of the form
\verb|${feature}| which will be replaced by the value of the
corresponding feature from the
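
The behaviour that the misc-creole.tex hunk describes for
failOnMissingInputAnnotations can be sketched roughly as follows. This is a
minimal Python illustration, not GATE's Java implementation: the class
MissingInputPolicy, the ExecutionError exception (a stand-in for GATE's
ExecutionException) and the logger name are all hypothetical, and a "session"
is assumed here to mean the lifetime of one policy object.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("generic_tagger_sketch")


class ExecutionError(Exception):
    """Stand-in for GATE's ExecutionException (assumption for this sketch)."""


class MissingInputPolicy:
    """Models the documented effect of failOnMissingInputAnnotations."""

    def __init__(self, fail_on_missing_input_annotations=True):
        # Default is True, matching the documented default.
        self.fail = fail_on_missing_input_annotations
        self._warned = False  # tracks the single per-session warning

    def check(self, doc_name, input_annotations):
        """Return True if the tagger should run on this document."""
        if input_annotations:
            return True
        if self.fail:
            raise ExecutionError(f"No input annotations found in {doc_name}")
        if not self._warned:
            # Logged once per session, as the documentation describes.
            logger.warning("No input annotations found; later cases are logged at debug level")
            self._warned = True
        # Logged for every document that has no input annotations.
        logger.debug("No input annotations in document %s", doc_name)
        return False
```

With the parameter set to false, a corpus containing a few empty documents is
processed to the end and the empty documents are skipped; with the default of
true, the first such document stops the run.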