Revision: 13724
http://gate.svn.sourceforge.net/gate/?rev=13724&view=rev
Author: ian_roberts
Date: 2011-04-20 15:54:41 +0000 (Wed, 20 Apr 2011)
Log Message:
-----------
Changelog, and merging changes to mimir, cloud and teamware chapters from trunk.
Modified Paths:
--------------
userguide/branches/release-6.1/changes.tex
userguide/branches/release-6.1/cloud.tex
userguide/branches/release-6.1/intro.tex
userguide/branches/release-6.1/mimir.tex
userguide/branches/release-6.1/misc-creole.tex
userguide/branches/release-6.1/recent-changes.tex
userguide/branches/release-6.1/tao_main.tex
userguide/branches/release-6.1/teamware.tex
Property Changed:
----------------
userguide/branches/release-6.1/
userguide/branches/release-6.1/changes.tex
userguide/branches/release-6.1/evaluation.tex
userguide/branches/release-6.1/gazetteers.tex
userguide/branches/release-6.1/machine-learning.tex
userguide/branches/release-6.1/ontology_ocat_add-new.png
userguide/branches/release-6.1/ontology_ocat_edit.png
userguide/branches/release-6.1/ontology_ocat_options.png
userguide/branches/release-6.1/ontology_ocat_view.png
userguide/branches/release-6.1/recent-changes.tex
Property changes on: userguide/branches/release-6.1
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0:13203-13218
/userguide/trunk:10614-10900
+ /userguide/branches/release-6.0:13203-13218
/userguide/trunk:10614-10900,13717,13720-13723
Modified: userguide/branches/release-6.1/changes.tex
===================================================================
--- userguide/branches/release-6.1/changes.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/changes.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -19,7 +19,145 @@
%NEW CHANGES SHOULD BE ADDED IN `recent-changes.tex' NOT HERE!!!
\input{recent-changes}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\rcSect[6.0]{Version 6.0 (November 2010)}\ifnested\label{subsec:changes:6.0b1}\else\label{sec:changes:6.0b1}\fi
+\rcSubsect{Major new features}
+
+Added an annotation tool for the document editor: the Relation Annotation
+Tool (RAT). It is designed to annotate a document with ontology instances
+and to create relations between annotations with ontology object
+properties. It is similar to, and compatible with, the Ontology Annotation
+Tool (OAT) but focuses on relations between annotations. See
+section~\ref{sec:ontologies:rat} for details.
+
+Added a new \emph{scriptable controller} to the Groovy plugin, whose execution
+strategy is controlled by a simple Groovy DSL. This supports more powerful
+conditional execution than is possible with the standard conditional
+controllers (for example, based on the presence or absence of a particular
+annotation, or a combination of several document feature values), rich flow
+control using Groovy loops, etc. See
+section~\ref{sec:api:groovy:controller} for details.
+
+A new version of the Alignment Editor has been added to the GATE distribution.
+It includes several new features, such as a new alignment viewer, the ability
+to create alignment tasks and store them in XML files, three different views for
+aligning text (links view and matrix view, suitable for character, word and
+phrase alignments; parallel view, suitable for sentence or long text
+alignment), an alignment exporter and many more. See chapter~\ref{chap:alignment}
+for more information.
+
+MetaMap, from the National Library of Medicine (NLM), maps biomedical text to
+the \textbf{UMLS Metathesaurus} and allows Metathesaurus concepts to be
+discovered in a text corpus. The Tagger\_MetaMap plugin for GATE wraps the
+MetaMap Java API client to allow GATE to communicate with a remote (or local)
+MetaMap PrologBeans \textbf{mmserver} and MetaMap distribution. This allows the
+content of specified annotations (or the entire document content) to be
+processed by MetaMap and the results converted to GATE annotations and features.
+See section~\ref{sec:misc-creole:metamap} for details.
+
+A new plugin called Web\_Translate\_Google has been added, containing a PR
+called Google Translator PR. It allows users to translate text using the
+Google translation services. See section~\ref{sec:misc-creole:google-translate}
+for more information.
+
+Added a new Gazetteer Editor for the ANNIE Gazetteer that can be used instead
+of Gaze. It uses tables instead of a text area to display the gazetteer
+definition and lists, allows sorting on any column, filtering of the lists,
+reloading a list, etc. See section~\ref{sec:gazetteers:anniegazeditor}.
+
+\rcSubsect{Breaking changes}
+
+This release contains a few small changes that are not backwards-compatible:
+\begin{itemize}
+\item Changed the semantics of the ontology-aware matching mode in JAPE to take
+account of the default namespace in an ontology. Now {\tt class} feature
+values that are not complete URIs will be treated as naming classes within the
+default namespace of the target ontology only, and not (as previously) any
+class whose URI ends with the specified name. This is more consistent with the
+way OWL normally works, as well as being much more efficient to execute. See
+section~\ref{sec:ontologies:ontology-aware-jape} for more details.
+
+\item Updated the WordNet plugin to support more recent releases of WordNet
+than 1.6. The format of the configuration file has changed; if you are using
+the previous WordNet 1.6 support, you will need to update your configuration.
+See section~\ref{sec:misc-creole:wn} for details.
+
+\item The deprecated Tagger\_TreeTagger plugin has been removed; applications
+that used it will need to be updated to use the Tagger\_Framework plugin
+instead. See section~\ref{sec:parsers:taggerframework} for details of how to
+do this.
+\end{itemize}
+
+\rcSubsect{Other new features and bugfixes}
+
+The concept of {\it templates} has been introduced to JAPE. This is a way to
+declare named ``variables'' in a JAPE grammar that can contain placeholders
+that are filled in when the template is referenced. See
+section~\ref{sec:jape:templates} for full details.
+
+Added a JAPE operator to get the string covered by a left-hand-side label and
+assign it to a feature of a new annotation on the right hand side (see
+section~\ref{sec:jape:metaproperties}).
+
+Added a new API to the CREOLE registry to permit plugins that live
+entirely on the classpath. {\tt CreoleRegister.registerComponent} instructs
+the registry to scan a single Java class for annotations, adding it to the
+set of registered plugins. See section~\ref{sec:api:plugins} for details.
+
+Maven artifacts for GATE are now published to the central Maven
+repository. See section~\ref{sec:gettingstarted:maven} for details.
+
+Bugfix: {\tt DocumentImpl} no longer changes its {\tt stringContent} parameter
+value whenever the document's content changes. Among other things, this means
+that saved application states will no longer contain the full text of the
+documents in their corpus, and documents containing XML or HTML tags that were
+originally created from string content (rather than a URL) can now safely be
+stored in saved application states and the GATE Developer saved session.
+
+A processing resource called Quality Assurance PR has been added to the Tools
+plugin. The PR wraps the functionality of the Quality Assurance Tool
+(section~\ref{sec:eval:corpusqualityassurance}).
+
+A new section on using Corpus Quality Assurance from GATE Embedded has
+been written. See section~\ref{sec:eval:corpusqualityassurance}.
+
+The Generic Tagger PR (in the Tagger\_Framework plugin) now allows more
+flexible specification of the input to the tagger, and is no longer limited to
+passing just the ``string'' feature from the input annotations. See
+section~\ref{sec:parsers:taggerframework} for details.
+
+Added new parameters and options to the LingPipe Language Identifier PR
+(section~\ref{sec:misc-creole:lingpipe:langid}), and corrected the
+documentation for the LingPipe POS Tagger
+(section~\ref{sec:misc-creole:lingpipe:postagger}).
+
+In the document editor, fixed several exceptions that prevented text from being
+edited while annotations were highlighted. You should now be able to edit the
+text, and the annotations should behave correctly: they move, expand or
+disappear according to the text insertions and deletions.
+
+The document editor options read-only and insert append/prepend have been
+moved from the options dialogue to the document editor toolbar, where the
+triangle icon at the top right displays a menu with these options. See
+section~\ref{sec:developer:documents}.
+
+Added new parameters and options to the Crawl PR and document features to its
+output; see section~\ref{sec:misc-creole:crawler} for details.
+
+Fixed a bug where ontology-aware JAPE rules worked correctly when the target
+annotation's class was a subclass of the class specified in the rule, but
+failed when the two class names matched exactly.
+
+Improved support for conditional pipelines containing non-LanguageAnalyser
+processing resources.
+
+Added the current {\tt Corpus} to the script binding for the Groovy Script PR,
+allowing a Groovy script to access and set corpus-level features. Also added
+callbacks that a Groovy script can implement to do additional pre- or
+post-processing before the first and after the last document in a corpus. See
+section~\ref{sec:api:groovy} for details.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:changes:5.2.1]{Version 5.2.1 (May 2010)}
Property changes on: userguide/branches/release-6.1/changes.tex
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-5.0/changes.tex:10496-10518
/userguide/branches/release-6.0/changes.tex:13203-13218
+ /userguide/branches/release-5.0/changes.tex:10496-10518
/userguide/branches/release-6.0/changes.tex:13203-13218
/userguide/trunk/changes.tex:13717,13720-13723
Modified: userguide/branches/release-6.1/cloud.tex
===================================================================
--- userguide/branches/release-6.1/cloud.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/cloud.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -326,10 +326,12 @@
GATECloud.net annotation jobs are executed on virtual 64-bit (x86\_64) Linux
servers in the cloud, specifically Ubuntu 10.10 (Maverick Meerkat). The GATE
-application is run using the open-source GCP tool on Sun Java 6 (1.6.0\_21).
-The current offering uses the Amazon EC2 cloud, and runs jobs on their
-'m1.xlarge' machines which provide 4 virtual CPU cores and 15GB of memory, of
-which 13GB is available to the GCP process.
+application is run using the open-source GCP tool\footnote{Source code is
+available in the subversion repository at\\
+{\tt https://gate.svn.sourceforge.net/svnroot/gate/gcp/trunk}} on Sun Java 6
+(1.6.0\_21). The current offering uses the Amazon EC2 cloud, and runs jobs on
+their 'm1.xlarge' machines which provide 4 virtual CPU cores and 15GB of
+memory, of which 13GB is available to the GCP process.
The GCP (GATE Cloud Paralleliser) process is configured for 'headless' operation
(\verb^-Djava.awt.headless=true^), and your code should not assume that a GUI
Property changes on: userguide/branches/release-6.1/evaluation.tex
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/evaluation.tex:13203-13218
/userguide/trunk/gui.tex:10614-10900
+ /userguide/branches/release-6.0/evaluation.tex:13203-13218
/userguide/trunk/evaluation.tex:13717,13720-13723
/userguide/trunk/gui.tex:10614-10900
Property changes on: userguide/branches/release-6.1/gazetteers.tex
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/gazetteers.tex:13203-13218
+ /userguide/branches/release-6.0/gazetteers.tex:13203-13218
/userguide/trunk/gazetteers.tex:13717,13720-13723
Modified: userguide/branches/release-6.1/intro.tex
===================================================================
--- userguide/branches/release-6.1/intro.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/intro.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -132,18 +132,18 @@
\item
\textit{a cloud computing solution} for hosted large-scale text
processing, \textbf{GATE Cloud}
- (\htlinkplain{http://gatecloud.net/}). \ifprintedbook See also Chapter~\ref{chap:cloud}.\fi
+ (\htlinkplain{http://gatecloud.net/}). See also Chapter~\ref{chap:cloud}.
\item
\textit{a web app}, \htlink{http://gate.ac.uk/teamware/}{\bf GATE Teamware}: a collaborative
annotation environment for factory-style semantic annotation projects built
around a workflow engine and a heavily-optimised backend service
- infrastructure. \ifprintedbook See also Chapter~\ref{chap:teamware}.\fi
+ infrastructure. See also Chapter~\ref{chap:teamware}.
\item
\textit{a multi-paradigm search repository},
\htlink{http://gate.ac.uk/family/mimir.html}{\bf GATE \Mimir},
which can be used to index and search over text, annotations, semantic schemas
(ontologies), and semantic meta-data (instance data). It allows queries that
arbitrarily mix full-text, structural, linguistic and semantic queries and that
-can scale to terabytes of text. \ifprintedbook See also Chapter~\ref{chap:mimir}.\fi
+can scale to terabytes of text. See also Chapter~\ref{chap:mimir}.
\item
\textit{a framework}, \htlink{http://gate.ac.uk/family/embedded.html}{\bf GATE Embedded}: an object library
optimised for inclusion in diverse applications giving access to all the
@@ -166,7 +166,7 @@
experiments
\end{itemize}
-For more information on the GATE family see \htlinkplain{http://gate.ac.uk/family/} \ifprintedbook and also Part IV of this book\fi.
+For more information on the GATE family see \htlinkplain{http://gate.ac.uk/family/} and also Part IV of this book.
One of our original motivations was to remove the necessity for
solving common engineering problems before doing useful research, or
Property changes on: userguide/branches/release-6.1/machine-learning.tex
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/machine-learning.tex:13203-13218
/userguide/trunk/learning-api.tex:10614-10900
+ /userguide/branches/release-6.0/machine-learning.tex:13203-13218
/userguide/trunk/learning-api.tex:10614-10900
/userguide/trunk/machine-learning.tex:13717,13720-13723
Modified: userguide/branches/release-6.1/mimir.tex
===================================================================
--- userguide/branches/release-6.1/mimir.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/mimir.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -16,6 +16,7 @@
arbitrarily mix full-text, structural, linguistic and semantic queries and that
can scale to terabytes of text.
+\ifprintedbook
This chapter provides an overview of \Mimir\ and its functionalities. For a
complete user guide, including code samples, please see
\htlinkplain{http://gate.ac.uk/family/mimir.html}.
@@ -509,3 +510,15 @@
\end{center}
\end{figure}
+\else % ifprintedbook
+
+Full details on how to build and use \Mimir\ can be found in its own
+\htlink{https://gate.svn.sourceforge.net/svnroot/gate/mimir/trunk/doc/mimir-guide.pdf}{user guide}.
+GATE \Mimir\ is open-source software, released under the GNU Affero General
+Public Licence version 3. Commercial licences are available from the
+University of Sheffield. The source code is available from the subversion
+repository at
+
+{\tt https://gate.svn.sourceforge.net/svnroot/gate/mimir/trunk}
+
+\fi %end of ifprintedbook
Modified: userguide/branches/release-6.1/misc-creole.tex
===================================================================
--- userguide/branches/release-6.1/misc-creole.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/misc-creole.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -2645,9 +2645,14 @@
the \textbf{UMLS Metathesaurus} and allows Metathesaurus concepts to be
discovered in a text corpus.
-The Tagger\_MetaMap plugin for GATE wraps the MetaMap Java API client to allow GATE to communicate with a remote (or local) MetaMap PrologBeans \textbf{mmserver10} and MetaMap distribution. This allows the content of specified annotations (or the entire document content) to be processed by MetaMap and the results converted to GATE annotations and features.
+The Tagger\_MetaMap plugin for GATE wraps the MetaMap Java API client to allow
+GATE to communicate with a remote (or local) MetaMap PrologBeans
+\textbf{mmserver10} and MetaMap distribution. This allows the content of
+specified annotations (or the entire document content) to be processed by
+MetaMap and the results converted to GATE annotations and features.
-To use this plugin, you will need access to a remote MetaMap server, or install one locally by downloading and installing the complete distribution:
+To use this plugin, you will need access to a remote MetaMap server, or install
+one locally by downloading and installing the complete distribution:
\url{http://metamap.nlm.nih.gov/}
@@ -2655,75 +2660,128 @@
\url{http://metamap.nlm.nih.gov/README_javaapi.html}
-The default \texttt{mmserver10} location and port locations are \texttt{localhost} and \texttt{8066}. To use a different server location and/or port, see the above API documentation and specify the \texttt{--metamap\_server\_host} and \texttt{--metamap\_server\_port} options within the \textbf{metaMapOptions} run-time parameter.
+The default \texttt{mmserver10} server location and port are
+\texttt{localhost} and \texttt{8066}. To use a different server location and/or
+port, see the above API documentation and specify the
+\texttt{--metamap\_server\_host} and \texttt{--metamap\_server\_port} options
+within the \textbf{metaMapOptions} run-time parameter.
\subsect{Run-time parameters}
\begin{enumerate}
-\item{\textbf{annotateNegEx}}: set this to true to add NegEx features to annotations
-(\texttt{NegExType} and \texttt{NegExTrigger}). See
+\item{\textbf{annotateNegEx}}: set this to true to add NegEx features to
+annotations (\texttt{NegExType} and \texttt{NegExTrigger}). See
\url{http://code.google.com/p/negex/} for more information on NegEx
-\item{\textbf{annotatePhrases}}: set to true to output MetaMap phrase-level annotations
-(generally noun-phrase chunks). Only phrases containing a MetaMap mapping will
-be annotated. Can be useful for post-coordination of phrase-level terms that do
-not exist in a pre-coordinated form in UMLS.
+\item{\textbf{annotatePhrases}}: set to true to output MetaMap phrase-level
+annotations (generally noun-phrase chunks). Only phrases containing a MetaMap
+mapping will be annotated. Can be useful for post-coordination of phrase-level
+terms that do not exist in a pre-coordinated form in UMLS.
-\item{\textbf{inputASName}}: input Annotation Set name. Use in conjunction with
-\textbf{inputASTypes}: (see below). Unless specified, the entire document content
-will be sent to MetaMap.
+\item{\textbf{inputASName}}: input Annotation Set name. Use in conjunction with
+\textbf{inputASTypes} (see below). Unless specified, the entire document
+content will be sent to MetaMap.
-\item{\textbf{inputASTypes}}: only send the content of these annotations within \textbf{inputASName} to
-MetaMap and add new MetaMap annotations inside each. Unless specified, the entire document
-content will be sent to MetaMap.
+\item{\textbf{inputASTypes}}: only send the content of these annotations within
+\textbf{inputASName} to MetaMap and add new MetaMap annotations inside each.
+Unless specified, the entire document content will be sent to MetaMap.
-\item{\textbf{inputASTypeFeature}}: send the content of this feature within \textbf{inputASTypes} to MetaMap and
-wrap a new MetaMap annotation around each annotation in inputASTypes.
-If the feature is empty or does not exist, then the annotation content is sent instead.
+\item{\textbf{inputASTypeFeature}}: send the content of this feature within
+\textbf{inputASTypes} to MetaMap and wrap a new MetaMap annotation around each
+annotation in inputASTypes. If the feature is empty or does not exist, then
+the annotation content is sent instead.
-\item{\textbf{linebreakCount}}: specify the number of linebreak/whitespace characters between paragraphs if processing a document with multiple paragraphs separated by blank lines. The MetaMap API \texttt{processCitationsFromString()} method used by this plugin chunks text separated by blank lines (i.e. \verb+\n[\\s]*\n+), and the resulting output resets the start offset of each chunk to 0, so the offset between each paragraph is effectively lost if this allowance is not made.
+\item{\textbf{linebreakCount}}: specify the number of linebreak/whitespace
+characters between paragraphs if processing a document with multiple paragraphs
+separated by blank lines. The MetaMap API \texttt{processCitationsFromString()}
+method used by this plugin chunks text separated by blank lines (i.e.
+\verb+\n[\\s]*\n+), and the resulting output resets the start offset of each
+chunk to 0, so the offset between each paragraph is effectively lost if this
+allowance is not made.
-
-\item{\textbf{metaMapOptions}}: set parameter-less MetaMap options here. Default is
-\texttt{-Xdt} (truncate Candidates mappings, disallow derivational variants and do not use full text parsing).
-See \url{http://metamap.nlm.nih.gov/README_javaapi.html} for more details. NB:
-only set the \texttt{-y} parameter (word-sense disambiguation) if
+\item{\textbf{metaMapOptions}}: set parameter-less MetaMap options here.
+Default is \texttt{-Xdt} (truncate Candidates mappings, disallow derivational
+variants and do not use full text parsing). See
+\url{http://metamap.nlm.nih.gov/README_javaapi.html} for more details. NB: only
+set the \texttt{-y} parameter (word-sense disambiguation) if
\texttt{wsdserverctl} is running.
\item{\textbf{outputASName}}: output Annotation Set name.
-\item{\textbf{outputASType}}: output annotation name to be used for all MetaMap annotations
+\item{\textbf{outputASType}}: output annotation name to be used for all MetaMap
+annotations
-\item{\textbf{outputMode}}: determines which mappings are output as annotations in the
-GATE document, for each phrase:
+\item{\textbf{outputMode}}: determines which mappings are output as annotations
+in the GATE document, for each phrase:
\begin{itemize}
-\item \textbf{AllCandidatesAndMappings}: annotate both Candidate and final mappings.
-This will usually result in multiple, overlapping annotations for each term/phrase
-\item \textbf{AllMappings}: annotate all the final MetaMap Mappings for each phrase. This will
-result in fewer annotations with higher precision (e.g. for 'lung cancer' only
-the complete phrase will be annotated as Neoplastic Process [\texttt{neop}])
-\item \textbf{HighestMappingOnly}: annotate only the highest scoring MetaMap Mapping for each phrase.
-If two Mappings have the same score, the first returned by MetaMap is output.
+\item \textbf{AllCandidatesAndMappings}: annotate both Candidate and final
+mappings. This will usually result in multiple, overlapping annotations for
+each term/phrase
+\item \textbf{AllMappings}: annotate all the final MetaMap Mappings for each
+phrase. This will result in fewer annotations with higher precision (e.g. for
+``lung cancer'' only the complete phrase will be annotated as Neoplastic
+Process [\texttt{neop}])
+\item \textbf{HighestMappingOnly}: annotate only the highest scoring MetaMap
+Mapping for each phrase. If two Mappings have the same score, the first
+returned by MetaMap is output.
\item \textbf{AllCandidates}: annotate all Candidate mappings and not the final
Mappings. This will result in more annotations with less precision (e.g. for
-'lung cancer' both 'lung' (\texttt{bpoc}) and 'lung cancer' (\texttt{neop}) will
-be annotated).
+``lung cancer'' both ``lung'' (\texttt{bpoc}) and ``lung cancer''
+(\texttt{neop}) will be annotated).
\end{itemize}
-\item{\textbf{taggerMode}}: determines whether all term instances are processed by MetaMap, the first instance only, or the first instance with coreference annotations added. Only used if the inputASTypes parameter has been set.
+\item{\textbf{taggerMode}}: determines whether all term instances are processed
+by MetaMap, the first instance only, or the first instance with coreference
+annotations added. Only used if the inputASTypes parameter has been set.
\begin{itemize}
-\item \textbf{FirstOccurrenceOnly}: only process and annotate the first instance of each term in the document
-\item \textbf{CoReference}: process and annotate the first instance and coreference following instances
-\item \textbf{AllOccurrences}: process and annotate all term instances independently
+\item \textbf{FirstOccurrenceOnly}: only process and annotate the first
+instance of each term in the document
+\item \textbf{CoReference}: process and annotate the first instance and
+coreference following instances
+\item \textbf{AllOccurrences}: process and annotate all term instances
+independently
\end{itemize}
\end{enumerate}
+\subsect[sec:misc-creole:metamap:upgrade]{Upgrading from an earlier version of
+the plugin}
+The Tagger\_MetaMap plugin was completely rewritten for GATE 6.1, and the
+current version is not parameter-compatible with the version provided in GATE
+6.0 and earlier. The previous version of the plugin is still available under
+the Obsolete plugins directory, but we recommend you upgrade your application
+to use the new plugin version.
+
+Specifically, the following parameters have been removed: \verb+mmServerHost+,
+\verb+mmServerPort+, \verb+mmServerTimeout+, \verb+excludeSemanticTypes+,
+\verb+restrictSemanticTypes+, \verb+scoreThreshold+.
+
+These can now be specified using the \texttt{--metamap\_server\_host},
+\texttt{--metamap\_server\_port}, \texttt{--metamap\_server\_timeout},
+\texttt{-k}, \texttt{-J} and \texttt{-r} options in the
+\verb+metaMapOptions+ run-time parameter string.
+
+\begin{itemize}
+\item \verb+outputASType+ has been made a run-time parameter.
+\item \verb+useNegEx+ has been renamed to \verb+annotateNegEx+
+\end{itemize}
+
+The following changes have been made to \verb+outputMode+:
+\begin{itemize}
+\item \verb+MappingsOnly+ has been renamed to \verb+AllMappings+
+\item \verb+CandidatesOnly+ has been renamed to \verb+AllCandidates+
+\item \verb+CandidatesAndMappings+ has been renamed to
+\verb+AllCandidatesAndMappings+
+\end{itemize}
+
+See the previous section for details of the additional parameters that have
+been added in the new version of the plugin.
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:misc-creole:boilerpipe]{Content Detection Using Boilerpipe}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Property changes on: userguide/branches/release-6.1/ontology_ocat_add-new.png
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/ontology_ocat_add-new.png:13203-13218
/userguide/trunk/add-new.png:10614-10900
+ /userguide/branches/release-6.0/ontology_ocat_add-new.png:13203-13218
/userguide/trunk/add-new.png:10614-10900
/userguide/trunk/ontology_ocat_add-new.png:13717,13720-13723
Property changes on: userguide/branches/release-6.1/ontology_ocat_edit.png
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/ontology_ocat_edit.png:13203-13218
/userguide/trunk/edit.png:10614-10900
+ /userguide/branches/release-6.0/ontology_ocat_edit.png:13203-13218
/userguide/trunk/edit.png:10614-10900
/userguide/trunk/ontology_ocat_edit.png:13717,13720-13723
Property changes on: userguide/branches/release-6.1/ontology_ocat_options.png
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/ontology_ocat_options.png:13203-13218
/userguide/trunk/options.png:10614-10900
+ /userguide/branches/release-6.0/ontology_ocat_options.png:13203-13218
/userguide/trunk/ontology_ocat_options.png:13717,13720-13723
/userguide/trunk/options.png:10614-10900
Property changes on: userguide/branches/release-6.1/ontology_ocat_view.png
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-6.0/ontology_ocat_view.png:13203-13218
/userguide/trunk/view.png:10614-10900
+ /userguide/branches/release-6.0/ontology_ocat_view.png:13203-13218
/userguide/trunk/ontology_ocat_view.png:13717,13720-13723
/userguide/trunk/view.png:10614-10900
Modified: userguide/branches/release-6.1/recent-changes.tex
===================================================================
--- userguide/branches/release-6.1/recent-changes.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/recent-changes.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -19,7 +19,57 @@
\def\rcSubsubsect#1{\subsubsect{#1}}
\fi
-\rcSectNoLabel{April 2011}
+\rcSect[6.1]{Version 6.1 (April 2011)}
+
+\rcSubsect{New CREOLE Plugins}
+
+\textbf{Tagger\_Numbers} to annotate many kinds of numbers in documents and
+determine their numeric values. The tagger can annotate numbers expressed in
+many forms including Arabic and Roman numerals, words (in English, French,
+German and Spanish) and scientific notation (4.3e6 = 4300000). See
+section~\ref{sec:misc-creole:numbers} for full details.
+
+\textbf{Tagger\_Measurements} to annotate many different forms of measurement
+expressions (``5.5 metres'', ``1 minute 30 seconds'', ``10 to 15 pounds'',
+etc.) along with their normalized values in SI units. See
+section~\ref{sec:misc-creole:measurements} for full details.
+
+\textbf{Tagger\_Boilerpipe}, which contains a
+boilerpipe\footnote{\htlinkplain{http://code.google.com/p/boilerpipe/}} based
+PR for performing content detection. See
+section~\ref{sec:misc-creole:boilerpipe} for full details.
+
+\textbf{Tagger\_DateNormalizer} to annotate and normalize dates within a
+document. See section~\ref{sec:misc-creole:datenormalizer} for full details.
+
+\textbf{Schema\_Tools} providing a ``Schema Enforcer'' PR that can be used to
+create a clean output annotation set based on a set of annotation schemas. See
+section~\ref{sec:misc-creole:schemaenforcer} for full details.
+
+\textbf{Teamware\_Tools} providing a new PR called QA Summariser for Teamware.
+When documents are annotated using GATE Teamware, this PR can be used for
+generating a summary of agreements among annotators. See
+section~\ref{sec:eval:qaForTW} for full details.
+
+\textbf{Tagger\_MetaMap} has been rewritten to make use of the new MetaMap Java
+API features. There are numerous performance enhancements and bug fixes
+detailed in section~\ref{sec:misc-creole:metamap}. Note that this version of
+the plugin is \emph{not} compatible with the version provided in GATE 6.0,
+though this earlier version is still available in the Obsolete directory if
+required.
+
+\rcSubsect{Other new features and improvements}
+
+Added support for handling controller events to JAPE by making it possible
+to define \texttt{ControllerStarted}, \texttt{ControllerFinished}, and
+\texttt{ControllerAborted} code blocks in a JAPE file (see
+section~\ref{sec:jape:javarhsoverview}).
+
+JAPE Java right-hand-side code can now access an \texttt{ActionContext} object
+through the predefined field \texttt{ctx} which allows access to the corpus LR
+and the transducer PR and their features (see
+section~\ref{sec:jape:javarhsoverview}).
+
Three new optional attributes can be specified in {\tt<GATECONFIG>}
element of {\tt gate.xml} or local configuration file:
@@ -29,85 +79,28 @@
\item \textbf{namespacePrefix} - The feature name to use that will hold the namespace prefix of the element, e.g. "prefix"
\end{itemize}
-Setting these attributes will alter GATE's default namespace deserialization
-behaviour to remove the namespace prefix and add it as a feature, along with the namespace URI.
-This allows namespace-prefixed elements in the \textbf{Original markups} annotation set to be matched
-with JAPE expressions, and also allows namespace scope to be added to new annotations
-when serialized to XML. See \ref{sec:corpora:input} for details.
+Setting these attributes will alter GATE's default namespace deserialization
+behaviour to remove the namespace prefix and add it as a feature, along with
+the namespace URI. This allows namespace-prefixed elements in the
+\textbf{Original markups} annotation set to be matched with JAPE expressions,
+and also allows namespace scope to be added to new annotations when serialized
+to XML. See \ref{sec:corpora:input} for details.
Searchable Serial Datastores (Lucene-based) are now portable and can be moved
across different systems. Also, several GUI improvements have been made to ease
-the creation of Lucene datastores. See \ref{chap:annic} for details.
+the creation of Lucene datastores. See chapter~\ref{chap:annic} for details.
-\rcSectNoLabel{March 2011}
-
-A new creole repository, Teamware\_Tools, contains a new PR called QA Summariser
-for Teamware. When documents are annotated using Teamware, this PR can be used
-for generating a summary of agreements among annotators. It does this by pairing
-individual annotators. It also compares each individual annotator's annotations
-with those available in the consensus annotation set in the respective documents.
-See Section \ref{sec:eval:qaForTW} for full details.
-
-A new creole repository, Tagger\_Measurements, contains a new PR for annotating and
-normalizing measurements within a document. See Section \ref{sec:misc-creole:measurements}
-for full details.
-
-A new creole repository, Tagger\_DateNormalizer, contains a new PR for annotating
-and normalizing dates within a document. See Section \ref{sec:misc-creole:datenormalizer}
-for full details.
-
The populate method that allowed populating corpus from a trecweb file has been
made more generic to accept a tag. The method extracts content between the start
and end of this tag to create new documents. In GATE Developer, right-clicking
on an instance of the Corpus and choosing the option ``Populate from Single
Concatenated File" allows users to populate the corpus using this functionality.
See Section \ref{sec:api:corpora} for more details.
-\rcSectNoLabel{February 2011}
-GATE now requires Java 6 or above.
-
Fixed a regression in the JAPE parser that prevented the use of RHS macros that
refer to a LHS label (named blocks \verb|:label { ... }| and assignments \verb|:label.Type = {}|
-A new creole repository, Tagger\_Numbers, containing a number of PRs for annotating
-numbers with their numeric value. See Section \ref{sec:misc-creole:numbers} for full details.
-
-A new creole repository, Tagger\_Boilerpipe, which contains a boilerpipe\footnote{\htlinkplain{http://code.google.com/p/boilerpipe/}}
-based PR for performing content detection. See Section \ref{sec:misc-creole:boilerpipe} for full details.
-
-The Tagger\_MetaMap plugin has been rewritten to make use of the new MetaMap Java API features.
-There are numerous performance enhancements and a bug fix where changes to the \verb+metaMapOptions+
-run-time parameter were previously not enacted. The previous version of the plugin has been moved to plugins/Obsolete.
-
-Please note that the updated Tagger\_MetaMap plugin
-is not parameter-compatible with the previous version, so your application pipelines will need to be updated.
-Specifically, the following parameters have been removed: \verb+mmServerHost +, \verb+mmServerPort +, \verb+mmServerTimeout +, \verb+excludeSemanticTypes+, \verb+restrictSemanticTypes+, \verb+scoreThreshold+.
-
-These can now be specified using the \texttt{--metamap\_server\_host},
-\texttt{--metamap\_server\_port}, \texttt{--metamap\_server\_timeout},
- \texttt{-k}, \texttt{-J} and \texttt{-r} options in the
-\verb+metaMapOptions+ run-time parameter string.
-
-\begin{itemize}
-\item \verb+outputASType+ has been made a run-time parameter.
-\item \verb+useNegEx+ has been renamed to \verb+annotateNegEx+
-\end{itemize}
-
-The following changes have been made to \verb+outputMode+
-\begin{itemize}
-\item \verb+MappingsOnly+ has been renamed to \verb+AllMappings+
-\item \verb+CandidatesOnly+ has been renamed to \verb+AllCandidates+
-\item \verb+CandidatesAndMappings+ has been renamed to \verb+AllCandidatesAndMappings+
-\end{itemize}
-
-In addition, new parameters have been added. See Section \ref{sec:misc-creole:metamap} for full details.
-
-\rcSectNoLabel{January 2011}
-
-Added a new Schema Enforcer PR that can be used to create a `clean' output annotation set based
-on a set of annotation schemas. See Section \ref{sec:misc-creole:schemaenforcer} for full details.
-
Enhanced the Groovy scriptable controller with some features inspired by the
realtime controller, in particular the ability to ignore exceptions thrown by
PRs and the ability to limit the running time of certain PRs. See
@@ -116,153 +109,10 @@
A few bug fixes and improvements to the ``recover'' logic of the
\texttt{packagegapp} Ant task (see section~\ref{sec:ant:packagegapp}).
-\rcSectNoLabel{December 2010}
+\ldots and many other smaller bugfixes.
-Added support for handling controller events to JAPE by making it possible
-to define \texttt{ControllerStarted}, \texttt{ControllerFinished}, and
-\texttt{ControllerAborted} code blocks in a JAPE file (see
-section~\ref{sec:jape:javarhsoverview}).
+\textbf{Note: As of version 6.1, GATE Developer and Embedded require Java 6 or
+later and will no longer run on Java 5}. If you require Java 5 compatibility
+you should use GATE 6.0.
-JAPE Java right-hand-side code can now access an \texttt{ActionContext} object
-through the predefined field \texttt{ctx} which allows access to the corpus LR
-and the transducer PR and their features (see section~\ref{sec:jape:javarhsoverview}).
-
-\rcSect[6.0]{Version 6.0 (November 2010)}\ifnested\label{subsec:changes:6.0b1}\else\label{sec:changes:6.0b1}\fi
-
-\rcSubsect{Major new features}
-
-Added an annotation tool for the document editor: the Relation Annotation
-Tool (RAT). It is designed to annotate a document with ontology instances
-and to create relations between annotations with ontology object
-properties. It is close and compatible with the Ontology Annotation Tool
-(OAT) but focus on relations between annotations. See
-section~\ref{sec:ontologies:rat} for details.
-
-Added a new \emph{scriptable controller} to the Groovy plugin, whose execution
-strategy is controlled by a simple Groovy DSL. This supports more powerful
-conditional execution than is possible with the standard conditional
-controllers (for example, based on the presence or absence of a particular
-annotation, or a combination of several document feature values), rich flow
-control using Groovy loops, etc. See
-section~\ref{sec:api:groovy:controller} for details.
-
-A new version of Alignment Editor has been added to the GATE distribution. It
-consists of several new features such as the new alignment viewer, ability to
-create alignment tasks and store in xml files, three different views to align
-the text (links view and matrix view - suitable for character, word and phrase
-alignments, parallel view - suitable for sentence or long text alignment), an
-alignment exporter and many more. See chapter \ref{chap:alignment} for more
-information.
-
-MetaMap, from the National Library of Medicine (NLM), maps biomedical text to
-the \textbf{UMLS Metathesaurus} and allows Metathesaurus concepts to be
-discovered in a text corpus. The Tagger\_MetaMap plugin for GATE wraps the
-MetaMap Java API client to allow GATE to communicate with a remote (or local)
-MetaMap PrologBeans \textbf{mmserver} and MetaMap distribution. This allows the
-content of specified annotations (or the entire document content) to be
-processed by MetaMap and the results converted to GATE annotations and features.
-See section~\ref{sec:misc-creole:metamap} for details.
-
-A new plugin called Web\_Translate\_Google has been added with a PR called
-Google Translator PR in it. It allows users to translate text using the
-Google translation services. See section \ref{sec:misc-creole:google-translate}
-for more information.
-
-New Gazetteer Editor for ANNIE Gazetteer that can be used instead of
-Gaze. It uses tables instead of text area to display the gazetteer
-definition and lists, allows sorting on any column, filtering of the lists,
-reloading a list, etc. See section~\ref{sec:gazetteers:anniegazeditor}.
-
-\rcSubsect{Breaking changes}
-
-This release contains a few small changes that are not backwards-compatible:
-\begin{itemize}
-\item Changed the semantics of the ontology-aware matching mode in JAPE to take
-account of the default namespace in an ontology. Now {\tt class} feature
-values that are not complete URIs will be treated as naming classes within the
-default namespace of the target ontology only, and not (as previously) any
-class whose URI ends with the specified name. This is more consistent with the
-way OWL normally works, as well as being much more efficient to execute. See
-section~\ref{sec:ontologies:ontology-aware-jape} for more details.
-
-\item Updated the WordNet plugin to support more recent releases of WordNet
-than 1.6. The format of the configuration file has changed, if you are using
-the previous WordNet 1.6 support you will need to update your configuration.
-See section~\ref{sec:misc-creole:wn} for details.
-
-\item The deprecated Tagger\_TreeTagger plugin has been removed, applications
-that used it will need to be updated to use the Tagger\_Framework plugin
-instead. See section~\ref{sec:parsers:taggerframework} for details of how to
-do this.
-\end{itemize}
-
-\rcSubsect{Other new features and bugfixes}
-
-The concept of {\it templates} has been introduced to JAPE. This is a way to
-declare named ``variables'' in a JAPE grammar that can contain placeholders
-that are filled in when the template is referenced. See
-section~\ref{sec:jape:templates} for full details.
-
-Added a JAPE operator to get the string covered by a left-hand-side label and
-assign it to a feature of a new annotation on the right hand side (see
-section~\ref{sec:jape:metaproperties}).
-
-Added a new API to the CREOLE registry to permit plugins that live
-entirely on the classpath. {\tt CreoleRegister.registerComponent} instructs
-the registry to scan a single java Class for annotations, adding it to the
-set of registered plugins. See section~\ref{sec:api:plugins} for details.
-
-Maven artifacts for GATE are now published to the central Maven
-repository. See section~\ref{sec:gettingstarted:maven} for details.
-
-Bugfix: {\tt DocumentImpl} no longer changes its {\tt stringContent} parameter
-value whenever the document's content changes. Among other things, this means
-that saved application states will no longer contain the full text of the
-documents in their corpus, and documents containing XML or HTML tags that were
-originally created from string content (rather than a URL) can now safely be
-stored in saved application states and the GATE Developer saved session.
-
-A processing resource called Quality Assurance PR has been added in the Tools
-plugin. The PR wraps the functionality of the Quality Assurance Tool
-(section \ref{sec:eval:corpusqualityassurance}).
-
-A new section for using the Corpus Quality Assurance from GATE Embedded has
-been written. See section~\ref{sec:eval:corpusqualityassurance}.
-
-The Generic Tagger PR (in the Tagger\_Framework plugin) now allows more
-flexible specification of the input to the tagger, and is no longer limited to
-passing just the ``string'' feature from the input annotations. See
-section~\ref{sec:parsers:taggerframework} for details.
-
-Added new parameters and options to the LingPipe Language Identifier PR.
-(section~\ref{sec:misc-creole:lingpipe:langid}), and corrected the
-documentation for the LingPipe POS Tagger
-(section~\ref{sec:misc-creole:lingpipe:postagger}).
-
-In the document editor, fixed several exceptions to make editing text with
-annotations highlighted working. So you should now be able to edit the text
-and the annotations should behave correctly that is to say move, expand or
-disappear according to the text insertions and deletions.
-
-Options for document editor: read-only and insert append/prepend have been
-moved from the options dialogue to the document editor toolbar at the top
-right on the triangle icon that display a menu with the options. See
-section~\ref{sec:developer:documents}.
-
-Added new parameters and options to the Crawl PR and document features to its
-output; see section~\ref{sec:misc-creole:crawler} for details.
-
-Fixed a bug where ontology-aware JAPE rules worked correctly when the target
-annotation's class was a subclass of the class specified in the rule, but
-failed when the two class names matched exactly.
-
-Improved support for conditional pipelines containing non-LanguageAnalyser
-processing resources.
-
-Added the current {\tt Corpus} to the script binding for the Groovy Script PR,
-allowing a Groovy script to access and set corpus-level features. Also added
-callbacks that a Groovy script can implement to do additional pre- or
-post-processing before the first and after the last document in a corpus. See
-section~\ref{sec:api:groovy} for details.
-
% vim:ft=tex:
Property changes on: userguide/branches/release-6.1/recent-changes.tex
___________________________________________________________________
Modified: svn:mergeinfo
- /userguide/branches/release-5.1/recent-changes.tex:12029-12070
/userguide/branches/release-6.0/recent-changes.tex:13203-13218
/userguide/trunk/recent-changes.tex:10614-10900
+ /userguide/branches/release-5.1/recent-changes.tex:12029-12070
/userguide/branches/release-6.0/recent-changes.tex:13203-13218
/userguide/trunk/recent-changes.tex:10614-10900,13717,13720-13723
Modified: userguide/branches/release-6.1/tao_main.tex
===================================================================
--- userguide/branches/release-6.1/tao_main.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/tao_main.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -657,12 +657,11 @@
\input{uima} %final for book
\input{misc-creole} %final for book
-\ifprintedbook
+
\smartpart[part:family]{The GATE Family: Cloud, MIMIR, Teamware}
\input{cloud} %final for book
\input{teamware} %final for book
\input{mimir} %final for book
-\fi
Modified: userguide/branches/release-6.1/teamware.tex
===================================================================
--- userguide/branches/release-6.1/teamware.tex 2011-04-20 15:49:44 UTC (rev 13723)
+++ userguide/branches/release-6.1/teamware.tex 2011-04-20 15:54:41 UTC (rev 13724)
@@ -18,6 +18,13 @@
For technical and user interface details not covered in this chapter, please
refer to the
\htlink{http://gate.ac.uk/teamware/}{Teamware User Guide}.
+GATE Teamware is open-source software, released under the GNU Affero General
+Public Licence version 3. Commercial licences are available from the
+University of Sheffield. The source code is available from the subversion
+repository at
+
+{\tt https://gate.svn.sourceforge.net/svnroot/gate/teamware/trunk}
+
\section{Introduction}
For the past ten years, NLP development frameworks such as OpenNLP, GATE, and
UIMA have been providing tool support and facilitating NLP researchers with the
task of implementing new algorithms, sharing, and reusing them. At the same
time, Information Extraction (IE) research and computational linguistics in
general has been driven forward by the growing volume of annotated corpora,
produced by research projects and through evaluation initiatives such as MUC
\cite{Marsh98}, ACE\footnote{http://www.ldc.upenn.edu/Projects/ACE/}, DUC
\cite{DUC2001}, and CoNLL shared tasks. Some of the NLP frameworks (e.g., AGTK
\cite{Maeda04}, GATE \cite{Cun02b}) even provide text annotation user
interfaces. However, much more is needed in order to produce high quality
annotated corpora: a stringent methodology, annotation guidelines,
inter-annotator agreement measures, and in some cases, annotation adjudication
(or data curation) to reconcile differences between annotators.