Revision: 17348
http://sourceforge.net/p/gate/code/17348
Author: adamfunk
Date: 2014-02-19 17:05:15 +0000 (Wed, 19 Feb 2014)
Log Message:
-----------
Partial documentation of the PMI bank.
Modified Paths:
--------------
userguide/trunk/misc-creole.tex
userguide/trunk/recent-changes.tex
Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex 2014-02-19 17:03:32 UTC (rev 17347)
+++ userguide/trunk/misc-creole.tex 2014-02-19 17:05:15 UTC (rev 17348)
@@ -3254,15 +3254,16 @@
TermRaider is a set of term extraction and scoring tools developed in the NeOn
and ARCOMEM projects. Although the plugin is still experimental, we are now
including it in GATE as a response to frequent requests from GATE users who have
-read publications related to those projects. Please note that although the
-TermRaider GUI and API are themselves fairly stable, they are subject to change
-and the output formats are unstable.
+read publications related to those projects.
The easiest way to test TermRaider is to populate a corpus with related
documents, load the sample
application (\texttt{plugins/TermRaider/applications/termraider-eng.gapp}),
and run it. This application will process the documents and create instances of
three termbank language resources with sensible parameters.
+
+All the language resources in TermRaider are properly serializable and so can be
+stored in GATE datastores.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsect{Termbank language resources}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -3376,12 +3377,34 @@
\emph{docFrequencyFeature} unless that is blank. Note that the default values
are not blank---you need to clear either or both parameters to prevent these
annotation features from being filled in.
-%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% OBVIOUSLY I'm not finished documenting this.
-% -- Adam
+\subsect{The PMI bank language resource}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+Like termbanks, the \emph{PMI Bank} is a GATE language resource derived from
+annotations on one or more GATE corpora. The PMI Bank, however, works on
+\emph{collocations}---pairs of ``inner'' annotations (e.g., \emph{Token} or
+named entity types) within a sliding window defined as a number of ``outer''
+annotations (usually 1 or 2 \emph{Sentence} annotations).
+The documents need to be processed to create the required inner and outer
+annotations, as shown in the \texttt{pmi-example.gapp} sample application
+provided in this plugin. The PMI Bank can then be created with the following
+init parameters.
+%%
+\begin{description}
+\item[allowOverlapCollocations] default \texttt{false}
+\item[corpora]
+\item[debugMode] default \texttt{false}
+\item[innerAnnotationTypes] default \verb![Entity]!
+% TODO change the default to something more sensible
+\item[inputASName]
+\item[inputAnnotationFeature] default \texttt{canonical}
+\item[languageFeature] default \texttt{lang}
+\item[outerAnnotationType] default \texttt{Sentence}
+\item[outerAnnotationWindow] default \texttt{2}
+\item[requireTypeDifference] default \texttt{false}
+\item[scoreProperty] default \texttt{pmiScore}
+\end{description}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:misc-creole:doc-normalizer]{Document Normalizer}
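The new PMI Bank section describes scoring collocations: pairs of inner annotations that co-occur within a sliding window of outer annotations. The Java sketch below illustrates one way such pointwise mutual information (PMI) scores could be computed. It is not the TermRaider implementation. The class name `PmiSketch`, the document model, the per-window counting and the base-2 logarithm are all assumptions made for illustration. In the document model, each document is a list of outer units such as sentences, and each unit holds the string values of its inner annotations, roughly what `inputAnnotationFeature` might supply. The `window` argument plays a role similar to `outerAnnotationWindow`. Parameters such as `allowOverlapCollocations` and `requireTypeDifference` are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of window-based PMI scoring for collocations.
 * NOT the TermRaider PMI Bank code; the counting scheme is an assumption.
 * A document is a list of "outer" units (e.g. sentences), each holding the
 * string values of the "inner" annotations found in that unit.
 */
final class PmiSketch {

    /**
     * Returns log2 PMI for every unordered pair of distinct items that
     * co-occur in at least one window of {@code window} consecutive outer units.
     */
    static Map<List<String>, Double> score(List<List<List<String>>> documents, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        Map<String, Integer> itemCounts = new HashMap<>();
        Map<List<String>, Integer> pairCounts = new HashMap<>();
        int totalWindows = 0;

        for (List<List<String>> doc : documents) {
            if (doc.isEmpty()) {
                continue; // an empty document contributes no windows
            }
            // Number of window start positions; a short document still yields one window.
            int starts = Math.max(1, doc.size() - window + 1);
            for (int start = 0; start < starts; start++) {
                // Distinct inner items in this window, sorted so pair keys are canonical.
                Set<String> items = new TreeSet<>();
                int end = Math.min(doc.size(), start + window);
                for (int i = start; i < end; i++) {
                    items.addAll(doc.get(i));
                }
                totalWindows++;
                for (String item : items) {
                    itemCounts.merge(item, 1, Integer::sum);
                }
                List<String> sorted = new ArrayList<>(items);
                for (int a = 0; a < sorted.size(); a++) {
                    for (int b = a + 1; b < sorted.size(); b++) {
                        pairCounts.merge(List.of(sorted.get(a), sorted.get(b)), 1, Integer::sum);
                    }
                }
            }
        }

        // PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), where each probability is
        // estimated as a window count divided by the total number of windows.
        Map<List<String>, Double> scores = new HashMap<>();
        for (Map.Entry<List<String>, Integer> e : pairCounts.entrySet()) {
            double cx = itemCounts.get(e.getKey().get(0));
            double cy = itemCounts.get(e.getKey().get(1));
            double ratio = e.getValue() * (double) totalWindows / (cx * cy);
            scores.put(e.getKey(), Math.log(ratio) / Math.log(2));
        }
        return scores;
    }
}
```

Consider a single document whose three sentences contain {A, B}, {C} and {A, B}. With a window of one sentence, `PmiSketch.score(docs, 1)` returns only the pair (A, B). Its score is log2(1.5), about 0.585, because the pair occurs in 2 of 3 windows and A and B each occur in 2 of 3. Widening the window to two sentences makes every item appear in every window, so all three pairs score 0. This illustrates how the window size changes which pairs count as collocations.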
Modified: userguide/trunk/recent-changes.tex
===================================================================
--- userguide/trunk/recent-changes.tex 2014-02-19 17:03:32 UTC (rev 17347)
+++ userguide/trunk/recent-changes.tex 2014-02-19 17:05:15 UTC (rev 17348)
@@ -21,6 +21,11 @@
\rcSect[next-release]{Next Release}
+\rcSubsect{February 2014}
+
+The Twitter JSON document format and corpus population tool are now documented
+in the Twitter plugin (Section~\ref{sec:creole:tweet}).
+
\rcSubsect{January 2014}
A new plugin that allows for document normalization has been added. This
This was sent by the SourceForge.net collaborative development platform, the
world's largest Open Source development site.
_______________________________________________
GATE-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gate-cvs