Revision: 18733
http://sourceforge.net/p/gate/code/18733
Author: ian_roberts
Date: 2015-06-01 15:29:52 +0000 (Mon, 01 Jun 2015)
Log Message:
-----------
Added cross-reference to the additional format plugins from the same place as
the list of standard formats
Modified Paths:
--------------
userguide/trunk/corpora.tex
userguide/trunk/misc-creole.tex
Modified: userguide/trunk/corpora.tex
===================================================================
--- userguide/trunk/corpora.tex 2015-06-01 01:20:16 UTC (rev 18732)
+++ userguide/trunk/corpora.tex 2015-06-01 15:29:52 UTC (rev 18733)
@@ -465,7 +465,7 @@
\sect[sec:corpora:formats]{Document Formats}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-The following document formats are supported by GATE:
+The following document formats are supported by GATE by default:
\begin{itemize}
\item
Plain Text
@@ -486,11 +486,32 @@
\item
OpenOffice (some formats)
\item
-UIMA CAS
+UIMA CAS XML format
\item
CoNLL/IOB
\end{itemize}
+Additional formats are provided by plugins -- you must load the relevant
+plugin before attempting to parse these document types:
+\begin{itemize}
+\item Twitter JSON (in the {\tt Twitter} plugin, see
+ section~\ref{sec:social:twitter:format})
+\item DataSift JSON, a common format for social media data from
+ \htlinkplain{http://datasift.com} (in the {\tt Format\_DataSift} plugin, see
+ section~\ref{sec:creole:datasift})
+\item FastInfoset, a compressed binary encoding of GATE XML (in the
+ {\tt Format\_FastInfoset} plugin, see section~\ref{sec:creole:fastinfoset})
+\item MediaWiki markup, as used by Wikipedia and many other public wiki sites
+ (in the {\tt Format\_MediaWiki} plugin, see
+ section~\ref{sec:creole:mediawiki})
+\item The formats used by PubMed and the Cochrane collaboration for biomedical
+ literature (in the {\tt Format\_PubMed} plugin, see
+ section~\ref{sec:creole:pubmed})
+\item CSV files containing one column of text data and optionally additional
+ columns of metadata (in the {\tt Format\_CSV} plugin, see
+ section~\ref{sec:creole:csv})
+\end{itemize}
+
By default GATE will try and identify the type of the document, then strip
and convert any markup into GATE's annotation format. To disable this
process, set the {\tt markupAware} parameter on the document to {\tt false}.
Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex 2015-06-01 01:20:16 UTC (rev 18732)
+++ userguide/trunk/misc-creole.tex 2015-06-01 15:29:52 UTC (rev 18733)
@@ -3271,8 +3271,8 @@
within GATE Developer by choosing the "Save as Fast Infoset XML" option from
the right-click menu of the relevant corpus or document.
-A GCP\footnote{\url{https://gate.svn.sourceforge.net/svnroot/gate/gcp/trunk
-}} output handler is also provided by the \verb!Format_FastInfoset! plugin.
+A GCP\footnote{\url{http://svn.code.sf.net/p/gate/code/gcp/trunk}} output
+handler is also provided by the \verb!Format_FastInfoset! plugin.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:creole:datasift]{DataSift Document Format}
_______________________________________________
GATE-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/gate-cvs