Revision: 18733
          http://sourceforge.net/p/gate/code/18733
Author:   ian_roberts
Date:     2015-06-01 15:29:52 +0000 (Mon, 01 Jun 2015)
Log Message:
-----------
Added cross-reference to the additional format plugins from the same place as 
the list of standard formats

Modified Paths:
--------------
    userguide/trunk/corpora.tex
    userguide/trunk/misc-creole.tex

Modified: userguide/trunk/corpora.tex
===================================================================
--- userguide/trunk/corpora.tex 2015-06-01 01:20:16 UTC (rev 18732)
+++ userguide/trunk/corpora.tex 2015-06-01 15:29:52 UTC (rev 18733)
@@ -465,7 +465,7 @@
 \sect[sec:corpora:formats]{Document Formats}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-The following document formats are supported by GATE:
+The following document formats are supported by GATE by default:
 \begin{itemize}
 \item
 Plain Text
@@ -486,11 +486,32 @@
 \item
 OpenOffice (some formats)
 \item
-UIMA CAS
+UIMA CAS XML format
 \item
 CoNLL/IOB
 \end{itemize}
 
+Additional formats are provided by plugins -- you must load the relevant
+plugin before attempting to parse these document types:
+\begin{itemize}
+\item Twitter JSON (in the {\tt Twitter} plugin, see
+  section~\ref{sec:social:twitter:format})
+\item DataSift JSON, a common format for social media data from
+  \htlinkplain{http://datasift.com} (in the {\tt Format\_DataSift} plugin, see
+  section~\ref{sec:creole:datasift})
+\item FastInfoset, a compressed binary encoding of GATE XML (in the
+  {\tt Format\_FastInfoset} plugin, see section~\ref{sec:creole:fastinfoset})
+\item MediaWiki markup, as used by Wikipedia and many other public wiki sites
+  (in the {\tt Format\_MediaWiki} plugin, see
+  section~\ref{sec:creole:mediawiki})
+\item The formats used by PubMed and the Cochrane collaboration for biomedical
+  literature (in the {\tt Format\_PubMed} plugin, see
+  section~\ref{sec:creole:pubmed})
+\item CSV files containing one column of text data and optionally additional
+  columns of metadata (in the {\tt Format\_CSV} plugin, see
+  section~\ref{sec:creole:csv})
+\end{itemize}
+
 By default GATE will try and identify the type of the document, then strip
 and convert any markup into GATE's annotation format. To disable this
 process, set the {\tt markupAware} parameter on the document to {\tt false}.

Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex     2015-06-01 01:20:16 UTC (rev 18732)
+++ userguide/trunk/misc-creole.tex     2015-06-01 15:29:52 UTC (rev 18733)
@@ -3271,8 +3271,8 @@
 within GATE Developer by choosing the "Save as Fast Infoset XML" option from
 the right-click menu of the relevant corpus or document.
 
-A GCP\footnote{\url{https://gate.svn.sourceforge.net/svnroot/gate/gcp/trunk
-}} output handler is also provided by the \verb!Format_FastInfoset! plugin.
+A GCP\footnote{\url{http://svn.code.sf.net/p/gate/code/gcp/trunk}} output
+handler is also provided by the \verb!Format_FastInfoset! plugin.
 %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \sect[sec:creole:datasift]{DataSift Document Format}

