Revision: 17340
          http://sourceforge.net/p/gate/code/17340
Author:   adamfunk
Date:     2014-02-19 13:39:30 +0000 (Wed, 19 Feb 2014)
Log Message:
-----------
Twitter population dialog, explained at last

Modified Paths:
--------------
    userguide/trunk/misc-creole.tex

Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex     2014-02-19 11:50:50 UTC (rev 17339)
+++ userguide/trunk/misc-creole.tex     2014-02-19 13:39:30 UTC (rev 17340)
@@ -3212,8 +3212,42 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsect[sec:creole:population]{Corpus population from JSON files}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\ldots
-%
+Loading this plugin adds a ``Populate from Twitter JSON files'' option to the
+GATE Corpus right-click menu.  Selecting this option brings up a dialog that
+allows you to select one or more files of tweets in the Twitter API's JSON
+format and set the following options to populate the corpus.
+%%
+\begin{description}
+\item[Encoding]  The default here is UTF-8 (regardless of your Java default) to
+  conform to Twitter JSON.
+\item[One document per tweet] If this box is ticked, each tweet will produce a
+  separate document.  If not (the default), each input file will produce one
+  GATE document.
+\item[Content keys] The values of these JSON keys are converted into strings and
+  concatenated into each tweet's document content.  Colon-delimited strings
+  specify nested keys, e.g., ``\texttt{user:name}'' will yield the value of the
+  \texttt{name} key in the map that is the value of the \texttt{user} key.
+  Missing key sequences are ignored.  Each span of text will be covered by an
+  annotation whose type is the key sequence.
+\item[Feature keys] The key sequences and values of these JSON keys (where
+  present) are turned into feature names and values on the tweet's main
+  \texttt{Tweet} annotation.
+\item[Save configuration] This button saves the current options in an XML file
+  for re-use later.
+\item[Load configuration] This button sets the options according to a saved XML
+  configuration.
+\end{description}
+%%
+Every tweet is covered by a \texttt{Tweet} annotation with features specified by
+the ``feature keys'' option.  Multiple tweets in the same GATE document are
+separated by a blank line (two newlines).
+
+Corpus population from Twitter JSON files is also accessible programmatically
+when this plugin is loaded, using the public static void method
+\texttt{gate.corpora.twitter.Population.populateCorpus(final Corpus corpus, URL
+  inputUrl, String encoding, List<String> contentKeys, List<String> featureKeys,
+  int tweetsPerDoc)}.
+%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \sect[sec:creole:termraider]{TermRaider term extraction tools}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
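
The colon-delimited key sequences described under ``Content keys'' can be
illustrated with a minimal Java sketch.  It is not taken from the plugin's
source: the class name NestedKeyLookup and the method resolve are hypothetical,
and it assumes the tweet has already been parsed into nested java.util.Map
objects.  It only shows how a sequence such as user:name might be walked and
how a missing key sequence can be ignored by returning null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the GATE Twitter plugin.
public class NestedKeyLookup {

    // Walks a colon-delimited key sequence (e.g. "user:name") through nested
    // maps. Returns null if any key is missing or an intermediate value is
    // not a map, so that callers can skip missing key sequences.
    public static Object resolve(Map<String, Object> json, String keySequence) {
        Object current = json;
        for (String key : keySequence.split(":")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed stand-in for one parsed tweet.
        Map<String, Object> user = new HashMap<>();
        user.put("name", "Ada Lovelace");
        Map<String, Object> tweet = new HashMap<>();
        tweet.put("text", "Hello, world");
        tweet.put("user", user);

        System.out.println(resolve(tweet, "text"));
        System.out.println(resolve(tweet, "user:name"));
        System.out.println(resolve(tweet, "user:location")); // missing key sequence
    }
}
```

A caller building document content in this style would convert each non-null
result to a string and skip the null ones.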
