Revision: 15983
http://gate.svn.sourceforge.net/gate/?rev=15983&view=rev
Author: ian_roberts
Date: 2012-07-25 16:44:35 +0000 (Wed, 25 Jul 2012)
Log Message:
-----------
Added a "serialized object" output handler to save GATE documents as Java
serialized files, the same format as SerialDataStore.
Modified Paths:
--------------
gcp/trunk/.classpath
gcp/trunk/doc/batch-def.tex
gcp/trunk/doc/gcp-guide.pdf
Added Paths:
-----------
gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java
Modified: gcp/trunk/.classpath
===================================================================
--- gcp/trunk/.classpath 2012-07-25 10:50:14 UTC (rev 15982)
+++ gcp/trunk/.classpath 2012-07-25 16:44:35 UTC (rev 15983)
@@ -7,6 +7,6 @@
<classpathentry kind="lib" path="lib/fastutil-5.0.3-heritrix-subset-1.0.jar"/>
<classpathentry kind="lib" path="lib/heritrix-1.14.4.jar"/>
<classpathentry kind="lib" path="lib/mimir-client-4.0.jar"/>
- <classpathentry kind="con" path="org.apache.ivyde.eclipse.cpcontainer.IVYDE_CONTAINER/?project=gcp&ivyXmlPath=build%2Fivy.xml&confs=*"/>
+ <classpathentry kind="con" path="org.apache.ivyde.eclipse.cpcontainer.IVYDE_CONTAINER/?project=gcp&ivyXmlPath=build%2Fivy.xml&confs=*&ivySettingsPath=%24%7Bworkspace_loc%3Agcp%2Fbuild%2Fivysettings.xml%7D&loadSettingsOnDemand=false&propertyFiles="/>
<classpathentry kind="output" path="classes"/>
</classpath>
Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2012-07-25 10:50:14 UTC (rev 15982)
+++ gcp/trunk/doc/batch-def.tex 2012-07-25 16:44:35 UTC (rev 15983)
@@ -245,7 +245,7 @@
\subsection{File-based Output Handlers}
-GCP provides a set of four standard file-based output handlers to save data to
+GCP provides a set of five standard file-based output handlers to save data to
files on the filesystem in various formats.
\bit
@@ -260,13 +260,19 @@
\item \verb!gate.cloud.io.xces.XCESOutputHandler! to save annotations in the
XCES standoff format. Annotation offsets in XCES refer to the plain text as
saved by a \verb!PlainTextOutputHandler!.
+\item \verb!gate.cloud.io.file.SerializedObjectOutputHandler! to save documents
+ using Java's built-in \emph{object serialization} protocol (with optional
+ compression). This handler ignores annotation filters and always writes
+ the complete document. This is the same mechanism used by GATE's
+ \verb!SerialDataStore!.
\eit
-The four handlers share the following \verb!<output>! attributes:
+The five handlers share the following \verb!<output>! attributes:
\bde
-\item[encoding] (optional) The character encoding used when writing files. If
- omitted, ``UTF-8'' is the default.
+\item[encoding] (optional, not applicable to
+ \verb!SerializedObjectOutputHandler!) The character encoding used when
+ writing files. If omitted, ``UTF-8'' is the default.
\item[compression] (optional) The compression algorithm to apply to the saved
files. Can be either ``none'' (no compression, the default) or ``gzip''
(GZIP compression).
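For serialized output, the ``gzip'' compression option amounts to layering the Java object stream over a GZIP stream. The following is a minimal, self-contained sketch of that idea, not GCP code: the class name `SerializedRoundTrip` is hypothetical, it serializes an `ArrayList` instead of a `gate.Document` so that it runs without GATE on the classpath, and it assumes (without confirmation from this commit) that GCP's `getFileOutputStream` wraps the file in a GZIP stream when compression is enabled.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of GCP: Java object serialization with
// optional GZIP compression layered underneath the object stream.
public class SerializedRoundTrip {

  // Write obj to file; when gzip is true the bytes pass through GZIP first.
  public static void write(Serializable obj, File file, boolean gzip) throws IOException {
    try (OutputStream fileOut = new FileOutputStream(file);
         OutputStream raw = gzip ? new GZIPOutputStream(fileOut) : fileOut;
         ObjectOutputStream out = new ObjectOutputStream(raw)) {
      out.writeObject(obj);
    } // resources close in reverse order; closing a stream twice is harmless
  }

  // Read back an object written by write(); gzip must match the write call.
  public static Object read(File file, boolean gzip) throws IOException, ClassNotFoundException {
    try (InputStream fileIn = new FileInputStream(file);
         InputStream raw = gzip ? new GZIPInputStream(fileIn) : fileIn;
         ObjectInputStream in = new ObjectInputStream(raw)) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("example", ".ser.gz");
    f.deleteOnExit();
    ArrayList<String> data = new ArrayList<>(Arrays.asList("a", "b"));
    write(data, f, true);
    System.out.println(read(f, true)); // prints [a, b]
  }
}
```

Opening every stream in the try-with-resources header means the underlying file is closed even if constructing the GZIP or object stream throws. Reading a real GCP `.ser` file back would additionally require GATE classes on the classpath so that `readObject` can resolve the document's types.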
Modified: gcp/trunk/doc/gcp-guide.pdf
===================================================================
(Binary files differ)
Added: gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java
===================================================================
--- gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java (rev 0)
+++ gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java 2012-07-25 16:44:35 UTC (rev 15983)
@@ -0,0 +1,70 @@
+/*
+ * SerializedObjectOutputHandler.java
+ * Copyright (c) 2007-2012, The University of Sheffield.
+ *
+ * This file is part of GCP (see http://gate.ac.uk/), and is free
+ * software, licenced under the GNU Affero General Public License,
+ * Version 3, November 2007.
+ *
+ *
+ * $Id$
+ */
+package gate.cloud.io.file;
+
+import static gate.cloud.io.IOConstants.PARAM_ENCODING;
+import static gate.cloud.io.IOConstants.PARAM_FILE_EXTENSION;
+import gate.Document;
+import gate.cloud.io.OutputHandler;
+import gate.util.Benchmark;
+import gate.util.GateException;
+
+import java.io.IOException;
+import java.io.ObjectOutputStream;
+import java.util.Map;
+
+import org.apache.log4j.Logger;
+
+/**
+ * An {@link OutputHandler} that writes GATE Documents to files using
+ * Java serialization. The files may optionally be gzip compressed. Note
+ * that this always writes the complete document; any annotation type
+ * filters specified in the batch definition are ignored.
+ */
+public class SerializedObjectOutputHandler extends AbstractFileOutputHandler {
+
+ private static final Logger logger =
+ Logger.getLogger(SerializedObjectOutputHandler.class);
+
+ @Override
+ protected void configImpl(Map<String, String> configData) throws IOException,
+ GateException {
+ // make sure we default to .ser as the extension
+ if(!configData.containsKey(PARAM_FILE_EXTENSION)) {
+ configData.put(PARAM_FILE_EXTENSION, ".ser");
+ }
+ if(configData.containsKey(PARAM_ENCODING)) {
+ logger.warn(this.getClass().getName() + " does not support the "
+ + PARAM_ENCODING + " parameter - ignored");
+ }
+ super.configImpl(configData);
+ }
+
+ @Override
+ protected void outputDocumentImpl(Document document, String documentId)
+ throws IOException, GateException {
+ String baseBenchmarkID =
+ Benchmark.createBenchmarkId(document.getName(), documentId);
+
+ ObjectOutputStream outputStream =
+ new ObjectOutputStream(getFileOutputStream(documentId));
+ try {
+ String saveBID =
+ Benchmark.createBenchmarkId("saveSerialized", baseBenchmarkID);
+ long startTime = Benchmark.startPoint();
+ outputStream.writeObject(document);
+ Benchmark.checkPoint(startTime, saveBID, this, null);
+ } finally {
+ outputStream.close();
+ }
+ }
+
+}
Property changes on: gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java
___________________________________________________________________
Added: svn:keywords
+ Id
Added: svn:eol-style
+ native
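The `configImpl` override above does two things before delegating to the superclass: it defaults the file extension to `.ser` when none is configured, and it logs a warning (without removing the key) when an encoding is supplied. A minimal sketch of that logic with plain maps follows. The class name is hypothetical, the key strings `"fileExtension"` and `"encoding"` are assumptions (this commit does not show the actual values of `IOConstants.PARAM_FILE_EXTENSION` and `IOConstants.PARAM_ENCODING`), and `java.util.logging` stands in for log4j to keep it dependency-free.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of the configImpl defaulting logic, using plain maps.
public class ConfigDefaultsSketch {
  // Assumed key names; the real IOConstants values are not shown here.
  static final String FILE_EXTENSION = "fileExtension";
  static final String ENCODING = "encoding";

  private static final Logger LOG =
      Logger.getLogger(ConfigDefaultsSketch.class.getName());

  // Returns a copy of config with ".ser" as the default extension; warns
  // (but does not remove the key) if an encoding was supplied.
  static Map<String, String> applyDefaults(Map<String, String> config) {
    Map<String, String> result = new HashMap<>(config);
    if (!result.containsKey(FILE_EXTENSION)) {
      result.put(FILE_EXTENSION, ".ser");
    }
    if (result.containsKey(ENCODING)) {
      LOG.warning("serialized output does not support the " + ENCODING
          + " parameter - ignored");
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    cfg.put(ENCODING, "UTF-8");
    Map<String, String> out = applyDefaults(cfg); // logs a warning
    System.out.println(out.get(FILE_EXTENSION)); // prints .ser
  }
}
```

Unlike the handler, which modifies `configData` in place, this sketch returns a copy so that it also works when the caller passes an unmodifiable map; an explicitly configured extension is left untouched in both versions.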
_______________________________________________
GATE-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/gate-cvs