Repository: opennlp-sandbox
Updated Branches:
  refs/heads/master 96c088b00 -> 6bfb15f07


OPENNLP-1009 - added initial RNN and StackedRNN impls from Yay lab, minor fixes


Project: http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/repo
Commit: http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/commit/a63ec16c
Tree: http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/tree/a63ec16c
Diff: http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/diff/a63ec16c

Branch: refs/heads/master
Commit: a63ec16ccd602d5e1c468d2b3370caaea3fb5f3f
Parents: 96c088b
Author: Tommaso Teofili <[email protected]>
Authored: Mon May 8 14:59:33 2017 +0200
Committer: Tommaso Teofili <[email protected]>
Committed: Mon May 8 14:59:33 2017 +0200

----------------------------------------------------------------------
 opennlp-dl/pom.xml                              | 136 ++++---
 opennlp-dl/src/main/java/NameFinderDL.java      | 232 ------------
 .../main/java/NameSampleDataSetIterator.java    | 225 ------------
 .../java/opennlp/tools/dl/NameFinderDL.java     | 232 ++++++++++++
 .../tools/dl/NameSampleDataSetIterator.java     | 225 ++++++++++++
 .../src/main/java/opennlp/tools/dl/RNN.java     | 366 +++++++++++++++++++
 .../main/java/opennlp/tools/dl/StackedRNN.java  | 323 ++++++++++++++++
 .../src/test/java/opennlp/tools/dl/RNNTest.java | 110 ++++++
 .../java/opennlp/tools/dl/StackedRNNTest.java   | 109 ++++++
 .../src/test/resources/text/sentences.txt       |  52 +++
 10 files changed, 1498 insertions(+), 512 deletions(-)
----------------------------------------------------------------------
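A note for readers skimming the diff: the newly added opennlp.tools.dl.RNN class exposes a small API - a constructor taking the hyperparameters plus the training text, learn() to run training, and sample(int seedIx) to generate text from the current model state. A minimal usage sketch, assuming the opennlp-dl module and its dependencies are on the classpath (the input file name and hyperparameter values below are illustrative only and are not part of this commit):

  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Paths;

  import opennlp.tools.dl.RNN;

  public class RNNDemo {
    public static void main(String[] args) throws Exception {
      // hypothetical plain-text training corpus
      String text = new String(Files.readAllBytes(Paths.get("input.txt")), StandardCharsets.UTF_8);
      // illustrative hyperparameters: learning rate, sequence length, hidden layer size, epochs
      RNN rnn = new RNN(0.1f, 25, 100, 10, text);
      rnn.learn();                         // trains the model with Adagrad updates
      System.out.println(rnn.sample(0));   // sample a sequence, seeded with token index 0
    }
  }

The commit also adds RNNTest, StackedRNNTest and a small text fixture (src/test/resources/text/sentences.txt) that exercise these classes.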


http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/pom.xml
----------------------------------------------------------------------
diff --git a/opennlp-dl/pom.xml b/opennlp-dl/pom.xml
index f8a6679..3d15d8f 100644
--- a/opennlp-dl/pom.xml
+++ b/opennlp-dl/pom.xml
@@ -1,56 +1,82 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-    <modelVersion>4.0.0</modelVersion>
-
-    <groupId>burn</groupId>
-    <artifactId>dl4jtest</artifactId>
-    <version>1.0-SNAPSHOT</version>
-
-    <dependencies>
-        <dependency>
-            <groupId>org.apache.opennlp</groupId>
-            <artifactId>opennlp-tools</artifactId>
-            <version>1.7.2</version>
-        </dependency>
-
-        <dependency>
-            <groupId>org.deeplearning4j</groupId>
-            <artifactId>deeplearning4j-core</artifactId>
-            <version>0.7.2</version>
-        </dependency>
-
-        <dependency>
-            <groupId>org.nd4j</groupId>
-            <artifactId>nd4j-native-platform</artifactId>
-            <!-- artifactId>nd4j-cuda-8.0-platform</artifactId -->
-            <version>0.7.2</version>
-        </dependency>
-
-        <dependency>
-            <groupId>org.deeplearning4j</groupId>
-            <artifactId>deeplearning4j-nlp</artifactId>
-            <version>0.7.2</version>
-        </dependency>
-        <dependency>
-            <groupId>org.slf4j</groupId>
-            <artifactId>slf4j-simple</artifactId>
-            <version>1.7.12</version>
-        </dependency>
-    </dependencies>
-
-    <build>
-        <plugins>
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-compiler-plugin</artifactId>
-                <version>3.5.1</version>
-                <configuration>
-                    <source>1.8</source>
-                    <target>1.8</target>
-                </configuration>
-            </plugin>
-        </plugins>
-    </build>
-</project>
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>org.apache.opennlp</groupId>
+  <artifactId>opennlp-dl</artifactId>
+  <version>0.1-SNAPSHOT</version>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <nd4j.version>0.7.2</nd4j.version>
+  </properties>
+
+  <dependencies>
+      <dependency>
+          <groupId>org.apache.opennlp</groupId>
+          <artifactId>opennlp-tools</artifactId>
+          <version>1.7.2</version>
+      </dependency>
+
+      <dependency>
+          <groupId>org.deeplearning4j</groupId>
+          <artifactId>deeplearning4j-core</artifactId>
+          <version>${nd4j.version}</version>
+      </dependency>
+
+
+      <dependency>
+          <groupId>org.deeplearning4j</groupId>
+          <artifactId>deeplearning4j-nlp</artifactId>
+          <version>${nd4j.version}</version>
+      </dependency>
+      <dependency>
+          <groupId>org.slf4j</groupId>
+          <artifactId>slf4j-simple</artifactId>
+          <version>1.7.12</version>
+      </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>4.11</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.nd4j</groupId>
+      <artifactId>nd4j-native-platform</artifactId>
+      <version>${nd4j.version}</version>
+    </dependency>
+  </dependencies>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+        <version>2.0.2</version>
+        <configuration>
+          <source>1.8</source>
+          <target>1.8</target>
+          <encoding>UTF-8</encoding>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+</project>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/NameFinderDL.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/main/java/NameFinderDL.java 
b/opennlp-dl/src/main/java/NameFinderDL.java
deleted file mode 100644
index 1184a06..0000000
--- a/opennlp-dl/src/main/java/NameFinderDL.java
+++ /dev/null
@@ -1,232 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.io.File;
-import java.io.IOException;
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-import java.util.stream.Collectors;
-import java.util.stream.IntStream;
-
-import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
-import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
-import org.deeplearning4j.nn.api.OptimizationAlgorithm;
-import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.conf.Updater;
-import org.deeplearning4j.nn.conf.layers.GravesLSTM;
-import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
-import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
-import org.deeplearning4j.nn.weights.WeightInit;
-import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
-import org.nd4j.linalg.activations.Activation;
-import org.nd4j.linalg.api.ndarray.INDArray;
-import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
-import org.nd4j.linalg.factory.Nd4j;
-import org.nd4j.linalg.indexing.INDArrayIndex;
-import org.nd4j.linalg.indexing.NDArrayIndex;
-import org.nd4j.linalg.lossfunctions.LossFunctions;
-
-import opennlp.tools.namefind.BioCodec;
-import opennlp.tools.namefind.NameSample;
-import opennlp.tools.namefind.NameSampleDataStream;
-import opennlp.tools.namefind.TokenNameFinder;
-import opennlp.tools.namefind.TokenNameFinderEvaluator;
-import opennlp.tools.util.MarkableFileInputStreamFactory;
-import opennlp.tools.util.ObjectStream;
-import opennlp.tools.util.PlainTextByLineStream;
-import opennlp.tools.util.Span;
-
-// 
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java
-public class NameFinderDL implements TokenNameFinder {
-
-  private final MultiLayerNetwork network;
-  private final WordVectors wordVectors;
-  private int windowSize;
-  private String[] labels;
-
-  public NameFinderDL(MultiLayerNetwork network, WordVectors wordVectors, int 
windowSize,
-                      String[] labels) {
-    this.network = network;
-    this.wordVectors = wordVectors;
-    this.windowSize = windowSize;
-    this.labels = labels;
-  }
-
-  static List<INDArray> mapToFeatureMatrices(WordVectors wordVectors, String[] 
tokens, int windowSize) {
-
-    List<INDArray> matrices = new ArrayList<>();
-
-    // TODO: Dont' hard code word vector dimension ...
-
-    for (int i = 0; i < tokens.length; i++) {
-      INDArray features = Nd4j.create(1, 300, windowSize);
-      for (int vectorIndex = 0; vectorIndex < windowSize; vectorIndex++) {
-        int tokenIndex = i + vectorIndex - ((windowSize - 1) / 2);
-        if (tokenIndex >= 0 && tokenIndex < tokens.length) {
-          String token = tokens[tokenIndex];
-          double[] wv = wordVectors.getWordVector(token);
-          if (wv != null) {
-            INDArray vector = wordVectors.getWordVectorMatrix(token);
-            features.put(new INDArrayIndex[]{NDArrayIndex.point(0), 
NDArrayIndex.all(),
-                NDArrayIndex.point(vectorIndex)}, vector);
-          }
-        }
-      }
-      matrices.add(features);
-    }
-
-    return matrices;
-  }
-
-  static List<INDArray> mapToLabelVectors(NameSample sample, int windowSize, 
String[] labelStrings) {
-
-    Map<String, Integer> labelToIndex = IntStream.range(0, 
labelStrings.length).boxed()
-        .collect(Collectors.toMap(i -> labelStrings[i], i -> i));
-
-    List<INDArray> vectors = new ArrayList<INDArray>();
-
-    for (int i = 0; i < sample.getSentence().length; i++) {
-      // encode the outcome as one-hot-representation
-      String outcomes[] =
-          new BioCodec().encode(sample.getNames(), 
sample.getSentence().length);
-
-      INDArray labels = Nd4j.create(1, labelStrings.length, windowSize);
-      labels.putScalar(new int[]{0, labelToIndex.get(outcomes[i]), windowSize 
- 1}, 1.0d);
-      vectors.add(labels);
-    }
-
-    return vectors;
-  }
-
-  private static int max(INDArray array) {
-    int best = 0;
-    for (int i = 0; i < array.size(0); i++) {
-      if (array.getDouble(i) > array.getDouble(best)) {
-        best = i;
-      }
-    }
-    return  best;
-  }
-
-  @Override
-  public Span[] find(String[] tokens) {
-    List<INDArray> featureMartrices = mapToFeatureMatrices(wordVectors, 
tokens, windowSize);
-
-    String[] outcomes = new String[tokens.length];
-    for (int i = 0; i < tokens.length; i++) {
-      INDArray predictionMatrix = network.output(featureMartrices.get(i), 
false);
-      INDArray outcomeVector = predictionMatrix.get(NDArrayIndex.point(0), 
NDArrayIndex.all(),
-          NDArrayIndex.point(windowSize - 1));
-
-      outcomes[i] = labels[max(outcomeVector)];
-    }
-
-    // Delete invalid spans ...
-    for (int i = 0; i < outcomes.length; i++) {
-      if (outcomes[i].endsWith("cont") && (i == 0 || "other".equals(outcomes[i 
- 1]))) {
-        outcomes[i] = "other";
-      }
-    }
-
-    return new BioCodec().decode(Arrays.asList(outcomes));
-  }
-
-  @Override
-  public void clearAdaptiveData() {
-  }
-
-  public static MultiLayerNetwork train(WordVectors wordVectors, 
ObjectStream<NameSample> samples,
-                                        int epochs, int windowSize, String[] 
labels) throws IOException {
-    int vectorSize = 300;
-    int layerSize = 256;
-
-    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
-        
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
-        .updater(Updater.RMSPROP)
-        .regularization(true).l2(0.001)
-        .weightInit(WeightInit.XAVIER)
-        // 
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue).gradientNormalizationThreshold(1.0)
-        .learningRate(0.01)
-        .list()
-        .layer(0, new GravesLSTM.Builder().nIn(vectorSize).nOut(layerSize)
-            .activation(Activation.TANH).build())
-        .layer(1, new RnnOutputLayer.Builder().activation(Activation.SOFTMAX)
-            
.lossFunction(LossFunctions.LossFunction.MCXENT).nIn(layerSize).nOut(3).build())
-        .pretrain(false).backprop(true).build();
-
-    MultiLayerNetwork net = new MultiLayerNetwork(conf);
-    net.init();
-    net.setListeners(new ScoreIterationListener(5));
-
-    // TODO: Extract labels on the fly from the data
-
-    DataSetIterator train = new NameSampleDataSetIterator(samples, 
wordVectors, windowSize, labels);
-
-    System.out.println("Starting training");
-
-    for (int i = 0; i < epochs; i++) {
-      net.fit(train);
-      train.reset();
-      System.out.println(String.format("Finished epoche %d", i));
-    }
-
-    return net;
-  }
-
-  public static void main(String[] args) throws Exception {
-    if (args.length != 3) {
-      System.out.println("Usage: trainFile testFile gloveTxt");
-      return;
-    }
-
-    String[] labels = new String[] {
-        "default-start", "default-cont", "other"
-    };
-
-    System.out.print("Loading vectors ... ");
-    WordVectors wordVectors = WordVectorSerializer.loadTxtVectors(
-        new File(args[2]));
-    System.out.println("Done");
-
-    int windowSize = 5;
-
-    MultiLayerNetwork net = train(wordVectors, new NameSampleDataStream(new 
PlainTextByLineStream(
-        new MarkableFileInputStreamFactory(new File(args[0])), 
StandardCharsets.UTF_8)), 1, windowSize, labels);
-
-    ObjectStream<NameSample> evalStream = new NameSampleDataStream(new 
PlainTextByLineStream(
-        new MarkableFileInputStreamFactory(
-            new File(args[1])), StandardCharsets.UTF_8));
-
-    NameFinderDL nameFinder = new NameFinderDL(net, wordVectors, windowSize, 
labels);
-
-    System.out.print("Evaluating ... ");
-    TokenNameFinderEvaluator nameFinderEvaluator = new 
TokenNameFinderEvaluator(nameFinder);
-    nameFinderEvaluator.evaluate(evalStream);
-
-    System.out.println("Done");
-
-    System.out.println();
-    System.out.println();
-    System.out.println("Results");
-
-    System.out.println(nameFinderEvaluator.getFMeasure().toString());
-  }
-}

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/NameSampleDataSetIterator.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/main/java/NameSampleDataSetIterator.java 
b/opennlp-dl/src/main/java/NameSampleDataSetIterator.java
deleted file mode 100644
index f416a1d..0000000
--- a/opennlp-dl/src/main/java/NameSampleDataSetIterator.java
+++ /dev/null
@@ -1,225 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.List;
-import java.util.NoSuchElementException;
-
-import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
-import org.nd4j.linalg.api.ndarray.INDArray;
-import org.nd4j.linalg.dataset.DataSet;
-import org.nd4j.linalg.dataset.api.DataSetPreProcessor;
-import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
-import org.nd4j.linalg.factory.Nd4j;
-import org.nd4j.linalg.indexing.INDArrayIndex;
-import org.nd4j.linalg.indexing.NDArrayIndex;
-
-import opennlp.tools.namefind.NameSample;
-import opennlp.tools.util.FilterObjectStream;
-import opennlp.tools.util.ObjectStream;
-
-public class NameSampleDataSetIterator implements DataSetIterator {
-
-  private static class NameSampleToDataSetStream extends 
FilterObjectStream<NameSample, DataSet> {
-
-    private final WordVectors wordVectors;
-    private final String[] labels;
-    private int windowSize;
-
-    private Iterator<DataSet> dataSets = Collections.emptyListIterator();
-
-    NameSampleToDataSetStream(ObjectStream<NameSample> samples, WordVectors 
wordVectors, int windowSize, String[] labels) {
-      super(samples);
-      this.wordVectors = wordVectors;
-      this.windowSize = windowSize;
-      this.labels = labels;
-    }
-
-    private Iterator<DataSet> createDataSets(NameSample sample) {
-      List<INDArray> features = NameFinderDL.mapToFeatureMatrices(wordVectors, 
sample.getSentence(),
-          windowSize);
-
-      List<INDArray> labels = NameFinderDL.mapToLabelVectors(sample, 
windowSize, this.labels);
-
-      List<DataSet> dataSetList = new ArrayList<>();
-
-      for (int i = 0; i < features.size(); i++) {
-        dataSetList.add(new DataSet(features.get(i), labels.get(i)));
-      }
-
-      return dataSetList.iterator();
-    }
-
-    @Override
-    public final DataSet read() throws IOException {
-
-      if (dataSets.hasNext()) {
-        return dataSets.next();
-      }
-      else {
-        NameSample sample;
-        while (!dataSets.hasNext() && (sample = samples.read()) != null) {
-          dataSets = createDataSets(sample);
-        }
-
-        if (dataSets.hasNext()) {
-          return read();
-        }
-      }
-
-      return null;
-    }
-  }
-
-  private final int windowSize;
-  private final String[] labels;
-
-  private final int batchSize = 128;
-  private final int vectorSize = 300;
-
-  private final int totalSamples;
-
-  private int cursor = 0;
-
-  private final ObjectStream<DataSet> samples;
-
-  NameSampleDataSetIterator(ObjectStream<NameSample> samples, WordVectors 
wordVectors, int windowSize,
-                            String labels[]) throws IOException {
-    this.windowSize = windowSize;
-    this.labels = labels;
-
-    this.samples = new NameSampleToDataSetStream(samples, wordVectors, 
windowSize, labels);
-
-    int total = 0;
-
-    DataSet sample;
-    while ((sample = this.samples.read()) != null) {
-      total++;
-    }
-
-    totalSamples = total;
-
-    samples.reset();
-  }
-
-  public DataSet next(int num) {
-    if (cursor >= totalExamples()) throw new NoSuchElementException();
-
-    INDArray features = Nd4j.create(num, vectorSize, windowSize);
-    INDArray featuresMask = Nd4j.zeros(num, windowSize);
-
-    INDArray labels = Nd4j.create(num, 3, windowSize);
-    INDArray labelsMask = Nd4j.zeros(num, windowSize);
-
-    // iterate stream and copy to arrays
-
-    for (int i = 0; i < num; i++) {
-      DataSet sample;
-      try {
-        sample = samples.read();
-      } catch (IOException e) {
-        throw new RuntimeException(e);
-      }
-
-      if (sample != null) {
-        INDArray feature = sample.getFeatureMatrix();
-        features.put(new INDArrayIndex[] {NDArrayIndex.point(i)}, 
feature.get(NDArrayIndex.point(0)));
-
-        feature.get(new INDArrayIndex[] {NDArrayIndex.point(0), 
NDArrayIndex.all(),
-            NDArrayIndex.point(0)});
-
-        for (int j = 0; j < windowSize; j++) {
-          featuresMask.putScalar(new int[] {i, j}, 1.0);
-        }
-
-        INDArray label = sample.getLabels();
-        labels.put(new INDArrayIndex[] {NDArrayIndex.point(i)}, 
label.get(NDArrayIndex.point(0)));
-        labelsMask.putScalar(new int[] {i, windowSize - 1}, 1.0);
-      }
-
-      cursor++;
-    }
-
-    return new DataSet(features, labels, featuresMask, labelsMask);
-  }
-
-  public int totalExamples() {
-    return totalSamples;
-  }
-
-  public int inputColumns() {
-    return vectorSize;
-  }
-
-  public int totalOutcomes() {
-    return getLabels().size();
-  }
-
-  public boolean resetSupported() {
-    return true;
-  }
-
-  public boolean asyncSupported() {
-    return false;
-  }
-
-  public void reset() {
-    cursor = 0;
-
-    try {
-      samples.reset();
-    } catch (IOException e) {
-      throw new RuntimeException(e);
-    }
-  }
-
-  public int batch() {
-    return batchSize;
-  }
-
-  public int cursor() {
-    return cursor;
-  }
-
-  public int numExamples() {
-    return totalExamples();
-  }
-
-  public void setPreProcessor(DataSetPreProcessor dataSetPreProcessor) {
-    throw new UnsupportedOperationException();
-  }
-
-  public DataSetPreProcessor getPreProcessor() {
-    throw new UnsupportedOperationException();
-  }
-
-  public List<String> getLabels() {
-    return Arrays.asList("start","cont", "other");
-  }
-
-  public boolean hasNext() {
-    return cursor < numExamples();
-  }
-
-  public DataSet next() {
-    return next(batchSize);
-  }
-}

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/opennlp/tools/dl/NameFinderDL.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/main/java/opennlp/tools/dl/NameFinderDL.java 
b/opennlp-dl/src/main/java/opennlp/tools/dl/NameFinderDL.java
new file mode 100644
index 0000000..7547196
--- /dev/null
+++ b/opennlp-dl/src/main/java/opennlp/tools/dl/NameFinderDL.java
@@ -0,0 +1,232 @@
+package opennlp.tools.dl;/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
+import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
+import org.deeplearning4j.nn.api.OptimizationAlgorithm;
+import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
+import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
+import org.deeplearning4j.nn.conf.Updater;
+import org.deeplearning4j.nn.conf.layers.GravesLSTM;
+import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
+import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
+import org.deeplearning4j.nn.weights.WeightInit;
+import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
+import org.nd4j.linalg.activations.Activation;
+import org.nd4j.linalg.api.ndarray.INDArray;
+import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
+import org.nd4j.linalg.factory.Nd4j;
+import org.nd4j.linalg.indexing.INDArrayIndex;
+import org.nd4j.linalg.indexing.NDArrayIndex;
+import org.nd4j.linalg.lossfunctions.LossFunctions;
+
+import opennlp.tools.namefind.BioCodec;
+import opennlp.tools.namefind.NameSample;
+import opennlp.tools.namefind.NameSampleDataStream;
+import opennlp.tools.namefind.TokenNameFinder;
+import opennlp.tools.namefind.TokenNameFinderEvaluator;
+import opennlp.tools.util.MarkableFileInputStreamFactory;
+import opennlp.tools.util.ObjectStream;
+import opennlp.tools.util.PlainTextByLineStream;
+import opennlp.tools.util.Span;
+
+// 
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java
+public class NameFinderDL implements TokenNameFinder {
+
+  private final MultiLayerNetwork network;
+  private final WordVectors wordVectors;
+  private int windowSize;
+  private String[] labels;
+
+  public NameFinderDL(MultiLayerNetwork network, WordVectors wordVectors, int 
windowSize,
+                      String[] labels) {
+    this.network = network;
+    this.wordVectors = wordVectors;
+    this.windowSize = windowSize;
+    this.labels = labels;
+  }
+
+  static List<INDArray> mapToFeatureMatrices(WordVectors wordVectors, String[] 
tokens, int windowSize) {
+
+    List<INDArray> matrices = new ArrayList<>();
+
+    // TODO: Don't hard code word vector dimension ...
+
+    for (int i = 0; i < tokens.length; i++) {
+      INDArray features = Nd4j.create(1, 300, windowSize);
+      for (int vectorIndex = 0; vectorIndex < windowSize; vectorIndex++) {
+        int tokenIndex = i + vectorIndex - ((windowSize - 1) / 2);
+        if (tokenIndex >= 0 && tokenIndex < tokens.length) {
+          String token = tokens[tokenIndex];
+          double[] wv = wordVectors.getWordVector(token);
+          if (wv != null) {
+            INDArray vector = wordVectors.getWordVectorMatrix(token);
+            features.put(new INDArrayIndex[]{NDArrayIndex.point(0), 
NDArrayIndex.all(),
+                NDArrayIndex.point(vectorIndex)}, vector);
+          }
+        }
+      }
+      matrices.add(features);
+    }
+
+    return matrices;
+  }
+
+  static List<INDArray> mapToLabelVectors(NameSample sample, int windowSize, 
String[] labelStrings) {
+
+    Map<String, Integer> labelToIndex = IntStream.range(0, 
labelStrings.length).boxed()
+        .collect(Collectors.toMap(i -> labelStrings[i], i -> i));
+
+    List<INDArray> vectors = new ArrayList<INDArray>();
+
+    for (int i = 0; i < sample.getSentence().length; i++) {
+      // encode the outcome as one-hot-representation
+      String outcomes[] =
+          new BioCodec().encode(sample.getNames(), 
sample.getSentence().length);
+
+      INDArray labels = Nd4j.create(1, labelStrings.length, windowSize);
+      labels.putScalar(new int[]{0, labelToIndex.get(outcomes[i]), windowSize 
- 1}, 1.0d);
+      vectors.add(labels);
+    }
+
+    return vectors;
+  }
+
+  private static int max(INDArray array) {
+    int best = 0;
+    for (int i = 0; i < array.size(0); i++) {
+      if (array.getDouble(i) > array.getDouble(best)) {
+        best = i;
+      }
+    }
+    return  best;
+  }
+
+  @Override
+  public Span[] find(String[] tokens) {
+    List<INDArray> featureMartrices = mapToFeatureMatrices(wordVectors, 
tokens, windowSize);
+
+    String[] outcomes = new String[tokens.length];
+    for (int i = 0; i < tokens.length; i++) {
+      INDArray predictionMatrix = network.output(featureMartrices.get(i), 
false);
+      INDArray outcomeVector = predictionMatrix.get(NDArrayIndex.point(0), 
NDArrayIndex.all(),
+          NDArrayIndex.point(windowSize - 1));
+
+      outcomes[i] = labels[max(outcomeVector)];
+    }
+
+    // Delete invalid spans ...
+    for (int i = 0; i < outcomes.length; i++) {
+      if (outcomes[i].endsWith("cont") && (i == 0 || "other".equals(outcomes[i 
- 1]))) {
+        outcomes[i] = "other";
+      }
+    }
+
+    return new BioCodec().decode(Arrays.asList(outcomes));
+  }
+
+  @Override
+  public void clearAdaptiveData() {
+  }
+
+  public static MultiLayerNetwork train(WordVectors wordVectors, 
ObjectStream<NameSample> samples,
+                                        int epochs, int windowSize, String[] 
labels) throws IOException {
+    int vectorSize = 300;
+    int layerSize = 256;
+
+    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
+        
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
+        .updater(Updater.RMSPROP)
+        .regularization(true).l2(0.001)
+        .weightInit(WeightInit.XAVIER)
+        // 
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue).gradientNormalizationThreshold(1.0)
+        .learningRate(0.01)
+        .list()
+        .layer(0, new GravesLSTM.Builder().nIn(vectorSize).nOut(layerSize)
+            .activation(Activation.TANH).build())
+        .layer(1, new RnnOutputLayer.Builder().activation(Activation.SOFTMAX)
+            
.lossFunction(LossFunctions.LossFunction.MCXENT).nIn(layerSize).nOut(3).build())
+        .pretrain(false).backprop(true).build();
+
+    MultiLayerNetwork net = new MultiLayerNetwork(conf);
+    net.init();
+    net.setListeners(new ScoreIterationListener(5));
+
+    // TODO: Extract labels on the fly from the data
+
+    DataSetIterator train = new NameSampleDataSetIterator(samples, 
wordVectors, windowSize, labels);
+
+    System.out.println("Starting training");
+
+    for (int i = 0; i < epochs; i++) {
+      net.fit(train);
+      train.reset();
+      System.out.println(String.format("Finished epoch %d", i));
+    }
+
+    return net;
+  }
+
+  public static void main(String[] args) throws Exception {
+    if (args.length != 3) {
+      System.out.println("Usage: trainFile testFile gloveTxt");
+      return;
+    }
+
+    String[] labels = new String[] {
+        "default-start", "default-cont", "other"
+    };
+
+    System.out.print("Loading vectors ... ");
+    WordVectors wordVectors = WordVectorSerializer.loadTxtVectors(
+        new File(args[2]));
+    System.out.println("Done");
+
+    int windowSize = 5;
+
+    MultiLayerNetwork net = train(wordVectors, new NameSampleDataStream(new 
PlainTextByLineStream(
+        new MarkableFileInputStreamFactory(new File(args[0])), 
StandardCharsets.UTF_8)), 1, windowSize, labels);
+
+    ObjectStream<NameSample> evalStream = new NameSampleDataStream(new 
PlainTextByLineStream(
+        new MarkableFileInputStreamFactory(
+            new File(args[1])), StandardCharsets.UTF_8));
+
+    NameFinderDL nameFinder = new NameFinderDL(net, wordVectors, windowSize, 
labels);
+
+    System.out.print("Evaluating ... ");
+    TokenNameFinderEvaluator nameFinderEvaluator = new 
TokenNameFinderEvaluator(nameFinder);
+    nameFinderEvaluator.evaluate(evalStream);
+
+    System.out.println("Done");
+
+    System.out.println();
+    System.out.println();
+    System.out.println("Results");
+
+    System.out.println(nameFinderEvaluator.getFMeasure().toString());
+  }
+}
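
The command-line entry point in main() above expects exactly three arguments: a training file and a test file in the OpenNLP name finder sample format, plus a word-vector file in GloVe text format (the vectors have to be 300-dimensional, since the dimension is still hard-coded). A hypothetical invocation, with illustrative file names and assuming a classpath that also carries the DL4J/ND4J dependencies:

  java -cp opennlp-dl.jar:lib/* opennlp.tools.dl.NameFinderDL en-ner-train.txt en-ner-test.txt glove.6B.300d.txt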

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/opennlp/tools/dl/NameSampleDataSetIterator.java
----------------------------------------------------------------------
diff --git 
a/opennlp-dl/src/main/java/opennlp/tools/dl/NameSampleDataSetIterator.java 
b/opennlp-dl/src/main/java/opennlp/tools/dl/NameSampleDataSetIterator.java
new file mode 100644
index 0000000..a420220
--- /dev/null
+++ b/opennlp-dl/src/main/java/opennlp/tools/dl/NameSampleDataSetIterator.java
@@ -0,0 +1,225 @@
+package opennlp.tools.dl;/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+import java.util.NoSuchElementException;
+
+import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
+import org.nd4j.linalg.api.ndarray.INDArray;
+import org.nd4j.linalg.dataset.DataSet;
+import org.nd4j.linalg.dataset.api.DataSetPreProcessor;
+import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
+import org.nd4j.linalg.factory.Nd4j;
+import org.nd4j.linalg.indexing.INDArrayIndex;
+import org.nd4j.linalg.indexing.NDArrayIndex;
+
+import opennlp.tools.namefind.NameSample;
+import opennlp.tools.util.FilterObjectStream;
+import opennlp.tools.util.ObjectStream;
+
+public class NameSampleDataSetIterator implements DataSetIterator {
+
+  private static class NameSampleToDataSetStream extends 
FilterObjectStream<NameSample, DataSet> {
+
+    private final WordVectors wordVectors;
+    private final String[] labels;
+    private int windowSize;
+
+    private Iterator<DataSet> dataSets = Collections.emptyListIterator();
+
+    NameSampleToDataSetStream(ObjectStream<NameSample> samples, WordVectors 
wordVectors, int windowSize, String[] labels) {
+      super(samples);
+      this.wordVectors = wordVectors;
+      this.windowSize = windowSize;
+      this.labels = labels;
+    }
+
+    private Iterator<DataSet> createDataSets(NameSample sample) {
+      List<INDArray> features = NameFinderDL.mapToFeatureMatrices(wordVectors, 
sample.getSentence(),
+          windowSize);
+
+      List<INDArray> labels = NameFinderDL.mapToLabelVectors(sample, 
windowSize, this.labels);
+
+      List<DataSet> dataSetList = new ArrayList<>();
+
+      for (int i = 0; i < features.size(); i++) {
+        dataSetList.add(new DataSet(features.get(i), labels.get(i)));
+      }
+
+      return dataSetList.iterator();
+    }
+
+    @Override
+    public final DataSet read() throws IOException {
+
+      if (dataSets.hasNext()) {
+        return dataSets.next();
+      }
+      else {
+        NameSample sample;
+        while (!dataSets.hasNext() && (sample = samples.read()) != null) {
+          dataSets = createDataSets(sample);
+        }
+
+        if (dataSets.hasNext()) {
+          return read();
+        }
+      }
+
+      return null;
+    }
+  }
+
+  private final int windowSize;
+  private final String[] labels;
+
+  private final int batchSize = 128;
+  private final int vectorSize = 300;
+
+  private final int totalSamples;
+
+  private int cursor = 0;
+
+  private final ObjectStream<DataSet> samples;
+
+  NameSampleDataSetIterator(ObjectStream<NameSample> samples, WordVectors 
wordVectors, int windowSize,
+                            String labels[]) throws IOException {
+    this.windowSize = windowSize;
+    this.labels = labels;
+
+    this.samples = new NameSampleToDataSetStream(samples, wordVectors, 
windowSize, labels);
+
+    int total = 0;
+
+    DataSet sample;
+    while ((sample = this.samples.read()) != null) {
+      total++;
+    }
+
+    totalSamples = total;
+
+    samples.reset();
+  }
+
+  public DataSet next(int num) {
+    if (cursor >= totalExamples()) throw new NoSuchElementException();
+
+    INDArray features = Nd4j.create(num, vectorSize, windowSize);
+    INDArray featuresMask = Nd4j.zeros(num, windowSize);
+
+    INDArray labels = Nd4j.create(num, 3, windowSize);
+    INDArray labelsMask = Nd4j.zeros(num, windowSize);
+
+    // iterate stream and copy to arrays
+
+    for (int i = 0; i < num; i++) {
+      DataSet sample;
+      try {
+        sample = samples.read();
+      } catch (IOException e) {
+        throw new RuntimeException(e);
+      }
+
+      if (sample != null) {
+        INDArray feature = sample.getFeatureMatrix();
+        features.put(new INDArrayIndex[] {NDArrayIndex.point(i)}, 
feature.get(NDArrayIndex.point(0)));
+
+        feature.get(new INDArrayIndex[] {NDArrayIndex.point(0), 
NDArrayIndex.all(),
+            NDArrayIndex.point(0)});
+
+        for (int j = 0; j < windowSize; j++) {
+          featuresMask.putScalar(new int[] {i, j}, 1.0);
+        }
+
+        INDArray label = sample.getLabels();
+        labels.put(new INDArrayIndex[] {NDArrayIndex.point(i)}, 
label.get(NDArrayIndex.point(0)));
+        labelsMask.putScalar(new int[] {i, windowSize - 1}, 1.0);
+      }
+
+      cursor++;
+    }
+
+    return new DataSet(features, labels, featuresMask, labelsMask);
+  }
+
+  public int totalExamples() {
+    return totalSamples;
+  }
+
+  public int inputColumns() {
+    return vectorSize;
+  }
+
+  public int totalOutcomes() {
+    return getLabels().size();
+  }
+
+  public boolean resetSupported() {
+    return true;
+  }
+
+  public boolean asyncSupported() {
+    return false;
+  }
+
+  public void reset() {
+    cursor = 0;
+
+    try {
+      samples.reset();
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  public int batch() {
+    return batchSize;
+  }
+
+  public int cursor() {
+    return cursor;
+  }
+
+  public int numExamples() {
+    return totalExamples();
+  }
+
+  public void setPreProcessor(DataSetPreProcessor dataSetPreProcessor) {
+    throw new UnsupportedOperationException();
+  }
+
+  public DataSetPreProcessor getPreProcessor() {
+    throw new UnsupportedOperationException();
+  }
+
+  public List<String> getLabels() {
+    return Arrays.asList("start","cont", "other");
+  }
+
+  public boolean hasNext() {
+    return cursor < numExamples();
+  }
+
+  public DataSet next() {
+    return next(batchSize);
+  }
+}

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/opennlp/tools/dl/RNN.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/main/java/opennlp/tools/dl/RNN.java 
b/opennlp-dl/src/main/java/opennlp/tools/dl/RNN.java
new file mode 100644
index 0000000..155ec03
--- /dev/null
+++ b/opennlp-dl/src/main/java/opennlp/tools/dl/RNN.java
@@ -0,0 +1,366 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package opennlp.tools.dl;
+
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Date;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.commons.math3.distribution.EnumeratedDistribution;
+import org.apache.commons.math3.util.Pair;
+import org.nd4j.linalg.api.iter.NdIndexIterator;
+import org.nd4j.linalg.api.ndarray.INDArray;
+import org.nd4j.linalg.api.ops.impl.transforms.SetRange;
+import org.nd4j.linalg.api.ops.impl.transforms.SoftMax;
+import org.nd4j.linalg.factory.Nd4j;
+import org.nd4j.linalg.ops.transforms.Transforms;
+
+/**
+ * A min char/word-level vanilla RNN model, based on Andrej Karpathy's python code.
+ * See also:
+ *
+ * @see <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness">The Unreasonable Effectiveness of Recurrent Neural Networks</a>
+ * @see <a href="https://gist.github.com/karpathy/d4dee566867f8291f086">Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy</a>
+ */
+public class RNN {
+
+  // hyperparameters
+  protected final float learningRate;
+  protected final int seqLength; // no. of steps to unroll the RNN for
+  protected final int hiddenLayerSize; // size of hidden layer of neurons
+  protected final int epochs;
+  protected final boolean useChars;
+  protected final int vocabSize;
+  protected final Map<String, Integer> charToIx;
+  protected final Map<Integer, String> ixToChar;
+  protected final List<String> data;
+  private final static double reg = 1e-8;
+
+  // model parameters
+  private final INDArray wxh; // input to hidden
+  private final INDArray whh; // hidden to hidden
+  private final INDArray why; // hidden to output
+  private final INDArray bh; // hidden bias
+  private final INDArray by; // output bias
+
+  private INDArray hPrev = null; // memory state
+
+  public RNN(float learningRate, int seqLength, int hiddenLayerSize, int 
epochs, String text) {
+    this(learningRate, seqLength, hiddenLayerSize, epochs, text, true);
+  }
+
+  public RNN(float learningRate, int seqLength, int hiddenLayerSize, int 
epochs, String text, boolean useChars) {
+    this.learningRate = learningRate;
+    this.seqLength = seqLength;
+    this.hiddenLayerSize = hiddenLayerSize;
+    this.epochs = epochs;
+    this.useChars = useChars;
+
+    String[] textTokens = useChars ? toStrings(text.toCharArray()) : 
text.split(" ");
+    data = new LinkedList<>();
+    Collections.addAll(data, textTokens);
+    Set<String> tokens = new HashSet<>(data);
+    vocabSize = tokens.size();
+
+    System.out.printf("data has %d tokens, %d unique.\n", data.size(), 
vocabSize);
+    charToIx = new HashMap<>();
+    ixToChar = new HashMap<>();
+    int i = 0;
+    for (String c : tokens) {
+      charToIx.put(c, i);
+      ixToChar.put(i, c);
+      i++;
+    }
+
+    wxh = Nd4j.randn(hiddenLayerSize, vocabSize).mul(0.01);
+    whh = Nd4j.randn(hiddenLayerSize, hiddenLayerSize).mul(0.01);
+    why = Nd4j.randn(vocabSize, hiddenLayerSize).mul(0.01);
+    bh = Nd4j.zeros(hiddenLayerSize, 1);
+    by = Nd4j.zeros(vocabSize, 1);
+  }
+
+  private String[] toStrings(char[] chars) {
+    String[] strings = new String[chars.length];
+    for (int i = 0; i < chars.length; i++) {
+      strings[i] = String.valueOf(chars[i]);
+    }
+    return strings;
+  }
+
+  public void learn() {
+
+    int currentEpoch = 0;
+
+    int n = 0;
+    int p = 0;
+
+    // memory variables for Adagrad
+    INDArray mWxh = Nd4j.zerosLike(wxh);
+    INDArray mWhh = Nd4j.zerosLike(whh);
+    INDArray mWhy = Nd4j.zerosLike(why);
+
+    INDArray mbh = Nd4j.zerosLike(bh);
+    INDArray mby = Nd4j.zerosLike(by);
+
+    // loss at iteration 0
+    double smoothLoss = -Math.log(1.0 / vocabSize) * seqLength;
+
+    while (true) {
+      // prepare inputs (we're sweeping from left to right in steps seqLength 
long)
+      if (p + seqLength + 1 >= data.size() || n == 0) {
+        hPrev = Nd4j.zeros(hiddenLayerSize, 1); // reset RNN memory
+        p = 0; // go from start of data
+        currentEpoch++;
+        if (currentEpoch == epochs) {
+          System.out.println("training finished: e:" + epochs + ", l: " + 
smoothLoss + ", h:(" + learningRate + ", " + seqLength + ", " + hiddenLayerSize 
+ ")");
+          break;
+        }
+      }
+
+      INDArray inputs = getSequence(p);
+      INDArray targets = getSequence(p + 1);
+
+      // sample from the model every now and then
+      if (n % 1000 == 0 && n > 0) {
+        String txt = sample(inputs.getInt(0));
+        System.out.printf("\n---\n %s \n----\n", txt);
+      }
+
+      INDArray dWxh = Nd4j.zerosLike(wxh);
+      INDArray dWhh = Nd4j.zerosLike(whh);
+      INDArray dWhy = Nd4j.zerosLike(why);
+
+      INDArray dbh = Nd4j.zerosLike(bh);
+      INDArray dby = Nd4j.zerosLike(by);
+
+      // forward seqLength characters through the net and fetch gradient
+      double loss = lossFun(inputs, targets, dWxh, dWhh, dWhy, dbh, dby);
+      smoothLoss = smoothLoss * 0.999 + loss * 0.001;
+      if (Double.isNaN(smoothLoss)) {
+        System.out.println("loss is NaN (over/underflow occured, try adjusting 
hyperparameters)");
+        break;
+      }
+      if (n % 100 == 0) {
+        System.out.printf("iter %d, loss: %f\n", n, smoothLoss); // print 
progress
+      }
+
+      // perform parameter update with Adagrad
+      mWxh.addi(dWxh.mul(dWxh));
+      wxh.subi((dWxh.mul(learningRate)).div(Transforms.sqrt(mWxh.add(reg))));
+
+      mWhh.addi(dWhh.mul(dWhh));
+      whh.subi(dWhh.mul(learningRate).div(Transforms.sqrt(mWhh.add(reg))));
+
+      mWhy.addi(dWhy.mul(dWhy));
+      why.subi(dWhy.mul(learningRate).div(Transforms.sqrt(mWhy.add(reg))));
+
+      mbh.addi(dbh.mul(dbh));
+      bh.subi(dbh.mul(learningRate).div(Transforms.sqrt(mbh.add(reg))));
+
+      mby.addi(dby.mul(dby));
+      by.subi(dby.mul(learningRate).div(Transforms.sqrt(mby.add(reg))));
+
+      p += seqLength; // move data pointer
+      n++; // iteration counter
+    }
+  }
+
+  protected INDArray getSequence(int p) {
+    INDArray inputs = Nd4j.create(seqLength);
+    int c = 0;
+    for (String ch : data.subList(p, p + seqLength)) {
+      Integer ix = charToIx.get(ch);
+      inputs.putScalar(c, ix);
+      c++;
+    }
+    return inputs;
+  }
+
+  /**
+   * inputs, targets are both list of integers
+   * hprev is Hx1 array of initial hidden state
+   * returns the modified loss, gradients on model parameters
+   */
+  private double lossFun(INDArray inputs, INDArray targets, INDArray dWxh, 
INDArray dWhh, INDArray dWhy, INDArray dbh,
+                         INDArray dby) {
+
+    INDArray xs = Nd4j.zeros(inputs.length(), vocabSize);
+    INDArray hs = null;
+    INDArray ys = null;
+    INDArray ps = null;
+
+    INDArray hs1 = Nd4j.create(hPrev.shape());
+    Nd4j.copy(hPrev, hs1);
+
+    double loss = 0;
+
+    // forward pass
+    for (int t = 0; t < inputs.length(); t++) {
+      int tIndex = inputs.getScalar(t).getInt(0);
+      xs.putScalar(t, tIndex, 1); // encode in 1-of-k representation
+      INDArray hsRow = t == 0 ? hs1 : hs.getRow(t - 1);
+      INDArray hst = 
Transforms.tanh(wxh.mmul(xs.getRow(t).transpose()).add(whh.mmul(hsRow)).add(bh));
 // hidden state
+      if (hs == null) {
+        hs = init(inputs.length(), hst.shape());
+      }
+      hs.putRow(t, hst);
+
+      INDArray yst = (why.mmul(hst)).add(by); // unnormalized log 
probabilities for next chars
+      if (ys == null) {
+        ys = init(inputs.length(), yst.shape());
+      }
+      ys.putRow(t, yst);
+      INDArray pst = Nd4j.getExecutioner().execAndReturn(new SoftMax(yst)); // 
probabilities for next chars
+      if (ps == null) {
+        ps = init(inputs.length(), pst.shape());
+      }
+      ps.putRow(t, pst);
+      loss += -Math.log(pst.getDouble(targets.getInt(t))); // softmax 
(cross-entropy loss)
+    }
+
+    // backward pass: compute gradients going backwards
+    INDArray dhNext = Nd4j.zerosLike(hs.getRow(0));
+    for (int t = inputs.length() - 1; t >= 0; t--) {
+      INDArray dy = ps.getRow(t);
+      dy.putRow(targets.getInt(t), dy.getRow(targets.getInt(t)).sub(1)); // 
backprop into y
+      INDArray hst = hs.getRow(t);
+      dWhy.addi(dy.mmul(hst.transpose())); // derivative of hy layer
+      dby.addi(dy);
+      INDArray dh = why.transpose().mmul(dy).add(dhNext); // backprop into h
+      INDArray dhraw = (Nd4j.ones(hst.shape()).sub(hst.mul(hst))).mul(dh); // 
backprop through tanh nonlinearity
+      dbh.addi(dhraw);
+      dWxh.addi(dhraw.mmul(xs.getRow(t)));
+      INDArray hsRow = t == 0 ? hs1 : hs.getRow(t - 1);
+      dWhh.addi(dhraw.mmul(hsRow.transpose()));
+      dhNext = whh.transpose().mmul(dhraw);
+    }
+
+    this.hPrev = hs.getRow(inputs.length() - 1);
+
+    return loss;
+  }
+
+  protected INDArray init(int t, int[] aShape) {
+    INDArray as;
+    int[] shape = new int[1 + aShape.length];
+    shape[0] = t;
+    System.arraycopy(aShape, 0, shape, 1, aShape.length);
+    as = Nd4j.create(shape);
+    return as;
+  }
+
+  /**
+   * sample a sequence of integers from the model, using current (hPrev) 
memory state, seedIx is seed letter for first time step
+   */
+  public String sample(int seedIx) {
+
+    INDArray x = Nd4j.zeros(vocabSize, 1);
+    x.putScalar(seedIx, 1);
+    int sampleSize = 2 * seqLength;
+    INDArray ixes = Nd4j.create(sampleSize);
+
+    INDArray h = hPrev.dup();
+
+    for (int t = 0; t < sampleSize; t++) {
+      h = Transforms.tanh(wxh.mmul(x).add(whh.mmul(h)).add(bh));
+      INDArray y = (why.mmul(h)).add(by);
+      INDArray pm = Nd4j.getExecutioner().execAndReturn(new 
SoftMax(y)).ravel();
+
+      List<Pair<Integer, Double>> d = new LinkedList<>();
+      for (int pi = 0; pi < vocabSize; pi++) {
+        d.add(new Pair<>(pi, pm.getDouble(0, pi)));
+      }
+      EnumeratedDistribution<Integer> distribution = new 
EnumeratedDistribution<>(d);
+
+      int ix = distribution.sample();
+
+      x = Nd4j.zeros(vocabSize, 1);
+      x.putScalar(ix, 1);
+      ixes.putScalar(t, ix);
+    }
+
+    return getSampleString(ixes);
+  }
+
+  protected String getSampleString(INDArray ixes) {
+    StringBuilder txt = new StringBuilder();
+
+    NdIndexIterator ndIndexIterator = new NdIndexIterator(ixes.shape());
+    while (ndIndexIterator.hasNext()) {
+      int[] next = ndIndexIterator.next();
+      if (!useChars && txt.length() > 0) {
+        txt.append(' ');
+      }
+      txt.append(ixToChar.get(ixes.getInt(next)));
+    }
+    return txt.toString();
+  }
+
+  public int getVocabSize() {
+    return vocabSize;
+  }
+
+  @Override
+  public String toString() {
+    return getClass().getName() + "{" +
+            "learningRate=" + learningRate +
+            ", seqLength=" + seqLength +
+            ", hiddenLayerSize=" + hiddenLayerSize +
+            ", epochs=" + epochs +
+            ", vocabSize=" + vocabSize +
+            ", useChars=" + useChars +
+            '}';
+  }
+
+
+  public String getHyperparamsString() {
+    return getClass().getName() + "{" +
+            "wxh=" + wxh +
+            ", whh=" + whh +
+            ", why=" + why +
+            ", bh=" + bh +
+            ", by=" + by +
+            '}';
+  }
+
+  public void serialize(String prefix) throws IOException {
+    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(new 
File(prefix + new Date().toString() + ".txt")));
+    bufferedWriter.write("wxh");
+    bufferedWriter.write(wxh.toString());
+    bufferedWriter.write("whh");
+    bufferedWriter.write(whh.toString());
+    bufferedWriter.write("why");
+    bufferedWriter.write(why.toString());
+    bufferedWriter.write("bh");
+    bufferedWriter.write(bh.toString());
+    bufferedWriter.write("by");
+    bufferedWriter.write(by.toString());
+    bufferedWriter.flush();
+    bufferedWriter.close();
+  }
+}
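
For reference, the forward pass in lossFun() above implements the standard vanilla RNN recurrence

  h_t = tanh(Wxh * x_t + Whh * h_(t-1) + bh)
  y_t = Why * h_t + by
  p_t = softmax(y_t)

and each parameter w with gradient g is then updated in learn() with the Adagrad rule

  m = m + g * g                              (element-wise accumulator)
  w = w - learningRate * g / sqrt(m + 1e-8)

which is what the addi/subi/Transforms.sqrt block over wxh, whh, why, bh and by computes.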

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/main/java/opennlp/tools/dl/StackedRNN.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/main/java/opennlp/tools/dl/StackedRNN.java 
b/opennlp-dl/src/main/java/opennlp/tools/dl/StackedRNN.java
new file mode 100644
index 0000000..e7a49d7
--- /dev/null
+++ b/opennlp-dl/src/main/java/opennlp/tools/dl/StackedRNN.java
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package opennlp.tools.dl;
+
+import org.apache.commons.math3.distribution.EnumeratedDistribution;
+import org.apache.commons.math3.util.Pair;
+import org.nd4j.linalg.api.ndarray.INDArray;
+import org.nd4j.linalg.api.ops.impl.transforms.SetRange;
+import org.nd4j.linalg.api.ops.impl.transforms.SoftMax;
+import org.nd4j.linalg.factory.Nd4j;
+import org.nd4j.linalg.ops.transforms.Transforms;
+
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.util.Date;
+import java.util.LinkedList;
+import java.util.List;
+
+/**
+ * A basic char/word-level stacked RNN model (2 hidden recurrent layers), based on Stacked RNN architecture from ICLR 2014's
+ * "How to Construct Deep Recurrent Neural Networks" by Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho and Yoshua Bengio
+ * and Andrej Karpathy's notes on RNNs.
+ * See also:
+ *
+ * @see <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness">The Unreasonable Effectiveness of Recurrent Neural Networks</a>
+ * @see <a href="https://arxiv.org/abs/1312.6026">How to Construct Deep Recurrent Neural Networks</a>
+ */
+public class StackedRNN extends RNN {
+
+  // model parameters
+  private final INDArray wxh; // input to hidden
+  private final INDArray whh; // hidden to hidden
+  private final INDArray whh2; // hidden to hidden2
+  private final INDArray wh2y; // hidden2 to output
+  private final INDArray wxh2;
+  private final INDArray bh; // hidden bias
+  private final INDArray bh2; // hidden2 bias
+  private final INDArray by; // output bias
+
+  private final double reg = 1e-8;
+
+  private INDArray hPrev = null; // memory state
+  private INDArray hPrev2 = null; // memory state
+
+  public StackedRNN(float learningRate, int seqLength, int hiddenLayerSize, 
int epochs, String text) {
+    this(learningRate, seqLength, hiddenLayerSize, epochs, text, true);
+  }
+
+  public StackedRNN(float learningRate, int seqLength, int hiddenLayerSize, 
int epochs, String text, boolean useChars) {
+    super(learningRate, seqLength, hiddenLayerSize, epochs, text, useChars);
+
+    wxh = Nd4j.randn(hiddenLayerSize, 
vocabSize).div(Math.sqrt(hiddenLayerSize));
+    whh = Nd4j.randn(hiddenLayerSize, 
hiddenLayerSize).div(Math.sqrt(hiddenLayerSize));
+    whh2 = Nd4j.randn(hiddenLayerSize, 
hiddenLayerSize).div(Math.sqrt(hiddenLayerSize));
+    wxh2 = Nd4j.randn(hiddenLayerSize, 
hiddenLayerSize).div(Math.sqrt(hiddenLayerSize));
+    wh2y = Nd4j.randn(vocabSize, hiddenLayerSize).div(Math.sqrt(vocabSize));
+    bh = Nd4j.zeros(hiddenLayerSize, 1);
+    bh2 = Nd4j.zeros(hiddenLayerSize, 1);
+    by = Nd4j.zeros(vocabSize, 1);
+  }
+
+  public void learn() {
+
+    int currentEpoch = -1;
+
+    int n = 0;
+    int p = 0;
+
+    // memory variables for Adagrad
+    INDArray mWxh = Nd4j.zerosLike(wxh);
+    INDArray mWxh2 = Nd4j.zerosLike(wxh2);
+    INDArray mWhh = Nd4j.zerosLike(whh);
+    INDArray mWhh2 = Nd4j.zerosLike(whh2);
+    INDArray mWh2y = Nd4j.zerosLike(wh2y);
+
+    INDArray mbh = Nd4j.zerosLike(bh);
+    INDArray mbh2 = Nd4j.zerosLike(bh2);
+    INDArray mby = Nd4j.zerosLike(by);
+
+    // loss at iteration 0
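+    // (cross-entropy of a uniform guess over the vocabulary, -log(1/V), summed over one sequence)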
+    double smoothLoss = -Math.log(1.0 / vocabSize) * seqLength;
+
+    while (true) {
+      // prepare inputs (we're sweeping from left to right in steps seqLength long)
+      if (p + seqLength + 1 >= data.size() || n == 0) {
+        hPrev = Nd4j.zeros(hiddenLayerSize, 1); // reset RNN memory
+        hPrev2 = Nd4j.zeros(hiddenLayerSize, 1); // reset RNN memory
+        p = 0; // go from start of data
+        currentEpoch++;
+        if (currentEpoch == epochs) {
+          System.out.println("training finished: e:" + epochs + ", l: " + 
smoothLoss + ", h:(" + learningRate + ", " + seqLength + ", " + hiddenLayerSize 
+ ")");
+          break;
+        }
+      }
+
+      INDArray inputs = getSequence(p);
+      INDArray targets = getSequence(p + 1);
+
+      // sample from the model every now and then
+      if (n % 1000 == 0 && n > 0) {
+        String txt = sample(inputs.getInt(0));
+        System.out.printf("\n---\n %s \n----\n", txt);
+      }
+
+      INDArray dWxh = Nd4j.zerosLike(wxh);
+      INDArray dWxh2 = Nd4j.zerosLike(wxh2);
+      INDArray dWhh = Nd4j.zerosLike(whh);
+      INDArray dWhh2 = Nd4j.zerosLike(whh2);
+      INDArray dWh2y = Nd4j.zerosLike(wh2y);
+
+      INDArray dbh = Nd4j.zerosLike(bh);
+      INDArray dbh2 = Nd4j.zerosLike(bh2);
+      INDArray dby = Nd4j.zerosLike(by);
+
+      // forward seqLength characters through the net and fetch gradient
+      double loss = lossFun(inputs, targets, dWxh, dWhh, dWxh2, dWhh2, dWh2y, dbh, dbh2, dby);
+      smoothLoss = smoothLoss * 0.999 + loss * 0.001;
+      if (Double.isNaN(smoothLoss) || Double.isInfinite(smoothLoss)) {
+        System.out.println("loss is " + smoothLoss + " (over/underflow 
occured, try adjusting hyperparameters)");
+        break;
+      }
+      if (n % 100 == 0) {
+        System.out.printf("iter %d, loss: %f\n", n, smoothLoss); // print 
progress
+      }
+
+      // perform parameter update with Adagrad
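+      // Adagrad keeps a running sum of squared gradients per parameter and scales each
+      // update by learningRate / sqrt(accumulated + reg), where reg avoids division by zero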
+      mWxh.addi(dWxh.mul(dWxh));
+      wxh.subi(dWxh.mul(learningRate).div(Transforms.sqrt(mWxh.add(reg))));
+
+      mWxh2.addi(dWxh2.mul(dWxh2));
+      wxh2.subi(dWxh2.mul(learningRate).div(Transforms.sqrt(mWxh2.add(reg))));
+
+      mWhh.addi(dWhh.mul(dWhh));
+      whh.subi(dWhh.mul(learningRate).div(Transforms.sqrt(mWhh.add(reg))));
+
+      mWhh2.addi(dWhh2.mul(dWhh2));
+      whh2.subi(dWhh2.mul(learningRate).div(Transforms.sqrt(mWhh2.add(reg))));
+
+      mbh2.addi(dbh2.mul(dbh2));
+      bh2.subi(dbh2.mul(learningRate).div(Transforms.sqrt(mbh2.add(reg))));
+
+      mWh2y.addi(dWh2y.mul(dWh2y));
+      wh2y.subi(dWh2y.mul(learningRate).div(Transforms.sqrt(mWh2y.add(reg))));
+
+      mbh.addi(dbh.mul(dbh));
+      bh.subi(dbh.mul(learningRate).div(Transforms.sqrt(mbh.add(reg))));
+
+      mby.addi(dby.mul(dby));
+      by.subi(dby.mul(learningRate).div(Transforms.sqrt(mby.add(reg))));
+
+      p += seqLength; // move data pointer
+      n++; // iteration counter
+    }
+  }
+
+  /**
+   * inputs, targets are both list of integers
+   * hprev is Hx1 array of initial hidden state
+   * returns the loss, gradients on model parameters and last hidden state
+   */
+  private double lossFun(INDArray inputs, INDArray targets, INDArray dWxh, INDArray dWhh, INDArray dWxh2, INDArray dWhh2, INDArray dWh2y,
+                         INDArray dbh, INDArray dbh2, INDArray dby) {
+
+    INDArray xs = Nd4j.zeros(seqLength, vocabSize);
+    INDArray hs = null;
+    INDArray hs2 = null;
+    INDArray ys = null;
+    INDArray ps = null;
+
+    double loss = 0;
+
+    // forward pass
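+    // at each time step t:
+    //   h(t)  = tanh(Wxh  * x(t) + Whh  * h(t-1)  + bh)      first hidden layer
+    //   h2(t) = tanh(Wxh2 * h(t) + Whh2 * h2(t-1) + bh2)     second hidden layer
+    //   y(t)  = Wh2y * h2(t) + by, normalized with a softmax to get the probabilities p(t)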
+    for (int t = 0; t < seqLength; t++) {
+      int tIndex = inputs.getScalar(t).getInt(0);
+      xs.putScalar(t, tIndex, 1); // encode in 1-of-k representation
+
+      INDArray xst = xs.getRow(t);
+
+      hPrev = Transforms.tanh((wxh.mmul(xst.transpose()).add(whh.mmul(hPrev)).add(bh))); // hidden state
+      if (hs == null) {
+        hs = init(seqLength, hPrev.shape());
+      }
+      hs.putRow(t, hPrev.dup());
+
+      hPrev2 = Transforms.tanh((wxh2.mmul(hPrev).add(whh2.mmul(hPrev2)).add(bh2))); // hidden state 2
+      if (hs2 == null) {
+        hs2 = init(seqLength, hPrev2.shape());
+      }
+      hs2.putRow(t, hPrev2.dup());
+
+      INDArray yst = wh2y.mmul(hPrev2).add(by); // unnormalized log probabilities for next chars
+      if (ys == null) {
+        ys = init(seqLength, yst.shape());
+      }
+      ys.putRow(t, yst);
+
+      INDArray pst = Nd4j.getExecutioner().execAndReturn(new SoftMax(yst)); // probabilities for next chars
+      if (ps == null) {
+        ps = init(seqLength, pst.shape());
+      }
+      ps.putRow(t, pst);
+
+      loss += -Math.log(pst.getDouble(targets.getInt(t))); // softmax (cross-entropy) loss
+    }
+
+    // backward pass: compute gradients going backwards
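+    // gradients flow from the output into the second hidden layer, then into the first;
+    // the (1 - h^2) factors are the derivatives of the tanh nonlinearities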
+    INDArray dhNext = Nd4j.zerosLike(hs.getRow(0));
+    INDArray dh2Next = Nd4j.zerosLike(hs2.getRow(0));
+    for (int t = seqLength - 1; t >= 0; t--) {
+      INDArray dy = ps.getRow(t);
+      dy.getRow(targets.getInt(t)).subi(1); // backprop into y
+
+      INDArray hs2t = hs2.getRow(t);
+      INDArray hs2tm1 = t == 0 ? hPrev2 : hs2.getRow(t - 1);
+
+      dWh2y.addi(dy.mmul(hs2t.transpose()));
+      dby.addi(dy);
+
+      INDArray dh2 = wh2y.transpose().mmul(dy).add(dh2Next); // backprop into h2
+      INDArray dhraw2 = (Nd4j.ones(hs2t.shape()).sub(hs2t.mul(hs2t))).mul(dh2); // backprop through tanh nonlinearity
+      dbh2.addi(dhraw2);
+      INDArray hst = hs.getRow(t);
+      dWxh2.addi(dhraw2.mmul(hst.transpose()));
+      dWhh2.addi(dhraw2.mmul(hs2tm1.transpose()));
+      dh2Next = whh2.transpose().mmul(dhraw2);
+
+      INDArray dh = wxh2.transpose().mmul(dhraw2).add(dhNext); // backprop into h
+      INDArray dhraw = (Nd4j.ones(hst.shape()).sub(hst.mul(hst))).mul(dh); // backprop through tanh nonlinearity
+      dbh.addi(dhraw);
+      dWxh.addi(dhraw.mmul(xs.getRow(t)));
+      INDArray hsRow = t == 0 ? hPrev : hs.getRow(t - 1);
+      dWhh.addi(dhraw.mmul(hsRow.transpose()));
+      dhNext = whh.transpose().mmul(dhraw);
+
+    }
+
+    this.hPrev = hs.getRow(seqLength - 1);
+    this.hPrev2 = hs2.getRow(seqLength - 1);
+
+    return loss;
+  }
+
+  /**
+   * Sample a sequence of integers from the model, using the current (hPrev) memory state; seedIx is the seed letter for the first time step.
+   */
+  @Override
+  public String sample(int seedIx) {
+
+    INDArray x = Nd4j.zeros(vocabSize, 1);
+    x.putScalar(seedIx, 1);
+    int sampleSize = seqLength * 2;
+    INDArray ixes = Nd4j.create(sampleSize);
+
+    INDArray h = hPrev.dup();
+    INDArray h2 = hPrev2.dup();
+
+    for (int t = 0; t < sampleSize; t++) {
+      h = Transforms.tanh((wxh.mmul(x)).add(whh.mmul(h)).add(bh));
+      h2 = Transforms.tanh((wxh2.mmul(h)).add(whh2.mmul(h2)).add(bh2));
+      INDArray y = wh2y.mmul(h2).add(by);
+      INDArray pm = Nd4j.getExecutioner().execAndReturn(new SoftMax(y)).ravel();
+
+      List<Pair<Integer, Double>> d = new LinkedList<>();
+      for (int pi = 0; pi < vocabSize; pi++) {
+        d.add(new Pair<>(pi, pm.getDouble(0, pi)));
+      }
+      try {
+        EnumeratedDistribution<Integer> distribution = new 
EnumeratedDistribution<>(d);
+
+        int ix = distribution.sample();
+
+        x = Nd4j.zeros(vocabSize, 1);
+        x.putScalar(ix, 1);
+        ixes.putScalar(t, ix);
+      } catch (Exception e) {
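+        // if the distribution cannot be built or sampled (e.g. NaN probabilities),
+        // skip this step and reuse the previous input vector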
+      }
+    }
+
+    return getSampleString(ixes);
+  }
+
+  @Override
+  public void serialize(String prefix) throws IOException {
+    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(new File(prefix + new Date().toString() + ".txt")));
+    bufferedWriter.write("wxh");
+    bufferedWriter.write(wxh.toString());
+    bufferedWriter.write("whh");
+    bufferedWriter.write(whh.toString());
+    bufferedWriter.write("wxh2");
+    bufferedWriter.write(wxh2.toString());
+    bufferedWriter.write("whh2");
+    bufferedWriter.write(whh2.toString());
+    bufferedWriter.write("wh2y");
+    bufferedWriter.write(wh2y.toString());
+    bufferedWriter.write("bh");
+    bufferedWriter.write(bh.toString());
+    bufferedWriter.write("bh2");
+    bufferedWriter.write(bh2.toString());
+    bufferedWriter.write("by");
+    bufferedWriter.write(by.toString());
+    bufferedWriter.flush();
+    bufferedWriter.close();
+  }
+
+}
\ No newline at end of file
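
For reference, a minimal sketch of how the new class can be driven (the example class name,
corpus path and hyperparameter values below are illustrative only; the constructor and the
learn/sample/serialize calls mirror what the parameterized tests further down use):

    import java.nio.file.Files;
    import java.nio.file.Paths;

    import opennlp.tools.dl.RNN;
    import opennlp.tools.dl.StackedRNN;

    public class StackedRNNExample {
      public static void main(String[] args) throws Exception {
        // any plain-text corpus works; here the new test resource is assumed to be on disk
        String text = new String(Files.readAllBytes(
            Paths.get("opennlp-dl/src/test/resources/text/sentences.txt")));
        // learningRate, seqLength, hiddenLayerSize, epochs - values are illustrative, not tuned
        RNN rnn = new StackedRNN(1e-1f, 25, 50, 20, text);
        rnn.learn();                                // Adagrad training; prints loss and periodic samples
        System.out.println(rnn.sample(0));          // generate a sequence seeded with symbol index 0
        rnn.serialize("target/scrnn-weights-");     // dump the learned weight matrices to a text file
      }
    }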

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/test/java/opennlp/tools/dl/RNNTest.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/test/java/opennlp/tools/dl/RNNTest.java b/opennlp-dl/src/test/java/opennlp/tools/dl/RNNTest.java
new file mode 100644
index 0000000..2808f4d
--- /dev/null
+++ b/opennlp-dl/src/test/java/opennlp/tools/dl/RNNTest.java
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package opennlp.tools.dl;
+
+import java.io.InputStream;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Random;
+
+import org.apache.commons.io.IOUtils;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+/**
+ * CV tests for {@link RNN}
+ */
+@RunWith(Parameterized.class)
+public class RNNTest {
+
+  private float learningRate;
+  private int seqLength;
+  private int hiddenLayerSize;
+  private Random r = new Random();
+  private String text;
+  private final int epochs = 20;
+  private List<String> words;
+
+  public RNNTest(float learningRate, int seqLength, int hiddenLayerSize) {
+    this.learningRate = learningRate;
+    this.seqLength = seqLength;
+    this.hiddenLayerSize = hiddenLayerSize;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    InputStream stream = getClass().getResourceAsStream("/text/sentences.txt");
+    text = IOUtils.toString(stream);
+    words = Arrays.asList(text.split(" "));
+    stream.close();
+  }
+
+  @Parameterized.Parameters
+  public static Collection<Object[]> data() {
+    return Arrays.asList(new Object[][] {
+        {1e-1f, 25, 20},
+        {1e-1f, 25, 40},
+        {1e-1f, 25, 60},
+    });
+  }
+
+  @Test
+  public void testVanillaCharRNNLearn() throws Exception {
+    RNN rnn = new RNN(learningRate, seqLength, hiddenLayerSize, epochs, text);
+    evaluate(rnn, true);
+    rnn.serialize("target/crnn-weights-");
+  }
+
+  @Test
+  public void testVanillaWordRNNLearn() throws Exception {
+    RNN rnn = new RNN(learningRate, seqLength, hiddenLayerSize, epochs * 2, text, false);
+    evaluate(rnn, true);
+    rnn.serialize("target/wrnn-weights-");
+  }
+
+  private void evaluate(RNN rnn, boolean checkRatio) {
+    System.out.println(rnn);
+    rnn.learn();
+    int noOfSamples = 2;
+    double ratioSum = 0;
+    for (int i = 0; i < noOfSamples; i++) {
+      int seed = r.nextInt(rnn.getVocabSize());
+      String sample = rnn.sample(seed);
+      System.out.println(sample);
+      if (checkRatio && rnn.useChars) {
+        String[] sampleWords = sample.split(" ");
+        double c = 0;
+        for (String sw : sampleWords) {
+          if (words.contains(sw)) {
+            c++;
+          }
+        }
+        ratioSum += c / sampleWords.length;
+      }
+    }
+    if (checkRatio && rnn.useChars) {
+      System.out.println("average correct word ratio: " + (ratioSum / noOfSamples));
+    }
+  }
+
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/test/java/opennlp/tools/dl/StackedRNNTest.java
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/test/java/opennlp/tools/dl/StackedRNNTest.java b/opennlp-dl/src/test/java/opennlp/tools/dl/StackedRNNTest.java
new file mode 100644
index 0000000..ac0434c
--- /dev/null
+++ b/opennlp-dl/src/test/java/opennlp/tools/dl/StackedRNNTest.java
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package opennlp.tools.dl;
+
+import java.io.InputStream;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Random;
+
+import org.apache.commons.io.IOUtils;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+/**
+ * Tests for {@link StackedRNN}
+ */
+@RunWith(Parameterized.class)
+public class StackedRNNTest {
+
+  private float learningRate;
+  private int seqLength;
+  private int hiddenLayerSize;
+  private Random r = new Random();
+  private String text;
+  private final int epochs = 20;
+  private List<String> words;
+
+  public StackedRNNTest(float learningRate, int seqLength, int hiddenLayerSize) {
+    this.learningRate = learningRate;
+    this.seqLength = seqLength;
+    this.hiddenLayerSize = hiddenLayerSize;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    InputStream stream = getClass().getResourceAsStream("/text/sentences.txt");
+    text = IOUtils.toString(stream);
+    words = Arrays.asList(text.split(" "));
+    stream.close();
+  }
+
+  @Parameterized.Parameters
+  public static Collection<Object[]> data() {
+    return Arrays.asList(new Object[][] {
+        {1e-1f, 25, 20},
+        {1e-1f, 25, 40},
+        {1e-1f, 25, 60},
+    });
+  }
+
+  @Test
+  public void testStackedCharRNNLearn() throws Exception {
+    RNN rnn = new StackedRNN(learningRate, seqLength, hiddenLayerSize, epochs, text);
+    evaluate(rnn, true);
+    rnn.serialize("target/scrnn-weights-");
+  }
+
+  @Test
+  public void testStackedWordRNNLearn() throws Exception {
+    RNN rnn = new StackedRNN(learningRate, seqLength, hiddenLayerSize, epochs, text, false);
+    evaluate(rnn, true);
+    rnn.serialize("target/swrnn-weights-");
+  }
+
+  private void evaluate(RNN rnn, boolean checkRatio) {
+    System.out.println(rnn);
+    rnn.learn();
+    int noOfSamples = 2;
+    double ratioSum = 0;
+    for (int i = 0; i < noOfSamples; i++) {
+      int seed = r.nextInt(rnn.getVocabSize());
+      String sample = rnn.sample(seed);
+      System.out.println(sample);
+      if (checkRatio && rnn.useChars) {
+        String[] sampleWords = sample.split(" ");
+        double c = 0;
+        for (String sw : sampleWords) {
+          if (words.contains(sw)) {
+            c++;
+          }
+        }
+        ratioSum += c / sampleWords.length;
+      }
+    }
+    if (checkRatio && rnn.useChars) {
+      System.out.println("average correct word ratio: " + (ratioSum / noOfSamples));
+    }
+  }
+
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/a63ec16c/opennlp-dl/src/test/resources/text/sentences.txt
----------------------------------------------------------------------
diff --git a/opennlp-dl/src/test/resources/text/sentences.txt b/opennlp-dl/src/test/resources/text/sentences.txt
new file mode 100644
index 0000000..3079de1
--- /dev/null
+++ b/opennlp-dl/src/test/resources/text/sentences.txt
@@ -0,0 +1,52 @@
+The word2vec software of Tomas Mikolov and colleagues has gained a lot of 
traction lately and provides state-of-the-art word embeddings .
+The learning models behind the software are described in two research papers .
+We found the description of the models in these papers to be somewhat cryptic 
and hard to follow .
+While the motivations and presentation may be obvious to the neural-networks 
language-modeling crowd we had to struggle quite a bit to figure out the 
rationale behind the equations .
+This note is an attempt to explain the negative sampling equation in 
Distributed Representations of Words and Phrases and their Compositionality by 
Tomas Mikolov Ilya Sutskever Kai Chen Greg Corrado and Jeffrey Dean .
+The departure point of the paper is the skip-gram model .
+In this model we are given a corpus of words w and their contexts c .
+We consider the conditional probabilities p(c|w) and given a corpus Text the 
goal is to set the parameters θ of p(c|w;θ) so as to maximize the corpus 
probability .
+The recently introduced continuous Skip-gram model is an efficient method for 
learning high-quality distributed vector representations that capture a large 
number of precise syntactic and semantic word relationships .
+In this paper we present several extensions that improve both the quality of 
the vectors and the training speed .
+By subsampling of the frequent words we obtain significant speedup and also 
learn more regular word representations .
+We also describe a simple alternative to the hierarchical softmax called 
negative sampling .
+An inherent limitation of word representations is their indifference to word 
order and their inability to represent idiomatic phrases .
+For example the meanings of Canada and Air cannot be easily combined to obtain 
Air Canada .
+Motivated by this example we present a simple method for finding phrases in 
text and show that learning good vector representations for millions of phrases 
is possible .
+The similarity metrics used for nearest neighbor evaluations produce a single 
scalar that quantifies the relatedness of two words .
+This simplicity can be problematic since two given words almost always exhibit 
more intricate relationships than can be captured by a single number .
+For example man may be regarded as similar to woman in that both words 
describe human beings on the other hand the two words are often considered 
opposites since they highlight a primary axis along which humans differ from 
one another .
+In order to capture in a quantitative way the nuance necessary to distinguish 
man from woman it is necessary for a model to associate more than a single 
number to the word pair .
+A natural and simple candidate for an enlarged set of discriminative numbers 
is the vector difference between the two word vectors .
+GloVe is designed in order that such vector differences capture as much as 
possible the meaning specified by the juxtaposition of two words .
+Unsupervised word representations are very useful in NLP tasks both as inputs 
to learning algorithms and as extra word features in NLP systems .
+However most of these models are built with only local context and one 
representation per word .
+This is problematic because words are often polysemous and global context can 
also provide useful information for learning word meanings .
+We present a new neural network architecture which 1) learns word embeddings 
that better capture the semantics of words by incorporating both local and 
global document context and 2) accounts for homonymy and polysemy by learning 
multiple embeddings per word .
+We introduce a new dataset with human judgments on pairs of words in 
sentential context and evaluate our model on it showing that our model 
outperforms competitive baselines and other neural language models .
+Information Retrieval ( IR ) models need to deal with two difficult issues 
vocabulary mismatch and term dependencies .
+Vocabulary mismatch corresponds to the difficulty of retrieving relevant 
documents that do not contain exact query terms but semantically related terms .
+Term dependencies refers to the need of considering the relationship between 
the words of the query when estimating the relevance of a document .
+A multitude of solutions has been proposed to solve each of these two problems 
but no principled model solve both .
+In parallel in the last few years language models based on neural networks 
have been used to cope with complex natural language processing tasks like 
emotion and paraphrase detection .
+Although they present good abilities to cope with both term dependencies and 
vocabulary mismatch problems thanks to the distributed representation of words 
they are based upon such models could not be used readily in IR where the 
estimation of one language model per document ( or query ) is required .
+This is both computationally unfeasible and prone to over-fitting .
+Based on a recent work that proposed to learn a generic language model that 
can be modified through a set of document-specific parameters we explore use of 
new neural network models that are adapted to ad-hoc IR tasks .
+Within the language model IR framework we propose and study the use of a 
generic language model as well as a document-specific language model .
+Both can be used as a smoothing component but the latter is more adapted to 
the document at hand and has the potential of being used as a full document 
language model .
+We experiment with such models and analyze their results on TREC-1 to 8 
datasets .
+The word2vec model and application by Mikolov et al have attracted a great 
amount of attention in recent two years .
+The vector representations of words learned by word2vec models have been 
proven to be able to carry semantic meanings and are useful in various NLP 
tasks .
+As an increasing number of researchers would like to experiment with word2vec 
I notice that there lacks a material that comprehensively explains the 
parameter learning process of word2vec in details thus preventing many people 
with less neural network experience from understanding how exactly word2vec 
works .
+This note provides detailed derivations and explanations of the parameter 
update equations for the word2vec models including the original continuous 
bag-of-word ( CBOW ) and skip-gram models as well as advanced tricks 
hierarchical soft-max and negative sampling .
+In the appendix a review is given on the basics of neuron network models and 
backpropagation .
+To avoid the inaccuracy caused by classifying the example into several 
categories given by TREC manually we take the word2vec to represent all 
attractions and user contexts in the continuous vector space learnt by neural 
network language models .
+The base of NNML is using neural networks for the probability function .
+The model learns simultaneously a distributed representation for each word 
along with the probability function for word sequences expressed in terms of 
these representations .
+Training such large models we propose continuous bag of words as our framework 
and soft-max as the active function .
+So we use the word2vec to train wikitravel corpus and got the word vector .
+To avoid the curse of dimensionality by learning a distributed representation 
for words as our word vector we define a test set that compare different 
dimensionality of vectors for our task using the same training data and using 
the same model architecture .
+We extend the word2vec framework to capture meaning across languages .
+The input consists of a source text and a word-aligned parallel text in a 
second language .
+The joint word2vec tool then represents words in both languages within a 
common “semantic” vector space .
+The result can be used to enrich lexicons of under-resourced languages to 
identify ambiguities and to perform clustering and classification .
\ No newline at end of file
