eric-haibin-lin commented on a change in pull request #9986: gluon language modeling dataset and text token reader

 File path: python/mxnet/gluon/data/text/
 @@ -0,0 +1,99 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# coding: utf-8
+# pylint: disable=
+"""Base classes for text datasets and readers."""
+__all__ = ['WordLanguageReader']
+import io
+import os
+from ..dataset import SimpleDataset
+from ..datareader import DataReader
+from .utils import flatten_samples, collate, pair
+class WordLanguageReader(DataReader):
+    """Text reader that reads a whole corpus and produces samples based on provided sample splitter
+    and word tokenizer.
+    Parameters
+    ----------
+    filename : str
+        Path to the input text file.
+    encoding : str, default 'utf8'
+        File encoding format.
+    sample_splitter : function, default str.splitlines
+        A function that splits the dataset string into samples.
+    tokenizer : function, default str.split
+        A function that splits each sample string into list of tokens.
+    seq_len : int or None
+        The length of each of the samples. If None, samples are divided according to
+        `sample_splitter` only, and may have variable lengths.
+    bos : str or None, default None
+        The token to add at the beginning of each sentence. If None, nothing is 
+    eos : str or None, default None
+        The token to add at the end of each sentence. If None, nothing is 
+    pad : str or None, default None
+        The padding token to add at the end of the dataset if `seq_len` is specified and the total
+        number of tokens in the corpus is not evenly divisible by `seq_len`. If pad is None or seq_len
 Review comment:
   "total number of tokens" -> does this include bos/eos? 

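   To make the question concrete, here is a minimal sketch of why the answer matters. It is not the PR's implementation: `flatten_and_pad` is a hypothetical helper, and it assumes bos/eos are inserted per sample before the total is taken modulo `seq_len`. Counting the markers changes how many pad tokens get appended.

```python
def flatten_and_pad(samples, seq_len, bos=None, eos=None, pad=None):
    """Hypothetical helper (not from the PR): flatten tokenized samples,
    add optional bos/eos markers, then pad to a multiple of seq_len."""
    tokens = []
    for sample in samples:
        if bos is not None:
            tokens.append(bos)
        tokens.extend(sample)
        if eos is not None:
            tokens.append(eos)
    # Assumption: the "total number of tokens" is counted AFTER bos/eos insertion.
    remainder = len(tokens) % seq_len
    if pad is not None and remainder:
        tokens.extend([pad] * (seq_len - remainder))
    return tokens


samples = [["hello", "world"], ["foo", "bar", "baz"]]  # 5 raw tokens

# Without eos: 5 tokens, so one pad token is needed to reach 6.
without_eos = flatten_and_pad(samples, seq_len=3, pad="<pad>")

# With eos: 5 + 2 = 7 tokens, so two pad tokens are needed to reach 9.
with_eos = flatten_and_pad(samples, seq_len=3, eos="<eos>", pad="<pad>")

print(without_eos.count("<pad>"), with_eos.count("<pad>"))
```

   If the docstring means the raw corpus count, the padding in the second case would be computed differently, so it would help to state explicitly which count is used.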
