This is an automated email from the ASF dual-hosted git repository.
jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new 6452fe6 More sparse related docs (#7911)
6452fe6 is described below
commit 6452fe681e1102d4a1ac9b18b4e3edbd3ba05306
Author: Haibin Lin <[email protected]>
AuthorDate: Wed Sep 27 14:27:20 2017 -0700
More sparse related docs (#7911)
* more sparse related docs
* fix lint
* fix grammar
* raise exception on unimplemented aux type for sparse
---
docs/api/python/ndarray/ndarray.md | 2 +
docs/api/python/ndarray/sparse.md | 108 ++++++++++++++++++++++++++-
docs/tutorials/basic/ndarray.md | 4 +-
python/mxnet/ndarray/sparse.py | 20 +++--
python/mxnet/optimizer.py | 4 +-
python/mxnet/test_utils.py | 53 ++++++++++---
src/engine/threaded_engine.h | 2 +-
src/io/iter_libsvm.cc | 5 +-
tests/python/unittest/test_kvstore.py | 32 ++++----
tests/python/unittest/test_module.py | 8 +-
tests/python/unittest/test_sparse_ndarray.py | 10 +--
11 files changed, 191 insertions(+), 57 deletions(-)
diff --git a/docs/api/python/ndarray/ndarray.md
b/docs/api/python/ndarray/ndarray.md
index 315410a..064f070 100644
--- a/docs/api/python/ndarray/ndarray.md
+++ b/docs/api/python/ndarray/ndarray.md
@@ -461,6 +461,7 @@ The `ndarray` package provides several classes:
sqrt
rsqrt
square
+ reciprocal
```
### Logic functions
@@ -476,6 +477,7 @@ The `ndarray` package provides several classes:
lesser
lesser_equal
```
+
### Random sampling
```eval_rst
diff --git a/docs/api/python/ndarray/sparse.md
b/docs/api/python/ndarray/sparse.md
index ace9a0d..78f351b 100644
--- a/docs/api/python/ndarray/sparse.md
+++ b/docs/api/python/ndarray/sparse.md
@@ -91,7 +91,6 @@ We summarize the interface for each class in the following
sections.
:nosignatures:
CSRNDArray.shape
- CSRNDArray.size
CSRNDArray.context
CSRNDArray.dtype
CSRNDArray.stype
@@ -153,7 +152,6 @@ We summarize the interface for each class in the following
sections.
:nosignatures:
RowSparseNDArray.shape
- RowSparseNDArray.size
RowSparseNDArray.context
RowSparseNDArray.dtype
RowSparseNDArray.stype
@@ -185,6 +183,20 @@ We summarize the interface for each class in the following
sections.
RowSparseNDArray.zeros_like
```
+### Array rounding
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ RowSparseNDArray.round
+ RowSparseNDArray.rint
+ RowSparseNDArray.fix
+ RowSparseNDArray.floor
+ RowSparseNDArray.ceil
+ RowSparseNDArray.trunc
+```
+
### Indexing
```eval_rst
@@ -216,6 +228,8 @@ We summarize the interface for each class in the following
sections.
zeros_like
csr_matrix
row_sparse_array
+ mxnet.ndarray.load
+ mxnet.ndarray.save
```
## Array manipulation routines
@@ -248,10 +262,93 @@ We summarize the interface for each class in the
following sections.
:nosignatures:
elemwise_add
+ elemwise_sub
+ elemwise_mul
+ negative
dot
add_n
```
+### Trigonometric functions
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ sin
+ tan
+ arcsin
+ arctan
+ degrees
+ radians
+```
+
+### Hyperbolic functions
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ sinh
+ tanh
+ arcsinh
+ arctanh
+```
+
+### Rounding
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ round
+ rint
+ fix
+ floor
+ ceil
+ trunc
+```
+
+### Exponents and logarithms
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ expm1
+ log1p
+```
+
+### Powers
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ sqrt
+ square
+```
+
+### Miscellaneous
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ abs
+ sign
+```
+
+### More
+
+```eval_rst
+.. autosummary::
+ :nosignatures:
+
+ make_loss
+ stop_gradient
+```
+
## API Reference
<script type="text/javascript"
src='../../../_static/js/auto_module_index.js'></script>
@@ -259,10 +356,10 @@ We summarize the interface for each class in the
following sections.
```eval_rst
.. autoclass:: mxnet.ndarray.sparse.CSRNDArray
- :members: shape, size, context, dtype, stype, data, indices, indptr, copy,
copyto, as_in_context, asnumpy, asscalar, astype, tostype, slice, wait_to_read,
zeros_like, __getitem__, __setitem__
+ :members: shape, context, dtype, stype, data, indices, indptr, copy,
copyto, as_in_context, asnumpy, asscalar, astype, tostype, slice, wait_to_read,
zeros_like, __getitem__, __setitem__
.. autoclass:: mxnet.ndarray.sparse.RowSparseNDArray
- :members: shape, size, context, dtype, stype, data, indices, copy, copyto,
as_in_context, asnumpy, asscalar, astype, tostype, wait_to_read, zeros_like,
__getitem__, __setitem__
+ :members: shape, context, dtype, stype, data, indices, copy, copyto,
as_in_context, asnumpy, asscalar, astype, tostype, wait_to_read, zeros_like,
round, rint, fix, floor, ceil, trunc, __getitem__, __setitem__
.. automodule:: mxnet.ndarray.sparse
:members:
@@ -272,6 +369,9 @@ We summarize the interface for each class in the following
sections.
.. automodule:: mxnet.ndarray.sparse
:members: array, zeros, empty
+.. automodule:: mxnet.ndarray
+ :members: load, save
+
```
<script>auto_index("api-reference");</script>
diff --git a/docs/tutorials/basic/ndarray.md b/docs/tutorials/basic/ndarray.md
index bd76702..bc5ce89 100644
--- a/docs/tutorials/basic/ndarray.md
+++ b/docs/tutorials/basic/ndarray.md
@@ -33,7 +33,7 @@ Each NDArray supports some important attributes that you'll
often want to query:
and `m` columns, its `shape` will be `(n, m)`.
- **ndarray.dtype**: A `numpy` _type_ object describing the type of its
elements.
-- **ndarray.size**: the total number of components in the array - equal to the
+- **ndarray.size**: The total number of components in the array - equal to the
product of the components of its `shape`
- **ndarray.context**: The device on which this array is stored, e.g. `cpu()`
or
`gpu(1)`.
@@ -81,7 +81,7 @@ We can specify the element type with the option `dtype`,
which accepts a numpy
type. By default, `float32` is used:
```python
-# float32 is used in default
+# float32 is used by default
a = mx.nd.array([1,2,3])
# create an int32 array
b = mx.nd.array([1,2,3], dtype=np.int32)
diff --git a/python/mxnet/ndarray/sparse.py b/python/mxnet/ndarray/sparse.py
index 9d9dd28..7995da5 100644
--- a/python/mxnet/ndarray/sparse.py
+++ b/python/mxnet/ndarray/sparse.py
@@ -88,6 +88,9 @@ def _new_alloc_handle(stype, shape, ctx, delay_alloc, dtype,
aux_types, aux_shap
A new empty ndarray handle
"""
hdl = NDArrayHandle()
+ for aux_t in aux_types:
+ if np.dtype(aux_t) != np.dtype("int64"):
+ raise NotImplementedError("only int64 is supported for aux types")
aux_type_ids = [int(_DTYPE_NP_TO_MX[np.dtype(aux_t).type]) for aux_t in
aux_types]
aux_shapes = [(0,) for aux_t in aux_types] if aux_shapes is None else
aux_shapes
aux_shape_lens = [len(aux_shape) for aux_shape in aux_shapes]
@@ -149,6 +152,11 @@ class BaseSparseNDArray(NDArray):
def reshape(self, shape):
raise NotSupportedForSparseNDArray(self.reshape, None, shape)
+ @property
+ def size(self):
+ # the `size` for a sparse ndarray is ambiguous, hence disabled.
+ raise NotImplementedError()
+
def _aux_type(self, i):
"""Data-type of the array's ith aux data.
@@ -250,12 +258,12 @@ class BaseSparseNDArray(NDArray):
# pylint: disable=abstract-method
class CSRNDArray(BaseSparseNDArray):
- """A sparse representation of 2D NDArray in the standard CSR format.
+ """A sparse representation of 2D NDArray in the Compressed Sparse Row
format.
A CSRNDArray represents an NDArray as three separate arrays: `data`,
`indptr` and `indices`. It uses the standard CSR representation where the
column indices for
- row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding
values are stored
- in values[indptr[i]:indptr[i+1]].
+ row i are stored in ``indices[indptr[i]:indptr[i+1]]`` and their
corresponding values are stored
+ in ``data[indptr[i]:indptr[i+1]]``.
The column indices for a given row are expected to be sorted in ascending
order.
Duplicate column entries for the same row are not allowed.
@@ -492,7 +500,7 @@ class RowSparseNDArray(BaseSparseNDArray):
`indices`.
- data: an NDArray of any dtype with shape [D0, D1, ..., Dn].
- - indices: a 1-D int64 NDArray with shape [D0].
+ - indices: a 1-D int64 NDArray with shape [D0] with values sorted in
ascending order.
The `indices` stores the indices of the row slices with non-zeros,
while the values are stored in `data`. The corresponding NDArray ``dense``
@@ -513,11 +521,9 @@ class RowSparseNDArray(BaseSparseNDArray):
array([[ 1., 2., 3.],
[ 4., 0., 5.]], dtype=float32)
- A RowSparseNDArray is typically used to represent non-zero row-slices of a
large NDArray
+ A RowSparseNDArray is typically used to represent non-zero row slices of a
large NDArray
of shape [LARGE0, D1, .. , Dn] where LARGE0 >> D0 and most row slices are
zeros.
- The indices are expected to be sorted in ascending order.
-
RowSparseNDArray is used principally in the definition of gradients for
operations
that have sparse gradients (e.g. sparse dot and sparse embedding).
"""
diff --git a/python/mxnet/optimizer.py b/python/mxnet/optimizer.py
index ec0a1d4..9f89415 100644
--- a/python/mxnet/optimizer.py
+++ b/python/mxnet/optimizer.py
@@ -395,8 +395,8 @@ class SGD(Optimizer):
multi_precision: bool, optional
Flag to control the internal precision of the optimizer.
``False`` results in using the same precision as the weights (default),
- ``True`` makes internal 32-bit copy of the weights and applies gradients
- in 32-bit precision even if actual weights used in the model
have lower precision.
+ ``True`` makes internal 32-bit copy of the weights and applies
gradients \
+ in 32-bit precision even if actual weights used in the model
have lower precision.\
Turning this on can improve convergence and accuracy when
training with float16.
"""
def __init__(self, momentum=0.0, multi_precision=False, **kwargs):
diff --git a/python/mxnet/test_utils.py b/python/mxnet/test_utils.py
index f041118..bc92257 100644
--- a/python/mxnet/test_utils.py
+++ b/python/mxnet/test_utils.py
@@ -135,22 +135,32 @@ def _get_uniform_dataset_csr(num_rows, num_cols,
density=0.1, dtype=None,
"""Returns CSRNDArray with uniform distribution
This generates a csr matrix with totalnnz unique randomly chosen numbers
from num_rows*num_cols and arranges them in the 2d array in the
- following way: row_index = (random_number_generated / num_rows)
+ following way:
+ row_index = (random_number_generated / num_rows)
col_index = random_number_generated - row_index * num_cols
"""
_validate_csr_generation_inputs(num_rows, num_cols, density,
distribution="uniform")
- from scipy import sparse as spsp
- csr = spsp.rand(num_rows, num_cols, density, dtype=dtype, format="csr")
- if data_init is not None:
- csr.data.fill(data_init)
- if shuffle_csr_indices is True:
- shuffle_csr_column_indices(csr)
- result = mx.nd.sparse.csr_matrix(csr.data, csr.indptr, csr.indices,
- (num_rows, num_cols), dtype=dtype)
+ try:
+ from scipy import sparse as spsp
+ csr = spsp.rand(num_rows, num_cols, density, dtype=dtype, format="csr")
+ if data_init is not None:
+ csr.data.fill(data_init)
+ if shuffle_csr_indices is True:
+ shuffle_csr_column_indices(csr)
+ result = mx.nd.sparse.csr_matrix(csr.data, csr.indptr, csr.indices,
+ (num_rows, num_cols), dtype=dtype)
+ except ImportError:
+ assert(data_init is None), \
+ "data_init option is not supported when scipy is absent"
+ assert(not shuffle_csr_indices), \
+ "shuffle_csr_indices option is not supported when scipy is
absent"
+ # scipy not available. try to generate one from a dense array
+ dns = mx.nd.random.uniform(shape=(num_rows, num_cols), dtype=dtype)
+ masked_dns = dns * (dns < density)
+ result = masked_dns.tostype('csr')
return result
-
def _get_powerlaw_dataset_csr(num_rows, num_cols, density=0.1, dtype=None):
"""Returns CSRNDArray with powerlaw distribution
with exponentially increasing number of non zeros in each row.
@@ -246,6 +256,7 @@ def rand_sparse_ndarray(shape, stype, density=None,
dtype=None, distribution=Non
data_init=None, rsp_indices=None, modifier_func=None,
shuffle_csr_indices=False):
"""Generate a random sparse ndarray. Returns the ndarray, value(np) and
indices(np)
+
Parameters
----------
shape: list or tuple
@@ -253,9 +264,11 @@ def rand_sparse_ndarray(shape, stype, density=None,
dtype=None, distribution=Non
density, optional: float, should be between 0 and 1
distribution, optional: str, valid values: "uniform" or "powerlaw"
dtype, optional: numpy.dtype, default value is None
+
Returns
-------
Result of type CSRNDArray or RowSparseNDArray
+
Examples
--------
Below is an example of the powerlaw distribution with csr as the stype.
@@ -265,7 +278,18 @@ def rand_sparse_ndarray(shape, stype, density=None,
dtype=None, distribution=Non
else, remaining unused_nnzs will be used in n+1th row
If number of cols is too small and we have already reached column size it
will fill up
all following columns in all following rows until we reach the required
density.
- density = rnd.rand() if density is None else density
+
+ >>> csr_arr, _ = rand_sparse_ndarray(shape=(5, 16), stype="csr",
+ density=0.50, distribution="powerlaw")
+ >>> indptr = csr_arr.indptr.asnumpy()
+ >>> indices = csr_arr.indices.asnumpy()
+ >>> data = csr_arr.data.asnumpy()
+ >>> row2nnz = len(data[indptr[1]:indptr[2]])
+ >>> row3nnz = len(data[indptr[2]:indptr[3]])
+ >>> assert(row3nnz == 2*row2nnz)
+ >>> row4nnz = len(data[indptr[3]:indptr[4]])
+ >>> assert(row4nnz == 2*row3nnz)
+
"""
density = rnd.rand() if density is None else density
dtype = default_dtype() if dtype is None else dtype
@@ -516,6 +540,13 @@ def assert_almost_equal_ignore_nan(a, b, rtol=None,
atol=None, names=('a', 'b'))
assert_almost_equal(a, b, rtol, atol, names)
+def assert_exception(f, exception_type, *args, **kwargs):
+ """Test that function f will throw an exception of type given by
`exception_type`"""
+ try:
+ f(*args, **kwargs)
+ assert(False)
+ except exception_type:
+ return
def retry(n):
"""Retry n times before failing for stochastic test cases."""
diff --git a/src/engine/threaded_engine.h b/src/engine/threaded_engine.h
index 9b7b74d..fef3346 100644
--- a/src/engine/threaded_engine.h
+++ b/src/engine/threaded_engine.h
@@ -345,7 +345,7 @@ class ThreadedEngine : public Engine {
if (what.find("driver shutting down") == std::string::npos &&
!shutdown_phase_) {
LOG(FATAL) << e.what() << "\n" <<
- "An fatal error occurred in asynchronous engine operation. "
+ "A fatal error occurred in asynchronous engine operation. "
"If you do not know what caused this error, "
"you can try set environment variable MXNET_ENGINE_TYPE "
"to NaiveEngine and run with debugger (i.e. gdb). "
diff --git a/src/io/iter_libsvm.cc b/src/io/iter_libsvm.cc
index 8e53e6f..ab6cacb 100644
--- a/src/io/iter_libsvm.cc
+++ b/src/io/iter_libsvm.cc
@@ -201,8 +201,9 @@ MXNET_REGISTER_IO_ITER(LibSVMIter)
.describe(R"code(Returns the libsvm file iterator which returns sparse data
with `csr`
storage type. This iterator is experimental and should be used with care.
-The input data is stored in a format similar to libsvm file format, except
that the indices
-are expected to be zero-based instead of one-based. Details of the libsvm
format are available
+The input data is stored in a format similar to libsvm file format, except
that the **indices
+are expected to be zero-based instead of one-based, and the column indices for
each row are
+expected to be sorted in ascending order**. Details of the libsvm format are
available
at `https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/`
In this function, the `data_shape` parameter is used to set the shape of each
line of the data.
diff --git a/tests/python/unittest/test_kvstore.py
b/tests/python/unittest/test_kvstore.py
index 12feb7e..6ed037f 100644
--- a/tests/python/unittest/test_kvstore.py
+++ b/tests/python/unittest/test_kvstore.py
@@ -18,20 +18,13 @@
# pylint: skip-file
import mxnet as mx
import numpy as np
-from mxnet.test_utils import rand_ndarray, assert_almost_equal
-from mxnet.base import py_str
+from mxnet.test_utils import rand_ndarray, assert_almost_equal,
assert_exception
+from mxnet.base import py_str, MXNetError
shape = (4, 4)
keys = [5, 7, 11]
str_keys = ['b', 'c', 'd']
-def assert_exception(f, *args, **kwargs):
- try:
- f(*args, **kwargs)
- assert(False)
- except:
- return
-
def init_kv(stype='default'):
"""init kv """
kv = mx.kv.create()
@@ -258,29 +251,30 @@ def test_invalid_pull():
def check_invalid_rsp_pull_single(kv, key):
dns_val = mx.nd.ones(shape) * 2
- assert_exception(kv.row_sparse_pull, key, out=dns_val,
row_ids=mx.nd.array([1]))
+ assert_exception(kv.row_sparse_pull, MXNetError,
+ key, out=dns_val, row_ids=mx.nd.array([1]))
def check_invalid_rsp_pull_list(kv, key):
dns_val = [mx.nd.ones(shape) * 2] * len(key)
- assert_exception(kv.row_sparse_pull, key, out=dns_val,
+ assert_exception(kv.row_sparse_pull, MXNetError, key, out=dns_val,
row_ids=[mx.nd.array([1])] * len(key))
def check_invalid_key_types_single(kv, key):
dns_val = mx.nd.ones(shape) * 2
rsp_val = dns_val.tostype('row_sparse')
- assert_exception(kv.init, key, dns_val)
- assert_exception(kv.push, key, dns_val)
- assert_exception(kv.pull, key, dns_val)
- assert_exception(kv.row_sparse_pull, key, rsp_val,
+ assert_exception(kv.init, MXNetError, key, dns_val)
+ assert_exception(kv.push, MXNetError, key, dns_val)
+ assert_exception(kv.pull, MXNetError, key, dns_val)
+ assert_exception(kv.row_sparse_pull, MXNetError, key, rsp_val,
row_ids=mx.nd.array([1]))
def check_invalid_key_types_list(kv, key):
dns_val = [mx.nd.ones(shape) * 2] * len(key)
rsp_val = [val.tostype('row_sparse') for val in dns_val]
- assert_exception(kv.init, key, dns_val)
- assert_exception(kv.push, key, dns_val)
- assert_exception(kv.pull, key, dns_val)
- assert_exception(kv.row_sparse_pull, key, rsp_val,
+ assert_exception(kv.init, MXNetError, key, dns_val)
+ assert_exception(kv.push, MXNetError, key, dns_val)
+ assert_exception(kv.pull, MXNetError, key, dns_val)
+ assert_exception(kv.row_sparse_pull, MXNetError, key, rsp_val,
row_ids=[mx.nd.array([1])] * len(key))
int_kv = init_kv()
diff --git a/tests/python/unittest/test_module.py
b/tests/python/unittest/test_module.py
index 6813c48..542217f 100644
--- a/tests/python/unittest/test_module.py
+++ b/tests/python/unittest/test_module.py
@@ -512,14 +512,14 @@ def test_factorization_machine_module():
mod = mx.mod.Module(symbol=model, data_names=['data'],
label_names=['label'])
# allocate memory by giving the input data and label shapes
mod.bind(data_shapes=train_iter.provide_data,
label_shapes=train_iter.provide_label)
- # initialize parameters by uniform random numbers
+ # initialize parameters by random numbers
mod.init_params(initializer=init)
- # use Sparse SGD with learning rate 0.1 to train
+ # use sparse Adam with learning rate 0.1 to train
adam = mx.optimizer.Adam(clip_gradient=5.0, learning_rate=0.001,
rescale_grad=1.0/batch_size)
mod.init_optimizer(optimizer=adam)
- # use accuracy as the metric
+ # use MSE as the metric
metric = mx.metric.create('MSE')
- # train 10 epoch
+ # train 10 epochs
for epoch in range(10):
train_iter.reset()
metric.reset()
diff --git a/tests/python/unittest/test_sparse_ndarray.py
b/tests/python/unittest/test_sparse_ndarray.py
index 94ea228..52a1b3c 100644
--- a/tests/python/unittest/test_sparse_ndarray.py
+++ b/tests/python/unittest/test_sparse_ndarray.py
@@ -551,11 +551,11 @@ def test_synthetic_dataset_generator():
def test_sparse_nd_exception():
""" test invalid sparse operator will throw a exception """
a = mx.nd.zeros((2,2))
- try:
- b = mx.nd.sparse.retain(a, invalid_arg="garbage_value")
- assert(False)
- except:
- return
+ assert_exception(mx.nd.sparse.retain, mx.base.MXNetError,
+ a, invalid_arg="garbage_value")
+ assert_exception(mx.nd.sparse.zeros, NotImplementedError,
+ 'csr', (2,2), aux_types=[np.int32, np.int32])
+
if __name__ == '__main__':
import nose
--
To stop receiving notification emails like this one, please contact
['"[email protected]" <[email protected]>'].