eric-haibin-lin commented on issue #7319: [RoadMap] Legacy issue resolution before 1.0 release
URL: https://github.com/apache/incubator-mxnet/issues/7319#issuecomment-329230030

@formath you bring up a good point. Large indices are definitely a feature we want to support in the long term; we might want to open a separate issue to discuss this.

First of all, we do plan to add sparse support for the Embedding op, where the weight can be in [row_sparse](https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html) format and the gradient for the weight is generated in row_sparse format, too. I am currently working on code refactoring and documentation, so this sparse operator is not implemented yet.

Regarding indices up to 64 bits: this requires the first task @piiswrong brought up regarding int types in the C API. In addition, the `Kernel::Launch` API in the backend uses 32-bit ints instead of 64-bit, which is problematic for any operator that works on ndarrays with large shapes. So the scope is bigger than just the Embedding op, and it will definitely take more time to resolve. Are you working on an industrial-scale dataset?

Two ways to work around the 64-bit hashed-index problem come to mind:
1. Rehash the indices into roughly 23 or 24 bits to reduce the dimensionality. This doesn't hurt accuracy much, as claimed by [some paper](https://arxiv.org/pdf/0902.2206.pdf), and doesn't cause the operator to break in MXNet.
2. Preprocess the dataset to find the number of unique features and map them to contiguous indices instead.

@formath what are your thoughts on this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
With regards, Apache Git Services
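The first workaround mentioned in the comment above (rehashing 64-bit indices into ~24 bits) can be sketched in plain NumPy. This is an illustrative sketch, not MXNet code: the multiplicative hash constant and the 24-bit bucket count are example choices, and `hash_to_buckets` is a hypothetical helper name.

```python
import numpy as np

def hash_to_buckets(indices, num_bits=24):
    """Fold arbitrary 64-bit feature indices into 2**num_bits buckets.

    Uses a simple multiplicative (golden-ratio) hash and keeps the top
    num_bits bits. Collisions are expected, but per the feature-hashing
    literature they rarely hurt model quality much at this bucket count.
    """
    indices = np.asarray(indices, dtype=np.uint64)
    # 64-bit golden-ratio constant; multiplication wraps modulo 2**64.
    mixed = indices * np.uint64(0x9E3779B97F4A7C15)
    return (mixed >> np.uint64(64 - num_bits)).astype(np.int64)

raw_ids = [2**63 - 1, 123456789012345, 42]   # indices too large for int32
buckets = hash_to_buckets(raw_ids)
assert all(0 <= b < 2**24 for b in buckets)  # now safely within int32 range
```

After hashing, the embedding weight only needs `2**24` rows, so the indices fit the 32-bit `int` paths that the current operators assume.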

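The second workaround (remapping unique feature ids to contiguous indices) can likewise be sketched with NumPy. This is a hypothetical preprocessing example, assuming the dataset's raw ids fit in memory for a one-pass scan; `vocab` is an illustrative name for the resulting lookup table.

```python
import numpy as np

# Sketch of option 2: scan the dataset once to collect the unique feature
# ids, then remap every id to a dense index in [0, num_unique).
raw_ids = np.array([2**40 + 7, 5, 2**40 + 7, 999999999999, 5], dtype=np.int64)

# unique_ids is sorted; dense[i] is the contiguous index of raw_ids[i].
unique_ids, dense = np.unique(raw_ids, return_inverse=True)

# Keep a lookup table so unseen data at serving time can be remapped too.
vocab = {int(f): i for i, f in enumerate(unique_ids)}

assert dense.max() < len(unique_ids)  # indices now fit in 32 bits as long
                                      # as the unique-feature count does
```

As long as the number of *unique* features stays under `2**31`, the remapped indices work with today's 32-bit index paths, at the cost of an extra preprocessing pass and a persistent id-to-index table.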