There are:
- the Gluon API
- the Module API
- some other APIs in MXNet
- the low-level C / C++ APIs
Recently I accidentally discovered that such things as GluonNLP and GluonCV exist (besides some examples in MXNet itself). It's unclear whether I can rely on some API or whether I have to write my own C / C++ code. I implement publicly available articles and other ideas in TF all the time, but when it comes to MXNet I am often reluctant, because it's difficult to understand which way to go. It's unclear whether my efforts will result in a working model or whether I will get stuck.

Points #5 and #6 are absolutely true. As for documentation, all projects in a turbulent phase of their lifecycle have outdated docs; it's normal. I'd say the docs are very good (I remember the early Spark & DL4J docs 😂 )

On Thursday, September 20, 2018, Tianqi Chen <[email protected]> wrote:

> The key complaint here is mainly about the clarity of the documents
> themselves. Maybe it is time to focus on a single flavor of API that is
> useful (Gluon) and highlight all the docs around that.
>
> Tianqi
>
> On Wed, Sep 19, 2018 at 11:04 AM Qing Lan <[email protected]> wrote:
>
> > Hi all,
> >
> > There was a trending topic <https://www.zhihu.com/question/293996867> in
> > Zhihu (a famous Chinese StackOverflow+Quora) recently asking about the
> > status of MXNet in 2018. Mu replied to the thread and received more than
> > 300 `like`s. However, a few concerns were raised in the comments of that
> > thread; I have done a simple translation from Chinese to English:
> >
> > 1. Documentation! To this day, the online docs still contain:
> >    1. Deprecated but not updated pages.
> >    2. Wrong documentation with poor descriptions.
> >    3. Alpha-stage documentation, e.g. you must install with `pip --pre`
> >       in order to run the example.
> >
> > 2. Examples! For Gluon specifically, many examples still mix Gluon and
> > the older MXNet APIs. The mixture of mx.sym, mx.nd and mx.gluon confuses
> > users about which one is the right one to choose to get their model to
> > work. As an example, although Gluon made data encapsulation possible,
> > there are still examples using mx.io.ImageRecordIter with tens of
> > parameters (it feels as if the Gluon examples are simply copies of the
> > old Python examples).
> >
> > 3. Examples again! Compared to PyTorch, there are a few things I don't
> > like in the Gluon examples:
> >    1. Some run, but the code structure is still very complicated, such
> >       as example/image-classification/cifar10.py. It reads like one long
> >       concatenation of code; in fact it is just a series of layers mixed
> >       with model.fit. This makes it very hard for users to modify or
> >       extend the model.
> >    2. Some run only with certain settings. If users change the model
> >       even a little, it crashes. For example, in the multi-GPU example
> >       on the Gluon website, MXNet hides the logic that rescales the
> >       learning rate by the batch size inside the optimizer. Many newbies
> >       didn't know this, and only found that the model stopped converging
> >       when the batch size changed.
> >    3. In the worst case, the model itself simply doesn't work.
> >       Maintainers in the MXNet community merged the code directly
> >       without running the model (not even an integration test), so the
> >       script stayed broken until somebody raised an issue and fixed it.
> >
> > 4. The community problem. The core advantages of MXNet are its
> > scalability and efficiency. However, the documentation of some of the
> > tools is confusing. Two examples:
> >    1. im2rec comes in two versions, C++ (binary) and Python, but nobody
> >       would guess that the argparse options of these tools differ (and
> >       there are no suitable examples to compare against, so users can
> >       only guess at the usage).
> >    2. How do you combine MXNet's distributed training with a
> >       supercomputing tool such as Slurm? How do you profile and debug?
> >       A couple of companies I know considered using MXNet for
> >       distributed training; due to the lack of examples and poor support
> >       from the community, they had to move their models to TensorFlow
> >       and Horovod.
> >
> > 5. The heavy code base. Most of the MXNet examples, source code,
> > documentation and language bindings live in a single repo; a git clone
> > costs tens of MB. New-feature PRs take longer than expected, and the
> > poor review responsiveness and rules keep new contributors away from
> > the community. I remember a call for documentation improvements last
> > year: the total timeline cost one user three months to get the change
> > merged into master. That almost equals a PyTorch release interval.
> >
> > 6. To developers. Very few people in the community discuss improvements
> > that would make MXNet more user-friendly. It is far too easy to trigger
> > dozens of stack issues while coding. Again, is familiarity with C++ a
> > requirement for MXNet users? The bridge between Python and C lacks IDE
> > lint support (maybe MXNet assumes every developer is a VIM master). The
> > API and underlying implementation change frequently, so people have to
> > release their code pinned to an archived version of MXNet (as TuSimple
> > and MSRA do). Look at PyTorch by contrast: even an API for moving a
> > tensor to a device gets a thorough discussion.
> >
> > There will be more comments translated to English and I will keep this
> > thread updated...
> >
> > Thanks,
> > Qing
> >
>
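The multi-GPU complaint in point 3.2 boils down to the linear learning-rate scaling rule that the example applies implicitly inside the optimizer. Made explicit, the rule is only a one-liner; a minimal sketch (the base values here are illustrative, not taken from the MXNet example):

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear LR scaling: grow the learning rate in proportion
    to the global batch size relative to a reference batch size."""
    return base_lr * batch_size / base_batch_size

# Doubling the batch from 128 to 256 doubles the learning rate.
print(scaled_lr(0.1, 128, 256))  # -> 0.2
```

Had the example performed this rescaling visibly instead of hiding it in the optimizer, the convergence failures users hit at other batch sizes would have been much easier to diagnose.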
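Point 4.2 asks how to run MXNet's parameter-server training under Slurm. A minimal job-script sketch follows: `tools/launch.py`, the `-n`/`-s`/`--launcher`/`-H` flags, and `--kv-store dist_sync` are real MXNet mechanisms, but `train.py`, the node count, and the job name are placeholders, and the launcher flags may differ across MXNet versions.

```shell
#!/bin/bash
#SBATCH --job-name=mxnet-dist
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1

# Build an ssh host file from the Slurm allocation.
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts

# Launch 4 workers and 4 parameter servers over ssh.
# (train.py is a placeholder for the actual training script.)
python tools/launch.py -n 4 -s 4 --launcher ssh -H hosts \
    python train.py --kv-store dist_sync
```

Passwordless ssh between the allocated nodes is assumed; if the cluster forbids direct ssh, the `mpi` launcher is an alternative.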
