Hi Sandeep! Thank you for looking into this. Below are the answers as I
have them now:

1) As of now, I do not have metrics comparing librosa with the other
libraries currently available; I am working on gathering some. As far as
community usage is concerned, I have come across blogs that speak well of
librosa as an audio load/manipulation library. One of them is here
<https://towardsdatascience.com/urban-sound-classification-part-2-sample-rate-conversion-librosa-ba7bc88f209a>
. Librosa does perform slowly; I have consulted other frameworks that use
the library in their use cases, and they see this as well. I have used
scipy.io.wavfile too, but it only supports loading audio and offers little
in the way of feature extraction or audio transforms. Librosa's load takes
care of a lot of preprocessing, such as resampling the audio to a standard
sampling rate, converting stereo to mono, and scaling the audio samples.
That is why this library was chosen as a starting point. But I would also
welcome feedback from the community if they have ideas about other
libraries that perform these tasks better.
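
To make that preprocessing concrete, here is a rough numpy sketch of the
steps librosa.load handles for us (stereo-to-mono mixdown, int16-to-float
scaling, resampling). The function name and the linear-interpolation
resampler are my own simplifications for illustration; librosa itself uses
a much higher-quality resampler and handles many more input formats:

```python
import numpy as np

def load_like_librosa(samples, orig_sr, target_sr=22050):
    """Sketch of the preprocessing librosa.load performs on raw PCM:
    stereo-to-mono mixdown, scaling int16 PCM to floats in [-1, 1],
    and resampling to a standard rate (here via simple linear
    interpolation, purely for illustration)."""
    y = np.asarray(samples, dtype=np.float64)
    # Mix stereo (n_samples, 2) down to mono by averaging the channels.
    if y.ndim == 2:
        y = y.mean(axis=1)
    # Scale 16-bit PCM integers to floats in [-1.0, 1.0].
    y = y / 32768.0
    # Resample onto the target rate's time grid.
    n_out = int(round(len(y) * target_sr / orig_sr))
    duration = len(y) / orig_sr
    t_in = np.linspace(0.0, duration, num=len(y), endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    y = np.interp(t_out, t_in, y)
    return y.astype(np.float32), target_sr
```

With librosa this whole function is just `y, sr = librosa.load(path)`,
which is exactly the convenience we get from the dependency.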

2) Your suggestion to remove the hard dependency on this library for users
makes sense. It should be installed only when users actually need to
perform these audio-related tasks, for which we rely on librosa at the
moment.

3) I have looked into the code for librosa; however, it needs more study,
so until that is figured out it is too soon to comment on how the
operators would be implemented or how they could be extended to support
other languages.

4) Yes, the time difference is huge! However, librosa's load (loading the
audio into a numpy array) takes the bulk of the time (80-90%), not the
feature extraction (mfcc, mel, etc.). That is why we have disabled
*lazy = True* in the current design by overriding it in the method. So
initializing Gluon's dataloader takes time, but training then goes
quicker. This certainly needs more analysis of possible approaches.
Alternatively, if we find another library better suited for this
altogether, that will help too.
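
To illustrate the trade-off being discussed, here is a minimal plain-Python
sketch (the class and parameter names are hypothetical, not the actual
Gluon API) of what lazy=True versus lazy=False means for a dataset:

```python
class AudioDataset:
    """Sketch of the lazy-vs-eager trade-off: with lazy=False every
    clip is decoded up front when the dataset is built, so construction
    is slow but each __getitem__ is a cheap cache lookup; with
    lazy=True construction is instant, but every __getitem__ (i.e.
    every epoch) pays the full decode cost again."""

    def __init__(self, files, loader, lazy=True):
        self._files = files
        self._loader = loader  # e.g. a wrapper around librosa.load
        self._lazy = lazy
        # Eager mode: decode everything once, at construction time.
        self._cache = None if lazy else [loader(f) for f in files]

    def __len__(self):
        return len(self._files)

    def __getitem__(self, idx):
        if self._lazy:
            return self._loader(self._files[idx])  # decoded on every access
        return self._cache[idx]                    # decoded once at init
```

Since librosa's load dominates the cost, re-decoding on every access in
lazy mode is what blows the epoch time up, which is why the current design
forces the eager path.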

5) Yes, I would need comments/suggestions from Committers/Contributors on
this too.

Appreciate your comments.

Thanks and regards,
Gaurav

On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Thanks, Gaurav for starting this initiative. The design document is
> detailed and gives all the information.
> Starting to add this in "Contrib" is a good idea while we expect a few
> rough edges and cleanups to follow.
>
> I had the following queries:
> 1. Is there any analysis comparing LibROSA with other libraries? w.r.t
> features, performance, community usage in audio data domain.
> 2. What is the recommendation of LibROSA dependency? Part of MXNet PyPi or
> ask the user to install if required? I prefer the latter, similar to
> protobuf in ONNX-MXNet.
> 3. I see LibROSA is a fully Python-based library. Are we getting blocked on
> the dependency for future use cases when we want to make transformations as
> operators and allow for cross-language support?
> 4. In performance design considerations, with lazy=True / False the
> performance difference is too scary ( 8 minutes to 4 hours!!) This requires
> some more analysis. If we know that turning a flag on/off causes a 24X
> performance degradation, should we provide that control to the user? What is the
> impact of this on Memory usage?
> 5. I see LibROSA has ISC license (
> https://github.com/librosa/librosa/blob/master/LICENSE.md) which says free
> to use with same license notification. I am not sure if this is ok. I
> request other committers/mentors to suggest.
>
> Best,
> Sandeep
>
> On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh <gaurav.gire...@gmail.com>
> wrote:
>
> > Dear MXNet Community,
> >
> > I recently started looking into performing some simple sound multi-class
> > classification tasks with Audio Data and realized that as a user, I would
> > like MXNet to have an out of the box feature which allows us to load
> audio
> > data(at least 1 file format), extract features( or apply some common
> > transforms/feature extraction) and train a model using the Audio Dataset.
> > This could be a first step towards building and supporting APIs similar
> to
> > what we have for "vision" related use cases in MXNet.
> >
> > Below is the design proposal :
> >
> > Gluon - Audio Design Proposal
> > <https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio>
> >
> > I would highly appreciate your taking time to review and provide
> feedback,
> > comments/suggestions on this.
> > Looking forward to your support.
> >
> >
> > Best Regards,
> >
> > Gaurav Gireesh
> >
>
>
> --
> Sandeep Krishnamurthy
>
