gaurav-gireesh opened a new pull request #13241: [MXNET-1210 ][WIP] Gluon Audio
URL: https://github.com/apache/incubator-mxnet/pull/13241
 
 
   ## Description ##
   Phase 1 
   
   As a user, I would like out-of-the-box audio data loading and some popular 
audio transforms in MXNet, so that I can:
   - load audio files (only .wav files are currently supported) into a Gluon 
AudioDataset (NDArrays),
   - apply popular audio transforms to the audio data (for example scaling, 
MEL, MFCC),
   - load the dataset with Gluon's DataLoader and train a neural network (for 
example an MLP) on the transformed audio,
   - perform a simple audio task such as sound classification, with one label 
per audio clip (a multiclass sound classification problem),
   - have an end-to-end example for a sample multiclass audio classification 
task (Urban Sounds Classification).
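
A minimal sketch of the folder-based loading idea described in the list above, 
using only the Python standard library. The class name `FolderWavDataset`, the 
`root/<label>/<clip>.wav` layout, and the helper functions are assumptions made 
for illustration; they are not the API added by this PR, which returns NDArrays 
and relies on librosa.

```python
import os
import struct
import tempfile
import wave


class FolderWavDataset:
    """Hypothetical stand-in for a folder-based audio dataset (not the PR's class).

    Assumes a layout of root/<label_name>/<clip>.wav and returns
    (samples, label_index) pairs.
    """

    def __init__(self, root, transform=None):
        self.transform = transform
        # Sorted sub-folder names double as class names.
        self.synsets = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.items = []
        for label, name in enumerate(self.synsets):
            folder = os.path.join(root, name)
            for fname in sorted(os.listdir(folder)):
                if fname.lower().endswith(".wav"):
                    self.items.append((os.path.join(folder, fname), label))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, label = self.items[idx]
        with wave.open(path, "rb") as wf:
            # This sketch only handles 16-bit mono PCM.
            if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
                raise ValueError("only 16-bit mono .wav files are handled here")
            n = wf.getnframes()
            samples = list(struct.unpack("<%dh" % n, wf.readframes(n)))
        if self.transform is not None:
            samples = self.transform(samples)
        return samples, label


def write_wav(path, samples, rate=8000):
    """Write 16-bit mono PCM samples so the demo has data to read."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))


def build_demo_dataset():
    """Create two tiny labelled clips in a temporary folder and wrap them."""
    root = tempfile.mkdtemp()
    for name, clip in (("dog_bark", [0, 100, -100]), ("siren", [5, 5, 5, 5])):
        os.makedirs(os.path.join(root, name))
        write_wav(os.path.join(root, name, "clip0.wav"), clip)
    return FolderWavDataset(root)
```

Because the sketch exposes `__len__` and `__getitem__`, the same shape of class 
could in principle be handed to a batching loader; clips of different lengths 
would first need padding or trimming.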
   
   **Note**: The plan is to land a working version of this feature, with an 
example, in the contrib package of MXNet before agreeing to move the 
implementation to the gluon.data module.
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to 
the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) 
created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator) - WIP
   - [ ] Code is well-documented: WIP
   - For user-facing API changes, API doc string has been updated. 
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] AudioFolderDataset (gluon.contrib.data.audio.datasets)
   - [x] Audio transforms (gluon.contrib.data.audio.transforms)
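
As a rough illustration of how callable audio transforms can be chained, here 
is a plain-Python sketch. `Scale`, `PadTrim`, and `Compose` are hypothetical 
names chosen for this example and are not taken from the PR's transforms 
module; `PadTrim` in particular is an assumption added so that clips end up 
with equal length.

```python
class Scale:
    """Hypothetical transform: multiply every sample by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, samples):
        return [s * self.factor for s in samples]


class PadTrim:
    """Hypothetical transform: zero-pad or truncate a clip to a fixed length."""

    def __init__(self, length, fill=0.0):
        self.length = length
        self.fill = fill

    def __call__(self, samples):
        trimmed = list(samples[: self.length])
        return trimmed + [self.fill] * (self.length - len(trimmed))


class Compose:
    """Apply a list of transforms in order, feeding each output to the next."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, samples):
        for transform in self.transforms:
            samples = transform(samples)
        return samples


# Halve the amplitude, then force every clip to exactly four samples.
pipeline = Compose([Scale(0.5), PadTrim(4)])
short_clip = pipeline([2, 4])
long_clip = pipeline([2, 4, 6, 8, 10])
```

Keeping each transform a small callable means a dataset only needs to accept a 
single `transform` argument, whatever combination of steps is used.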
   
   ## Comments ##
   - This PR is a WIP. I will add tests for the new modules, along with 
documentation and an example (a notebook or a Python script).
   - This feature is a first step towards audio-related tasks and needs to be 
extended for more generic and more advanced use cases, which is why it is being 
contributed to the gluon.contrib package.
   - I opened this PR for early feedback on:
      - **librosa**: a dependency used for audio loading and feature 
extraction (MFCC, scaling, MEL, etc.). It may have to be listed as a 
dependency in the CI script, or the tests will fail. Suggestions about where to 
include it, or a workaround, are appreciated.
      - Design: see 
[here](https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio). 
Feedback is required. Dev list discussion: 
[here](https://lists.apache.org/thread.html/e2568b8b492fbeeafbe73a8abe9ca814e66d288977635c7a62cfa121@%3Cdev.mxnet.apache.org%3E)
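
One possible workaround for the librosa question, sketched under assumptions: 
treat librosa as an optional dependency, import it lazily, and skip the tests 
that need it when it is not installed. The helper `require_librosa` and the 
test class name are hypothetical; this is not how the PR or the MXNet CI 
currently handles the dependency.

```python
import importlib.util
import unittest

# True only when librosa can be found in the current environment.
HAS_LIBROSA = importlib.util.find_spec("librosa") is not None


def require_librosa():
    """Import librosa lazily, failing with an actionable message if absent."""
    if not HAS_LIBROSA:
        raise ImportError(
            "librosa is required for audio loading and feature extraction; "
            "try `pip install librosa`"
        )
    import librosa

    return librosa


class TestAudioTransforms(unittest.TestCase):
    @unittest.skipUnless(HAS_LIBROSA, "librosa is not installed")
    def test_mfcc_has_requested_coefficients(self):
        import numpy as np

        librosa = require_librosa()
        sr = 22050
        # One second of a 440 Hz tone as a stand-in for a real clip.
        t = np.arange(sr) / sr
        y = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        self.assertEqual(mfcc.shape[0], 13)
```

With this pattern the test suite stays green on machines without librosa, at 
the cost of those tests not running there, so CI would still need at least one 
job that installs it.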
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
