Hi All,

We have made a lot of progress since we last met; here is a summary. First, the Web Speech API integration is complete in a test build, and we have prepared a couple of demos. This implementation uses the acoustic/language models and decoder from PocketSphinx (all open source):

Desktop demo (Firefox Nightly on Mac): http://youtu.be/UcBvsU0fCPs
B2G (Flame) demo: https://www.youtube.com/watch?v=0zqBbDmQlQ4

Completed Items

1. Coding the integration of the PocketSphinx API with the Web Speech API layer in Gecko
2. Modify the gUM (getUserMedia) C++ layer to return PCM at 8 kHz
3. Test the API with the speech decoder
   3.1 Adjust PocketSphinx parameters to improve accuracy
   3.2 Define which languages we'll support initially -> focused on English at this time
4. Include the PocketSphinx sources in Gecko and write the moz.build files for each library so that they are multi-platform and compile with ./mach
5. Integrate gecko-dev with B2G and compile them together to support FxOS (OK)
6. Test build images ready for Mac and Flame (B2G) (please send a note to [email protected])

Next Steps

1. Make minor adjustments to the API implementation; code reviews
2. Write mochitests (discussing with QA/Jonathan)
3. Write the prototype (grammar-based) app integrated with Gaia (codenamed "Vaani")
4. Create remaining desktop images for Windows (and Android)
5. Plan integration into the baselines (Gecko, B2G)

Thanks,
Sandip Kamat & Andre Natal

----- Original Message -----
> From: "Sandip Kamat" <[email protected]>
> To: [email protected], [email protected],
> "[email protected]" <[email protected]>
> Cc: "André Natal" <[email protected]>, "Dietrich Ayala"
> <[email protected]>, "Josh Carpenter" <[email protected]>, "Larissa
> Shapiro" <[email protected]>
> Sent: Tuesday, July 1, 2014 12:46:26 PM
> Subject: Enabling Voice Input in Open Web / Firefox OS
> "Many Voices, One Mozilla"
>
> Hi All,
>
> Here is the summary of high level draft plans we are beginning with for
> enabling Voice Input in Open Web / Firefox OS. One of our Firefox OS
> contributors Andre Natal (Brazil community) has done lots of preparatory
> work and is currently continuing on a GSOC (Google Summer of Code) project
> around this. The proposed 2 phases of the plan are in email below.
>
> Please note the releases and estimates marked below are *all tentative* (will
> change) and will be refined over next several months. We will continue
> adding updates here:
>
> Wiki
> https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
>
> Bugzilla
> Bug 1032964 - [B2G][SpeechRTC][User Story]: Enabling Voice input in Firefox
> OS
>
> Trello board to track status:
> https://trello.com/b/UWXblmKb/webspeech-api
>
> Github:
> https://github.com/andrenatal/gecko-dev
>
> There is lots to do here and we are just starting, so if you are interested,
> pls watch this wiki and help with the dependent bug# being added to the
> meta-bug above. This kind of project could use great community participation
> with contributing code, collecting / testing voice samples with various
> accents (Remember "many voices, One Mozilla") to improve the acoustic /
> language models, creating fun gamifications to achieve that and yes, we
> would need tons & tons of testing!
>
> Regards,
> Sandip
> ----
> Sandip Kamat | Product Management, Firefox OS | Mozilla Corporation
> [email protected], @sankam
>
> > > Subject: Re: Planning SpeechRTC integration for FxOS
> > >
> > > Thanks for joining the kickoff meeting. Here is the current proposal and
> > > next steps. We will have a regular (frequency tbd) cadence of meetings on
> > > this.
> > >
> > > Propose to approach in 2 phases
> > >
> > > * Phase 1 – Grammar based Voice Command app
> > >   * Similar to this Nuance app:
> > >     https://www.youtube.com/watch?v=p0majfIEIR8&noredirect=1
> > >   * Needs Web Speech API implementation
> > >   * Current App level solutions present in marketplace (Andre N)
> > >     * Voicity (uses pocketSphinx AM, custom LM, Verbio decoder)
> > >       ( https://www.youtube.com/watch?v=cjjFvyH3kdc )
> > >     * Offline recognition on Peak
> > >       ( https://www.youtube.com/watch?v=FXKXhrRDEb8 )
> > >
> > > * Phase 2 – Free Speech (closer to Apple Siri experience)
> > >   * “Vaani” App - Virtual Assistant with free speech
> > >   * Web Speech API implementation enhancements
> > >   * Need to benchmark decoder options & select solution, several decisions
> > >     pending
> > >   * Current App Crab (uses decoder from Nuance Ndev mobile – dev edition)
> > >     ( https://www.youtube.com/watch?v=pnCRH-Iznrc )
> > >
> > > Next Steps
> > >
> > > * Andre will finish most of phase 1 as a part of Google summer of code
> > >   project. Olli Pettay helping him. Needs support from CJ Ku's team to
> > >   finish gecko porting.
> > > * Create Test scripts to test SpeechAPI (Andre, Clint). If that works
> > >   well, we could declare support for SpeechAPI in version 2.x. The other
> > >   implementation in phase 1 can be targeted for later version 2.x.
> > > * Finalize AM/LM/decoder strategies and rocketbar integration for phase
> > >   2. Create release plan for phase 2 after some assessment of efforts.
> > > * Strategy on Back end server (Build vs license vs hybrid). Work with
> > >   services/Mark Mayo's team. (had a Call with Bill Maggs)

_______________________________________________
dev-webapps mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-webapps
