Hi. This is a long post, but I think the context may be important. Anyone who wants to cut to the chase should skip straight to the bottom...
<context> After just over a decade contributing to FOSS[1] in my copious spare time, I took a break from commercial coding to study for my second Masters[2]. I focused on Machine Learning, Logic and Concurrency (scoring a shade over 88% for Semester One[3]), but at an increasing cost to my health. I dropped out when my computer access time reached zero towards the end of Semester Two. With physiotherapy and discipline over the last couple of years, I now have enough typing time to start thinking about working again. I promised myself that - once this happened - I'd start looking at the open source aural interface problem.

I spent that first summer reading academic machine learning papers, trying to work out why open source solutions don't exist, and recording my voice. Like most developers I've talked to, I originally assumed that data volume was the primary issue - and that, given the fantastic quantity of material on the internet now, it should be possible to reverse engineer most of what's needed. But no: my reading led me to believe that the key obstacle to this promising approach is that the quality of basic nuts-and-bolts parts-of-speech recognition isn't high enough to allow modern machine learning techniques to be applied to this data, leaving the state of the art aiming to perfect techniques known in the 1970s (and now known to be flawed in theory).

I think the root of the problem lies in the almost-universal initial use of the Fast Fourier Transform. Given that much greater human expertise is needed to recognize parts of speech after transformation than from the simple wave-form, my guess is that dropping the transform and applying modern machine learning approaches directly to the waveform - the way humans recognize speech - is the way to go. A secondary consideration is that the Fast Fourier Transform scales poorly across processors, so any user interface that performs it risks being unresponsive.
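To make the alternative above concrete, here is a minimal sketch (my own illustration, not any existing project's code) of what "dropping the transform" looks like in practice: instead of computing FFT coefficients per window, slice the raw waveform into overlapping frames and hand those directly to a learner. The frame and hop sizes are assumptions chosen to match common 25 ms / 10 ms windowing.

```python
import numpy as np

def frame_waveform(samples, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames, one row per training example.

    With a 16 kHz sample rate, frame_len=400 and hop=160 give 25 ms windows
    every 10 ms - no Fourier transform involved, just the raw amplitudes.
    """
    n = 1 + max(0, (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len] for i in range(n)])

rate = 16_000
# One second of noise as a stand-in for recorded speech.
signal = np.random.default_rng(0).standard_normal(rate)
frames = frame_waveform(signal)
# frames.shape == (98, 400): 98 overlapping raw-waveform windows,
# each a candidate input vector for a statistical learning algorithm.
```

Whether a learner can match hand-tuned spectral features from this representation is exactly the open question; the point of the sketch is only that the input pipeline is simpler and trivially parallel across frames.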
So, I'd like to start working on the open source aural interface problem from the ground up, applying modern machine learning techniques to the basics. But this means storing large quantities of high quality speech data. For an open source project, finding a host for this data is a fundamental step in establishing the provenance of any future integrated solution. The FOSS projects known to me are not good matches for my aims:

1. My interest focuses on aural interfaces, so promoting Free Software matters less to me than ensuring that the license adopted is compatible with a wide range of downstream FOSS projects. The existing projects I know about use the GPL and are focused on promoting Free Software rather than on engineering.

2. I propose the use of alternative feature extraction and machine learning techniques. Existing projects concern themselves with compatibility with existing tools and aim for wide participation, which means throwing away high frequencies audible to the human ear through heavy down-sampling. Recognizing some parts of speech then depends more heavily on context, which makes reverse-engineering good dictionaries and oral speech models much more difficult. Good sound cards ship with many modern motherboards, and good speech recognition microphones are available for around £100. My primary use case is people dependent on a serious aural interface (rather than occasional users), so preserving the original fidelity makes sense.

3. In humankind, the wetware used to create sounds varies along several dimensions but shares a common engineering design. This suggests that, using modern statistical learning algorithms, a suitably parameterised vocal model could be efficiently tuned to a particular voice. This is my preferred approach. Existing projects aim to crowdsource an average voice, which has statistical disadvantages for my approach. Instead, the first step for me is to prove out a high quality statistical model of one voice.
</context> In short: in order to take the first step towards open source aural interfaces, I need to find a host for large quantities of my speech (already recorded), at high fidelity (to preserve high frequencies), under an MIT license. Would this be something that Google Code might be interested in supporting?

Robert

[1] (linked in profile) http://robertburrelldonkin.name describes most of my ASF stuff (Member, ASF)
http://apache.org/foundation/members.html (email rdonkin-at-apache.org for confirmation)
http://people.apache.org/committer-index.html#rdonkin
http://www.ohloh.net/accounts/robertburrelldonkin lists most of my other FOSS stuff, or just use the world's favourite engine ;-)
www.google.co.uk/search?q="robert+burrell+donkin"
[2] Advanced Computer Science @ Manchester http://www.cs.manchester.ac.uk/postgraduate/taught/programmes/acs/
My previous degrees were in Mathematics http://www2.warwick.ac.uk/fac/sci/maths/
[3] Scroll down http://robertburrelldonkin.name for more details.

