On Wed, Mar 19, 2014 at 5:02 PM, Suman Saurabh <ss.sumansaurab...@gmail.com> wrote:
> Hi,
> I am Suman Saurabh, pursuing a B. Tech. I was writing a proposal for
> Google Summer of Code 2014 related to issue STANBOL-1007. Some things were
> not clear to me, like "Enhancement Results should keep track of the
> temporal position of the extracted text within the processed media file".
> Please provide some insight into it.
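To make the quoted requirement concrete before the answer below: here is a minimal, self-contained Java sketch of what tracking the temporal position of extracted text could look like. The `Segment` class and the use of a W3C Media Fragments temporal identifier are illustrative assumptions by the editor, not Stanbol or Sphinx4 API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TemporalTranscript {

    /** One recognized phrase plus its time span in the media file (hypothetical type). */
    static final class Segment {
        final String text;
        final double startSec;
        final double endSec;

        Segment(String text, double startSec, double endSec) {
            this.text = text;
            this.startSec = startSec;
            this.endSec = endSec;
        }

        /**
         * Encodes the span as a W3C Media Fragments temporal identifier,
         * e.g. "#t=1.25,3.50" -- one possible way to annotate the position.
         */
        String mediaFragment() {
            return String.format(Locale.ROOT, "#t=%.2f,%.2f", startSec, endSec);
        }
    }

    public static void main(String[] args) {
        // Pretend these segments came from the speech recognizer.
        List<Segment> transcript = Arrays.asList(
                new Segment("hello world", 0.00, 1.25),
                new Segment("this is stanbol", 1.25, 3.50));
        for (Segment s : transcript) {
            // A client could use the fragment to seek the player to this span.
            System.out.println(s.mediaFragment() + " \"" + s.text + "\"");
        }
    }
}
```

A client receiving such annotations could append the fragment to the media URL to highlight or seek to the passage currently being spoken.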
It would be nice if the engine would not just provide the extracted text,
but also annotations about the time when this text was spoken in the parsed
audio/video file. A possible use case would be a client that wants to
highlight the text currently being spoken in an audio/video file.

> I would also like to discuss my proposal with you regarding the GSoC issue
> STANBOL-1007. Sir, I am acquainted with the workings of PocketSphinx (at
> least with all the APIs needed for good application development). I do not
> know how the utterances are broken up or how the HMM retrievals for words
> are done (they are a black box for me), but I could use or edit their APIs
> for application development. I can easily acquaint myself with Sphinx4 and
> its libraries, which will be required for STANBOL-1007. Is this enough for
> me to apply on this issue for my GSoC project? Here is a link to my
> proposal: https://sites.google.com/site/gsoc2014stanbol

Some comments on your proposal:

> Audio data captured from a microphone will be parsed with ContentItem.

The capture of audio from a microphone is not a good use case, as the
Stanbol instance will most likely run on a server. The engine needs to be
able to deal with typical audio and video files (MPEG video and audio, MP3,
...). If Sphinx4 can not read those data by itself, one would need a
pre-processing engine that converts those files to data Sphinx4 can
process.

> Acoustic Model and Language Model to be used

Stanbol uses the DataFileProvider infrastructure [1] for handling big
binary configuration files. The configuration of the acoustic and language
models will need to use this infrastructure.

> Extracted text as a plain text Blob is then fed to the same ContentItem

As mentioned above, the engine should also add annotations that assign
parts of the text to temporal positions (time spans) within the parsed
audio/video file.

Finally, do not forget to add your background, esp.
contributions to open source projects and experience with the
technologies/frameworks used (OSGi, RDF, Sphinx, ...) to your proposal.

Hope this helps with improving your proposal.

Thx for the interest in Stanbol and all the best
Rupert

[1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen