On Wed, Mar 19, 2014 at 5:02 PM, Suman Saurabh <ss.sumansaurab...@gmail.com> wrote:
> Hi,
> I am Suman Saurabh, pursuing a B. Tech. I was writing a proposal for
> Google Summer of Code 2014 related to issue STANBOL-1007. Some things were
> not clear to me, like "Enhancement Results should keep track of the
> temporal position of the extracted text within the processed media file".
> Please provide some insight into it.
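To make the quoted requirement concrete before the answer below: here is a minimal, self-contained Java sketch of what tracking the temporal position of extracted text could look like. The `Segment` class and the use of a W3C Media Fragments temporal identifier are illustrative assumptions by the editor, not Stanbol or Sphinx4 API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TemporalTranscript {

    /** One recognized phrase plus its time span in the media file (hypothetical type). */
    static final class Segment {
        final String text;
        final double startSec;
        final double endSec;

        Segment(String text, double startSec, double endSec) {
            this.text = text;
            this.startSec = startSec;
            this.endSec = endSec;
        }

        /**
         * Encodes the span as a W3C Media Fragments temporal identifier,
         * e.g. "#t=1.25,3.50" -- one possible way to annotate the position.
         */
        String mediaFragment() {
            return String.format(Locale.ROOT, "#t=%.2f,%.2f", startSec, endSec);
        }
    }

    public static void main(String[] args) {
        // Pretend these segments came from the speech recognizer.
        List<Segment> transcript = Arrays.asList(
                new Segment("hello world", 0.00, 1.25),
                new Segment("this is stanbol", 1.25, 3.50));
        for (Segment s : transcript) {
            // A client could use the fragment to seek the player to this span.
            System.out.println(s.mediaFragment() + " \"" + s.text + "\"");
        }
    }
}
```

A client receiving such annotations could append the fragment to the media URL to highlight or seek to the passage currently being spoken.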
It would be nice if the engine would not just provide the extracted text,
but also annotations about the time when this text was spoken in the parsed
audio/video file. A possible use case would be a client that wants to
highlight the text currently being spoken in an audio/video file.

> I would also like to discuss my proposal with you regarding the GSoC issue
> STANBOL-1007. Sir, I am acquainted with the workings of PocketSphinx (at
> least with all the APIs needed for good application development). I do not
> know how the utterances are broken up or how the HMM retrievals for words
> are done (they are a black box for me), but I could use or edit their APIs
> for application development. I can easily acquaint myself with Sphinx4 and
> its libraries, which will be required for STANBOL-1007. Is this enough for
> me to apply on this issue for my GSoC project? Here is a link to my
> proposal: https://sites.google.com/site/gsoc2014stanbol

Some comments on your proposal:

> Audio data captured from a microphone will be parsed with ContentItem.

The capture of audio from a microphone is not a good use case, as the
Stanbol instance will most likely run on a server. The engine needs to be
able to deal with typical audio and video files (MPEG video and audio, MP3,
...). If Sphinx4 can not read those data by itself, one would need a
pre-processing engine that converts those files to data Sphinx4 can
process.

> Acoustic Model and Language Model to be used

Stanbol uses the DataFileProvider infrastructure [1] for handling big
binary configuration files. The configuration of the acoustic and language
models will need to use this infrastructure.

> Extracted text as a plain text Blob is then fed to the same ContentItem

As mentioned above, the engine should also add annotations that assign
parts of the text to temporal positions (time spans) within the parsed
audio/video file.

Finally, do not forget to add your background, esp.
contributions to open source projects and experience with the
technologies/frameworks used (OSGi, RDF, Sphinx, ...) to your proposal.

Hope this helps with improving your proposal.

Thx for the interest in Stanbol and all the best
Rupert

[1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen