I have an Android phone recording of a meeting that I'd like to transcribe to text, and was hoping there was an automagic way to convert most of it.

My google-fu seems to be failing me, or else that info simply isn't out there.

The closest I've found is a tool called pocketsphinx, which seems to be a toolkit for developers to build into applications rather than a finished product suitable for an end-user.

I saved my Android audio file to Google Drive and downloaded it from there to my main Debian box (for unknown reasons, I couldn't send it as a Gmail attachment or attach it to an FB message).

Based on what I could figure out, I then converted my .m4a file to a 16 kHz mono .wav file:

ffmpeg -i ../Downloads/Dana\ of\ City.m4a -ar 16000 -ac 1 DanaOfCity.wav

I verified that worked with

aplay DanaOfCity.wav
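aplay only shows that the file plays; it doesn't confirm the format. pocketsphinx wants 16-bit mono PCM at the rate given by -samprate (16000 in the configuration dump further down). A minimal sketch for checking that, using a hypothetical helper (wav_info is not an existing tool) that reads the WAV header with od. It assumes the "fmt " chunk directly follows the RIFF header, which is typical of ffmpeg's default WAV output but not guaranteed for every WAV file, and a little-endian machine such as an x86 Debian box:

```shell
#!/bin/sh
# Hypothetical helper: print sample rate, channel count and bits per
# sample of a canonical PCM WAV file by reading fixed header offsets.
# Assumes "fmt " starts at byte 12 and that the host is little-endian
# (od -tu2/-tu4 decode in host byte order; WAV headers are little-endian).
wav_info() {
    channels=$(od -An -tu2 -j22 -N2 "$1" | tr -d ' ')
    rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d ' ')
    bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d ' ')
    echo "rate=$rate channels=$channels bits=$bits"
}

# Usage: wav_info DanaOfCity.wav
# Expected for the ffmpeg command above: rate=16000 channels=1 bits=16
```

ffprobe, which ships with ffmpeg, reports the same fields without relying on fixed offsets, so it's the sturdier choice if the header layout is in doubt.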

Then I worked out what I hoped were the basics of the command (mostly from an example at https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-to-text, answer 8, footnote 2):

pocketsphinx_continuous -infile ~/Downloads/DanaOfCity.wav -hmm en_US/hub4wsj_sc_8k -lm en_US/hub4.5000.DMP 2> pocketsphinx.log

which I understand to mean:
- my input file is my data file "DanaOfCity.wav"
- the "-hmm" parameter (which pocketsphinx automatically looks for under /usr/share/pocketsphinx/model/hmm, judging by the paths in the log below) is to be the "en_US/hub4wsj_sc_8k" directory (which contains the "mdef" model-definition file that it had trouble finding until I added the "hub..." part)
- the "-lm" (is that a digit one or a letter ell? ah, it's an ell) parameter works like the "-hmm" parameter above, but under /usr/share/pocketsphinx/model/lm
- the "2> pocketsphinx.log" routes the app's standard error, where pocketsphinx writes its diagnostic messages, to a file named "pocketsphinx.log" in the current directory; the recognized text itself goes to standard output
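Because "2>" captures only standard error, any recognized text would still be printed to the terminal. A minimal sketch of separating the two streams, using a hypothetical stand-in function (fake_recognizer is not part of pocketsphinx) so it runs without the real models:

```shell
#!/bin/sh
# Hypothetical stand-in for pocketsphinx_continuous: diagnostic
# messages go to stderr (fd 2), the "recognized" text to stdout (fd 1).
fake_recognizer() {
    echo "INFO: pretend model loading" >&2
    echo "hello from the meeting"
}

# "2>" captures only stderr; ">" captures stdout into a separate file.
fake_recognizer 2> demo.log > demo.txt

# The equivalent with the real tool (same model arguments as the
# original command; DanaOfCity.txt is just an example name):
#   pocketsphinx_continuous -infile ~/Downloads/DanaOfCity.wav \
#       -hmm en_US/hub4wsj_sc_8k -lm en_US/hub4.5000.DMP \
#       2> pocketsphinx.log > DanaOfCity.txt
```

With that split, pocketsphinx.log keeps the INFO noise and the transcript file holds only the text.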

When I run this, the process fails with this output in my pocketsphinx.log file:

westk@westek:~/bub$ cat pocketsphinx.log
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
Current configuration:
[NAME]            [DEFLT]        [VALUE]
-agc            none        none
-agcthresh        2.0        2.000000e+00
-allphone
-allphone_ci        no        no
-alpha            0.97        9.700000e-01
-ascale            20.0        2.000000e+01
-aw            1        1
-backtrace        no        no
-beam            1e-48        1.000000e-48
-bestpath        yes        yes
-bestpathlw        9.5        9.500000e+00
-ceplen            13        13
-cmn            current        current
-cmninit        8.0        56,-3,1
-compallsen        no        no
-debug                    0
-dict
-dictcase        no        no
-dither            no        no
-doublebw        no        no
-ds            1        1
-fdict /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
-feat            1s_c_d_dd    1s_c_d_dd
-featparams /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
-fillprob        1e-8        1.000000e-08
-frate            100        100
-fsg
-fsgusealtpron        yes        yes
-fsgusefiller        yes        yes
-fwdflat        yes        yes
-fwdflatbeam        1e-64        1.000000e-64
-fwdflatefwid        4        4
-fwdflatlw        8.5        8.500000e+00
-fwdflatsfwin        25        25
-fwdflatwbeam        7e-29        7.000000e-29
-fwdtree        yes        yes
-hmm /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-input_endian        little        little
-jsgf
-keyphrase
-kws
-kws_delay        10        10
-kws_plp        1e-1        1.000000e-01
-kws_threshold        1        1.000000e+00
-latsize        5000        5000
-lda
-ldadim            0        0
-lifter            0        0
-lm /usr/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP
-lmctl
-lmname
-logbase        1.0001        1.000100e+00
-logfn
-logspec        no        no
-lowerf            133.33334    1.000000e+00
-lpbeam            1e-40        1.000000e-40
-lponlybeam        7e-29        7.000000e-29
-lw            6.5        6.500000e+00
-maxhmmpf        30000        30000
-maxwpf            -1        -1
-mdef /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
-mean /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
-mfclogdir
-min_endfr        0        0
-mixw
-mixwfloor        0.0000001    1.000000e-07
-mllr
-mmap            yes        yes
-ncep            13        13
-nfft            512        512
-nfilt            40        20
-nwpen            1.0        1.000000e+00
-pbeam            1e-48        1.000000e-48
-pip            1.0        1.000000e+00
-pl_beam        1e-10        1.000000e-10
-pl_pbeam        1e-10        1.000000e-10
-pl_pip            1.0        1.000000e+00
-pl_weight        3.0        3.000000e+00
-pl_window        5        5
-rawlogdir
-remove_dc        no        yes
-remove_noise        yes        yes
-remove_silence        yes        yes
-round_filters        yes        no
-samprate        16000        1.600000e+04
-seed            -1        -1
-sendump /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
-senlogdir
-senmgau
-silprob        0.005        5.000000e-03
-smoothspec        no        no
-svspec                    0-12/13-25/26-38
-tmat /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
-tmatfloor        0.0001        1.000000e-04
-topn            4        4
-topn_beam        0        0
-toprule
-transform        legacy        dct
-unit_area        yes        yes
-upperf            6855.4976    4.000000e+03
-uw            1.0        1.000000e+00
-vad_postspeech        50        50
-vad_prespeech        20        20
-vad_startspeech    10        10
-vad_threshold        2.0        2.000000e+00
-var /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
-varfloor        0.0001        1.000000e-04
-varnorm        no        no
-verbose        no        no
-warp_params
-warp_type        inverse_linear    inverse_linear
-wbeam            7e-29        7.000000e-29
-wip            0.65        6.500000e-01
-wlen            0.025625    2.500000e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(516): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: ptm_mgau.c(805): Number of codebooks doesn't match number of ciphones, doesn't look like PTM: 1 != 50
INFO: acmod.c(119): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4107 * 32 bytes (128 KiB) for word entries
INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 60400 bytes (58 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 60400 bytes (58 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(467): Header doesn't match
INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(548): Trying to read LM in DMP format
INFO: ngram_model_trie.c(630): ngrams 1=5001, 2=436879, 3=418286
pocketsphinx_continuous: ngrams_raw.c:372: ngrams_raw_read_dmp: Assertion `ngram_idx == counts[2]' failed.
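
The INFO lines just before the assertion show the loader trying each LM format in turn: trie binary, then ARPA text, then DMP. The "No \data\ mark" message refers to the "\data\" line that opens an ARPA text model. As a rough sketch of that one check only (is_arpa_lm is a hypothetical helper, not how pocketsphinx itself detects formats), one can test whether an LM file is ARPA text:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains a line starting
# with "\data\", the section marker of an ARPA text language model.
# It does not validate the rest of the file and cannot tell the
# binary trie and DMP formats apart.
is_arpa_lm() {
    grep -q '^\\data\\' "$1"
}

# Example: check the model used in the original command
lm=/usr/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP
if [ ! -r "$lm" ]; then
    echo "$lm: not readable"
elif is_arpa_lm "$lm"; then
    echo "$lm: ARPA text"
else
    echo "$lm: not ARPA text (some binary format)"
fi
```

For the file above, the log already shows the answer (not ARPA, read as DMP); the sketch just makes that check repeatable for other LM files.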


Can anyone out there help me accomplish my goal, using pocketsphinx or any other suitable tool?

Thanks!

--
Kent West     <*)))><
http://kentwest.blogspot.com
Praise Yah! \o/
