Bug#1034091: RFP: whisper -- Robust Speech Recognition via Large-Scale Weak Supervision

Petter Reinholdtsen Wed, 21 Jun 2023 08:36:25 -0700


The upload to contrib / experimental was rejected by the ftpmasters with
the following comment:


> can you please explain how I can recreate the files *.tiktoken?  There
> seem to be some sources missing ...

The two files in question are 50k lines of ASCII text that seem to be
some kind of index / vocabulary, and I have no idea how they were
created.  I suspect they might be an artifact of the model training, but
I do not know.  Does anyone have a clue to spare about how these were
created and how to rebuild them?  If we lack the source to rebuild them,
I currently believe the whisper package will have to go to non-free, not
contrib.  Any help figuring this out would be most appreciated.
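
For anyone who wants to poke at the files, here is a minimal sketch for
reading them.  It assumes each line holds a base64-encoded byte string
followed by an integer, which is an assumption about the layout and not
something confirmed in this thread.  The function name and the sample
lines are made up for illustration.  It only shows how to decode such
lines, and says nothing about how the entries or their numbers were
originally produced.

```python
import base64


def parse_vocab_lines(lines):
    """Parse lines assumed to look like '<base64 token> <integer>'.

    Returns a dict mapping the decoded token bytes to the integer.
    The line layout is an assumption, not a documented format.
    """
    vocab = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        token_b64, number = line.split()
        vocab[base64.b64decode(token_b64)] = int(number)
    return vocab


# Hypothetical sample lines, built by hand in the assumed layout.
sample = ["IQ== 0", "Ig== 1", "aGVsbG8= 2"]
vocab = parse_vocab_lines(sample)
for token, number in sorted(vocab.items(), key=lambda item: item[1]):
    print(number, token)
```

To try it on a real file, pass an open file object instead of the
sample list, for example parse_vocab_lines(open(path)) with path set to
one of the *.tiktoken files.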

-- 
Happy hacking
Petter Reinholdtsen
