I am developing an app to transcribe audio recordings on my phone. I am using the "video" model. The problem I am having is that the transcription splits the text among several speakers, even though it's just me speaking.
I am wondering whether I should be using the Phone Call model instead, which supports the Speaker Diarization function (Video apparently does not). If I use the Phone Call model on a recording that is three hours long, will that cause problems? Finally, if I am trying to produce a transcript with the most accurate punctuation, does one model (Video, Phone Call, etc.) work better than the others? Thanks!
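For concreteness, here is a minimal sketch of the kind of request being asked about. It assumes the service is the Cloud Speech-to-Text v1 REST API and that a recording this long would go through the asynchronous `speech:longrunningrecognize` method with the audio stored in Cloud Storage. The helper name `build_long_running_request` and the bucket path are hypothetical, and pinning both speaker counts to 1 is only a guess at how to tell the service there is a single speaker, not a confirmed fix.

```python
import json


def build_long_running_request(gcs_uri, model="phone_call", language_code="en-US"):
    """Build a JSON body for an asynchronous recognition request.

    Hypothetical helper: it only assembles a dict and makes no API call.
    Field names follow the v1 REST RecognitionConfig (assumed here).
    """
    return {
        "config": {
            "languageCode": language_code,
            # Model under discussion: "phone_call" or "video".
            "model": model,
            # Ask the service to insert punctuation into the transcript.
            "enableAutomaticPunctuation": True,
            # Assumption: limiting the speaker count to 1 may stop the
            # transcript from being split among several speakers.
            "diarizationConfig": {
                "enableSpeakerDiarization": True,
                "minSpeakerCount": 1,
                "maxSpeakerCount": 1,
            },
        },
        # Long recordings are referenced by a Cloud Storage URI
        # rather than sent inline (placeholder path).
        "audio": {"uri": gcs_uri},
    }


body = build_long_running_request("gs://my-bucket/recording.flac")
print(json.dumps(body, indent=2))
```

Switching `model` to `"video"` in the call would let the two models be compared on the same recording, including how each one handles punctuation.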
