+1 to Dan's feedback; this needs an async API, likely with a streams design.
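
To make the async shape concrete, here is a rough sketch of how a page might call a Promise-based version of the two methods discussed below. The method names come from the thread; the `lang` parameter, the boolean results, the `OnDeviceSpeechBackend` interface, `ensureOnDevice`, and `FakeBackend` are all hypothetical, invented only so the sketch runs outside a browser. It is not the spec'd API, and it leaves out download progress reporting, which is where a streams design would come in.

```typescript
// Hypothetical shape: method names are from the thread, but the signatures
// and return types are assumptions, not what the spec or Chrome define.
interface OnDeviceSpeechBackend {
  onDeviceWebSpeechAvailable(lang: string): Promise<boolean>;
  installOnDeviceSpeechRecognition(lang: string): Promise<boolean>;
}

type EnsureResult = "already-available" | "installed" | "unavailable";

// Check availability first, and only request a download when needed.
// Neither step blocks the caller, since both are awaited.
async function ensureOnDevice(
  backend: OnDeviceSpeechBackend,
  lang: string,
): Promise<EnsureResult> {
  if (await backend.onDeviceWebSpeechAvailable(lang)) {
    return "already-available";
  }
  const ok = await backend.installOnDeviceSpeechRecognition(lang);
  return ok ? "installed" : "unavailable";
}

// In-memory stand-in for a browser implementation (illustration only).
class FakeBackend implements OnDeviceSpeechBackend {
  private installed: Set<string>;
  private downloadable: Set<string>;

  constructor(installed: string[], downloadable: string[]) {
    this.installed = new Set(installed);
    this.downloadable = new Set(downloadable);
  }

  async onDeviceWebSpeechAvailable(lang: string): Promise<boolean> {
    return this.installed.has(lang);
  }

  async installOnDeviceSpeechRecognition(lang: string): Promise<boolean> {
    if (!this.downloadable.has(lang)) {
      return false; // no language pack exists for this language
    }
    this.installed.add(lang); // pretend the download completed
    return true;
  }
}
```

With a fake that has "en-US" installed and "fr-FR" downloadable, `ensureOnDevice` resolves to "already-available" for "en-US", "installed" for "fr-FR", and "unavailable" for any other language.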

On Wednesday, January 8, 2025 at 7:33:12 AM UTC-8 Rick Byers wrote:
> This is great to see! IMHO there are a bunch of great use-cases for
> on-device speech recognition which are likely not suitable for server-based
> approaches.
>
> This is still only exposed
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/speech/speech_recognition.idl;l=47?q=speechrecognition%20file:%5C.idl&ss=chromium>
> via a legacy prefixed API, window.webkitSpeechRecognition, right? Any
> reason why it wouldn't be trivial to unprefix the speech recognition API
> (supporting both prefixed and unprefixed) at the same time? In general we
> don't support making updates to APIs which are only exposed via
> non-standard legacy prefixed API names.
>
> On Tue, Jan 7, 2025 at 10:20 PM Yoav Weiss (@Shopify) <yoavwe...@chromium.org> wrote:
>
>> On Tue, Jan 7, 2025 at 9:50 PM Evan Liu <ev...@google.com> wrote:
>>
>>>> * Are the resources downloaded partitioned per top-level site? What
>>>> should typical download sizes be?
>>>
>>> This depends on the browser--for Chrome on Windows/Mac/Linux, there's
>>> only one instance of each on-device speech recognition language pack and
>>> each language pack is ~60MB. The spec doesn't necessarily dictate how the
>>> downloads are handled, only that websites should be allowed to trigger a
>>> download (or request a download) of a language.
>>>
>>
>> This seems like it'd require at very least some extra considerations as
>> part of the Privacy & Security section of the spec.
>> It would also be good to have that be explicitly an
>> implementation-defined decision.
>>
>> +Domenic Denicola <dome...@chromium.org> who's been working on similar
>> privacy models related to translations, and can potentially advise you on
>> the best path there.
>>
>>>> Links to the minutes would be helpful. Filing official positions would
>>>> be even better.
>>>
>>> I've filed official positions for Mozilla
>>> <https://github.com/mozilla/standards-positions/issues/1157> and WebKit
>>> <https://github.com/WebKit/standards-positions/issues/443>.
>>>
>>>> Why not? Is it tested otherwise?
>>>
>>> Oops, I forgot to check that box. This feature is testable by
>>> web-platform-tests.
>>>
>>
>> Have you written web platform tests for it? Have a link?
>>
>>>> It’s implied that installOnDeviceSpeechRecognition() happens
>>>> synchronously. Making this a blocking call seems problematic since it
>>>> could involve a fetch and a download. I’d expect it to return a Promise
>>>> (https://www.w3.org/TR/design-principles/#promises). And
>>>> onDeviceWebSpeechAvailable should probably also be async since it could
>>>> involve reading data from disk.
>>>
>>> Totally agree--the implementation of those two APIs on Chrome return
>>> promises. I'll make sure the spec reflects this.
>>>
>>>> The SpeechRecognitionMode "ondevice-only" value is only defined by a
>>>> comment in the IDL stating that it “Returns an error if on-device speech
>>>> recognition is not available”. What specifically returns an error?
>>>> SpeechRecognition.start() doesn’t return any value, and in other error
>>>> conditions the behavior is to fire SpeechRecognitionErrorEvent. Also, what
>>>> should the behavior be if SpeechRecognitionMode is changed after start()
>>>> has already been called?
>>>
>>> Ah yeah, I'll update that comment to clarify that it fires a
>>> SpeechRecognitionErrorEvent. Updating the SpeechRecognitionMode after
>>> start() has been called has no effect on the existing session. This is
>>> consistent with how other SpeechRecognition attributes work (i.e. lang,
>>> maxAlternatives, etc.). This isn't explicitly stated anywhere in the spec,
>>> so I'll file a spec issue to clarify this as well.
>>>
>>> As for mitigating privacy and fingerprinting risks, we've been
>>> collaborating with the team building the Translator API
>>> <https://chromestatus.com/feature/5172811302961152> feature which also
>>> has the ability to download and detect language packs. Because the risks
>>> between these two features are nearly identical, on-device speech
>>> recognition language pack downloads will follow the same pattern and use
>>> the same permissions UI as on-device translation language packs. Here are
>>> some helpful links:
>>> Privacy Design Doc
>>>
>>
>> I don't think that's a link..
>>
>>> Translator API Developer Docs
>>> <https://developer.chrome.com/docs/ai/translator-api>
>>> Github Issue on Preventing Fingerprinting
>>> <https://github.com/webmachinelearning/translation-api/issues/3>
>>>
>>> Thanks,
>>> Evan
>>>
>>> On Tue, Jan 7, 2025 at 10:34 AM Daniel Clark <dan...@microsoft.com> wrote:
>>>
>>>> Adding to Yoav’s feedback about the spec:
>>>>
>>>> - It’s implied that installOnDeviceSpeechRecognition() happens
>>>> synchronously. Making this a blocking call seems problematic since it
>>>> could involve a fetch and a download. I’d expect it to return a Promise
>>>> (https://www.w3.org/TR/design-principles/#promises). And
>>>> onDeviceWebSpeechAvailable should probably also be async since it could
>>>> involve reading data from disk.
>>>> - The SpeechRecognitionMode "ondevice-only" value is only defined
>>>> by a comment in the IDL stating that it “Returns an error if on-device
>>>> speech recognition is not available”. What specifically returns an
>>>> error? SpeechRecognition.start() doesn’t return any value, and in other
>>>> error conditions the behavior is to fire SpeechRecognitionErrorEvent.
>>>> Also, what should the behavior be if SpeechRecognitionMode is changed
>>>> after start() has already been called?
>>>>
>>>> I also wonder if this should have a TAG review, especially given the
>>>> privacy/fingerprinting implications of websites being able to query which
>>>> on-device models are available.
>>>>
>>>> -- Dan Clark
>>>>
>>>> *From:* Yoav Weiss (@Shopify) <yoavwe...@chromium.org>
>>>> *Sent:* Tuesday, January 7, 2025 12:29 AM
>>>> *To:* Chromestatus <ad...@cr-status.appspotmail.com>
>>>> *Cc:* blink-dev@chromium.org; ev...@google.com
>>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to Ship: On-device Web
>>>> Speech API
>>>>
>>>> On Tue, Jan 7, 2025 at 2:10 AM Chromestatus <ad...@cr-status.appspotmail.com> wrote:
>>>>
>>>> Contact emails
>>>>
>>>> ev...@google.com
>>>>
>>>> Explainer
>>>>
>>>> https://github.com/WebAudio/web-speech-api/pull/122
>>>>
>>>> An actual explainer with usage examples would've been useful.
>>>>
>>>> Also, the spec is not very detailed:
>>>>
>>>> * It seems to be triggering resource downloads, but Fetch
>>>> <https://fetch.spec.whatwg.org/> integration is not specified.
>>>>
>>>> * Are the resources downloaded partitioned per top-level site? What
>>>> should typical download sizes be?
>>>>
>>>> Specification
>>>>
>>>> https://webaudio.github.io/web-speech-api
>>>>
>>>> Summary
>>>>
>>>> This feature adds on-device speech recognition support to the Web
>>>> Speech API, allowing websites to ensure that neither audio nor transcribed
>>>> speech are sent to a third-party service for processing. Websites can
>>>> query the availability of on-device speech recognition for specific
>>>> languages, prompt users to install the necessary resources for on-device
>>>> speech recognition, and choose between on-device or cloud-based speech
>>>> recognition as needed.
>>>>
>>>> Blink component
>>>>
>>>> Blink>Speech
>>>> <https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3ESpeech%22>
>>>>
>>>> Search tags
>>>>
>>>> speech, recognition, local, offline, on-device
>>>>
>>>> TAG review
>>>>
>>>> None
>>>>
>>>> TAG review status
>>>>
>>>> Pending
>>>>
>>>> Risks
>>>>
>>>> Interoperability and Compatibility
>>>>
>>>> None
>>>>
>>>> *Gecko*: Positive Discussed at TPAC 2024 with representatives from
>>>> Mozilla including Paul Adenot
>>>>
>>>> *WebKit*: Positive Discussed at TPAC 2024 with representatives from
>>>> Apple including Eric Carlson.
>>>>
>>>> Links to the minutes would be helpful. Filing official positions would
>>>> be even better.
>>>>
>>>> *Web developers*: Positive Commonly requested feature. Examples:
>>>> https://webwewant.fyi/wants/55/
>>>> https://github.com/WebAudio/web-speech-api/issues/108
>>>> https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
>>>> https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/
>>>>
>>>> *Other signals*:
>>>>
>>>> WebView application risks
>>>>
>>>> *Does this intent deprecate or change behavior of existing APIs, such
>>>> that it has potentially high risk for Android WebView-based applications?*
>>>>
>>>> None
>>>>
>>>> Debuggability
>>>>
>>>> None
>>>>
>>>> Will this feature be supported on all six Blink platforms (Windows,
>>>> Mac, Linux, ChromeOS, Android, and Android WebView)?
>>>>
>>>> No
>>>>
>>>> Initially supported on Windows, Mac, and Linux with ChromeOS support to
>>>> follow.
>>>>
>>>> Is this feature fully tested by web-platform-tests
>>>> <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?
>>>>
>>>> No
>>>>
>>>> Why not? Is it tested otherwise?
>>>>
>>>> Flag name on about://flags
>>>>
>>>> None
>>>>
>>>> Finch feature name
>>>>
>>>> InstallOnDeviceSpeechRecognition,OnDeviceWebSpeechAvailable,OnDeviceWebSpeech
>>>>
>>>> Requires code in //chrome?
>>>>
>>>> False
>>>>
>>>> Estimated milestones
>>>>
>>>> Shipping on desktop
>>>>
>>>> 135
>>>>
>>>> Anticipated spec changes
>>>>
>>>> *Open questions about a feature may be a source of future web compat or
>>>> interop issues. Please list open issues (e.g. links to known github issues
>>>> in the project for the feature specification) whose resolution may
>>>> introduce web compat/interop risk (e.g., changing to naming or structure
>>>> of the API in a non-backward-compatible way).*
>>>>
>>>> https://github.com/WebAudio/web-speech-api/pull/122
>>>>
>>>> Link to entry on the Chrome Platform Status
>>>>
>>>> https://chromestatus.com/feature/6090916291674112?gate=4683906480340992
>>>>
>>>> This intent message was generated by Chrome Platform Status
>>>> <https://chromestatus.com/>.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "blink-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to blink-dev+unsubscr...@chromium.org.
>>>> To view this discussion visit
>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com
>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com?utm_medium=email&utm_source=footer>.