Re: [EXTERNAL] Re: [blink-dev] Intent to Ship: On-device Web Speech API

Alex Russell Wed, 08 Jan 2025 08:25:26 -0800

+1 to Dan's feedback; this needs an async API, likely with a streams design.
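
For archive readers, here is a minimal sketch of the Promise-returning shape that Dan's feedback (quoted below) asks for. The `FakeLanguagePacks` class, its method names, the language-tag argument, and the boolean results are all assumptions made for illustration. They model the two operations named in the thread and are not the shipped Chrome or spec signatures.

```typescript
// Hypothetical in-memory stand-in for a browser's language-pack store.
// Nothing here is a real browser API; names and signatures are assumptions.
class FakeLanguagePacks {
  private installed = new Set<string>();

  // Async because a real check could involve reading data from disk.
  async available(lang: string): Promise<boolean> {
    return this.installed.has(lang);
  }

  // Async because a real install could involve a fetch and a download.
  // Resolves to true once the pack is usable, false if it cannot be installed.
  async install(lang: string): Promise<boolean> {
    if (lang.trim() === "") return false; // reject an empty language tag
    this.installed.add(lang);
    return true;
  }
}

// Caller-side pattern: check first, install only when needed, never block.
async function ensureOnDevice(
  packs: FakeLanguagePacks,
  lang: string
): Promise<boolean> {
  if (await packs.available(lang)) return true;
  return packs.install(lang);
}
```

A caller would write `await ensureOnDevice(packs, "en-US")` before starting recognition, so the page stays responsive while a pack is fetched.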


On Wednesday, January 8, 2025 at 7:33:12 AM UTC-8 Rick Byers wrote:

> This is great to see! IMHO there are a bunch of great use-cases for 
> on-device speech recognition which are likely not suitable for server-based 
> approaches.
>
> This is still only exposed 
> <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/speech/speech_recognition.idl;l=47?q=speechrecognition%20file:%5C.idl&ss=chromium>
>  
> via a legacy prefixed API, window.webkitSpeechRecognition, right?  Any 
> reason why it wouldn't be trivial to unprefix the speech recognition API 
> (supporting both prefixed and unprefixed) at the same time? In general we 
> don't support making updates to APIs which are only exposed via 
> non-standard legacy prefixed API names.
>
> On Tue, Jan 7, 2025 at 10:20 PM Yoav Weiss (@Shopify) <
> yoavwe...@chromium.org> wrote:
>
>>
>>
>> On Tue, Jan 7, 2025 at 9:50 PM Evan Liu <ev...@google.com> wrote:
>>
>>>> * Are the resources downloaded partitioned per top-level site? What 
>>>> should typical download sizes be?
>>>
>>> This depends on the browser--for Chrome on Windows/Mac/Linux, there's 
>>> only one instance of each on-device speech recognition language pack and 
>>> each language pack is ~60MB. The spec doesn't necessarily dictate how the 
>>> downloads are handled, only that websites should be allowed to trigger a 
>>> download (or request a download) of a language. 
>>>
>>
>> This seems like it'd require, at the very least, some extra considerations 
>> as part of the Privacy & Security section of the spec.
>> It would also be good to make that explicitly an implementation-defined 
>> decision.
>>
>> +Domenic Denicola <dome...@chromium.org> who's been working on similar 
>> privacy models related to translations, and can potentially advise you on 
>> the best path there.
>>
>>
>>>> Links to the minutes would be helpful. Filing official positions would 
>>>> be even better.
>>>
>>> I've filed official positions for Mozilla 
>>> <https://github.com/mozilla/standards-positions/issues/1157> and WebKit 
>>> <https://github.com/WebKit/standards-positions/issues/443>. 
>>>
>>>> Why not? Is it tested otherwise?
>>>
>>> Oops, I forgot to check that box. This feature is testable by 
>>> web-platform-tests. 
>>>
>>
> Have you written web platform tests for it? Have a link? 
>
>>
>>>> It’s implied that installOnDeviceSpeechRecognition() happens 
>>>> synchronously. Making this a blocking call seems problematic since it 
>>>> could involve a fetch and a download. I’d expect it to return a Promise 
>>>> (https://www.w3.org/TR/design-principles/#promises). And 
>>>> onDeviceWebSpeechAvailable should probably also be async since it could 
>>>> involve reading data from disk.
>>>
>>> Totally agree--the implementations of those two APIs in Chrome return 
>>> promises. I'll make sure the spec reflects this.
>>>
>>>> The SpeechRecognitionMode "ondevice-only" value is only defined by a 
>>>> comment in the IDL stating that it “Returns an error if on-device speech 
>>>> recognition is not available”. What specifically returns an error? 
>>>> SpeechRecognition.start() doesn’t return any value, and in other error 
>>>> conditions the behavior is to fire SpeechRecognitionErrorEvent. Also, what 
>>>> should the behavior be if SpeechRecognitionMode is changed after start() 
>>>> has already been called?
>>>
>>> Ah yeah, I'll update that comment to clarify that it fires a 
>>> SpeechRecognitionErrorEvent. Updating the SpeechRecognitionMode after 
>>> start() has been called has no effect on the existing session. This is 
>>> consistent with how other SpeechRecognition attributes work (e.g. lang, 
>>> maxAlternatives). This isn't explicitly stated anywhere in the spec, 
>>> so I'll file a spec issue to clarify this as well.
>>>
>>> As for mitigating privacy and fingerprinting risks, we've been 
>>> collaborating with the team building the Translator API 
>>> <https://chromestatus.com/feature/5172811302961152> feature, which can 
>>> also download and detect language packs. Because the risks of the two 
>>> features are nearly identical, on-device speech recognition language pack 
>>> downloads will follow the same pattern and use the same permissions UI as 
>>> on-device translation language packs. Here are some helpful links:
>>> Privacy Design Doc
>>>
>>
>> I don't think that's a link..
>>  
>>
>>> Translator API Developer Docs 
>>> <https://developer.chrome.com/docs/ai/translator-api>
>>> Github Issue on Preventing Fingerprinting 
>>> <https://github.com/webmachinelearning/translation-api/issues/3>
>>>
>>> Thanks,
>>> Evan
>>>  
>>>
>>> On Tue, Jan 7, 2025 at 10:34 AM Daniel Clark <dan...@microsoft.com> 
>>> wrote:
>>>
>>>> Adding to Yoav’s feedback about the spec:
>>>>
>>>>    - It’s implied that installOnDeviceSpeechRecognition() happens 
>>>>    synchronously. Making this a blocking call seems problematic since it 
>>>>    could involve a fetch and a download. I’d expect it to return a Promise 
>>>>    (https://www.w3.org/TR/design-principles/#promises). And 
>>>>    onDeviceWebSpeechAvailable should probably also be async since it could 
>>>>    involve reading data from disk.
>>>>    - The SpeechRecognitionMode "ondevice-only" value is only defined 
>>>>    by a comment in the IDL stating that it “Returns an error if on-device 
>>>>    speech recognition is not available”. What specifically returns an 
>>>>    error? SpeechRecognition.start() doesn’t return any value, and in other 
>>>>    error conditions the behavior is to fire SpeechRecognitionErrorEvent. 
>>>>    Also, what should the behavior be if SpeechRecognitionMode is changed 
>>>>    after start() has already been called?
>>>>
>>>>  
>>>>
>>>> I also wonder if this should have a TAG review, especially given the 
>>>> privacy/fingerprinting implications of websites being able to query which 
>>>> on-device models are available.
>>>>
>>>>  
>>>>
>>>> -- Dan Clark
>>>>
>>>>  
>>>>
>>>> *From:* Yoav Weiss (@Shopify) <yoavwe...@chromium.org> 
>>>> *Sent:* Tuesday, January 7, 2025 12:29 AM
>>>> *To:* Chromestatus <ad...@cr-status.appspotmail.com>
>>>> *Cc:* blink-dev@chromium.org; ev...@google.com
>>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to Ship: On-device Web 
>>>> Speech API
>>>>
>>>>
>>>>
>>>> On Tue, Jan 7, 2025 at 2:10 AM Chromestatus <
>>>> ad...@cr-status.appspotmail.com> wrote:
>>>>
>>>> Contact emails 
>>>>
>>>> ev...@google.com 
>>>> Explainer 
>>>>
>>>> https://github.com/WebAudio/web-speech-api/pull/122
>>>>
>>>>  
>>>>
>>>> An actual explainer with usage examples would've been useful.
>>>>
>>>> Also, the spec is not very detailed:
>>>>
>>>> * It seems to be triggering resource downloads, but Fetch 
>>>> <https://fetch.spec.whatwg.org/> integration is not specified. 
>>>>
>>>> * Are the resources downloaded partitioned per top-level site? What 
>>>> should typical download sizes be?
>>>>
>>>>  
>>>>
>>>>  
>>>> Specification 
>>>>
>>>> https://webaudio.github.io/web-speech-api 
>>>> Summary 
>>>>
>>>> This feature adds on-device speech recognition support to the Web 
>>>> Speech API, allowing websites to ensure that neither audio nor transcribed 
>>>> speech is sent to a third-party service for processing. Websites can 
>>>> query the availability of on-device speech recognition for specific 
>>>> languages, prompt users to install the necessary resources for on-device 
>>>> speech recognition, and choose between on-device and cloud-based speech 
>>>> recognition as needed.
>>>>
>>>>  
>>>> Blink component 
>>>>
>>>> Blink>Speech 
>>>> <https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3ESpeech%22>
>>>>  
>>>> Search tags 
>>>>
>>>> speech, recognition, local, offline, on-device 
>>>> TAG review 
>>>>
>>>> None 
>>>> TAG review status 
>>>>
>>>> Pending 
>>>> Risks 
>>>>
>>>>  
>>>> Interoperability and Compatibility 
>>>>
>>>> None
>>>>
>>>>
>>>>
>>>> *Gecko*: Positive. Discussed at TPAC 2024 with representatives from 
>>>> Mozilla, including Paul Adenot.
>>>>
>>>> *WebKit*: Positive. Discussed at TPAC 2024 with representatives from 
>>>> Apple, including Eric Carlson.
>>>>
>>>>  
>>>>
>>>> Links to the minutes would be helpful. Filing official positions would 
>>>> be even better.
>>>>
>>>>  
>>>>
>>>>
>>>>
>>>> *Web developers*: Positive. Commonly requested feature. Examples: 
>>>> https://webwewant.fyi/wants/55/ 
>>>> https://github.com/WebAudio/web-speech-api/issues/108 
>>>> https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
>>>>  
>>>> https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/
>>>>  
>>>>
>>>> *Other signals*: 
>>>> WebView application risks 
>>>>
>>>> *Does this intent deprecate or change behavior of existing APIs, such 
>>>> that it has potentially high risk for Android WebView-based applications?*
>>>>
>>>> None
>>>>
>>>>  
>>>> Debuggability 
>>>>
>>>> None
>>>>
>>>>  
>>>> Will this feature be supported on all six Blink platforms (Windows, 
>>>> Mac, Linux, ChromeOS, Android, and Android WebView)? 
>>>>
>>>> No 
>>>>
>>>> Initially supported on Windows, Mac, and Linux with ChromeOS support to 
>>>> follow.
>>>>
>>>>  
>>>> Is this feature fully tested by web-platform-tests 
>>>> <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>
>>>> ? 
>>>>
>>>> No
>>>>
>>>>  
>>>>
>>>> Why not? Is it tested otherwise?  
>>>>
>>>>  
>>>> Flag name on about://flags 
>>>>
>>>> None 
>>>> Finch feature name 
>>>>
>>>> InstallOnDeviceSpeechRecognition,OnDeviceWebSpeechAvailable,OnDeviceWebSpeech
>>>>  
>>>>
>>>> Requires code in //chrome? 
>>>>
>>>> False 
>>>> Estimated milestones 
>>>>
>>>> Shipping on desktop
>>>>
>>>> 135
>>>>
>>>>  
>>>> Anticipated spec changes 
>>>>
>>>> *Open questions about a feature may be a source of future web compat or 
>>>> interop issues. Please list open issues (e.g. links to known github issues 
>>>> in the project for the feature specification) whose resolution may 
>>>> introduce web compat/interop risk (e.g., changes to the naming or 
>>>> structure of the API in a non-backward-compatible way).*
>>>>
>>>> https://github.com/WebAudio/web-speech-api/pull/122 
>>>> Link to entry on the Chrome Platform Status 
>>>>
>>>> https://chromestatus.com/feature/6090916291674112?gate=4683906480340992 
>>>>
>>>> This intent message was generated by Chrome Platform Status 
>>>> <https://chromestatus.com/>. 
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "blink-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to blink-dev+unsubscr...@chromium.org.
>>>> To view this discussion visit 
>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com
>>>>  
>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to blink-dev+unsubscr...@chromium.org.
To view this discussion visit 
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/8d98c2cc-7873-4888-9dc1-f10dabe4da79n%40chromium.org.
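
Archive note: a toy model of the behavior Evan describes in the quoted thread above, where an unavailable "ondevice-only" session surfaces as an error event rather than a return value, and attribute changes made after start() do not affect the running session. Everything below is an assumption for illustration: `ModelRecognizer`, the `"unrestricted"` mode, and the `"on-device-unavailable"` error code are hypothetical names, not part of the Web Speech API.

```typescript
// Toy model only: none of these types are the real Web Speech API.
// "unrestricted" is a placeholder for any mode other than "ondevice-only".
type Mode = "ondevice-only" | "unrestricted";

interface ModelErrorEvent {
  error: string; // placeholder code; the thread does not name the real one
}

class ModelRecognizer {
  lang = "en-US";
  mode: Mode = "unrestricted";
  onerror: ((event: ModelErrorEvent) => void) | null = null;

  // Settings captured by the most recent successful start(); null otherwise.
  private session: { lang: string; mode: Mode } | null = null;

  constructor(private readonly installedPacks: ReadonlySet<string>) {}

  // Returns nothing, like SpeechRecognition.start(); failures go to onerror.
  start(): void {
    const snapshot = { lang: this.lang, mode: this.mode };
    if (
      snapshot.mode === "ondevice-only" &&
      !this.installedPacks.has(snapshot.lang)
    ) {
      // A real engine would dispatch this asynchronously as an event;
      // it is delivered inline here to keep the sketch short.
      if (this.onerror) this.onerror({ error: "on-device-unavailable" });
      return;
    }
    this.session = snapshot;
  }

  // The settings the running session is actually using.
  activeSettings(): { lang: string; mode: Mode } | null {
    return this.session;
  }
}
```

Because start() copies the attributes into a snapshot, assigning `mode` or `lang` afterwards only influences the next call to start().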
