This is great to see! IMHO there are a bunch of great use-cases for
on-device speech recognition which are likely not suitable for server-based
approaches.

This is still only exposed
<https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/speech/speech_recognition.idl;l=47?q=speechrecognition%20file:%5C.idl&ss=chromium>
via a legacy prefixed API, window.webkitSpeechRecognition, right? Any
reason why it wouldn't be trivial to unprefix the speech recognition API
(supporting both prefixed and unprefixed) at the same time? In general we
don't support making updates to APIs which are only exposed via
non-standard legacy prefixed API names.
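
To illustrate what "supporting both prefixed and unprefixed" would mean for pages, here is a minimal sketch of the usual feature-detection pattern. It assumes the unprefixed global would be named SpeechRecognition, as in the spec IDL. The helper resolveSpeechRecognition, the SpeechGlobals interface, and the two fake classes are hypothetical names used only for this example, not part of Chromium or the spec.

```typescript
// Hypothetical helper for illustration only; not part of any spec or of Chromium.
type RecognitionCtor = new () => unknown;

// The two globals a page might see. Both are optional because a given
// browser may expose either, both, or neither.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Prefer the unprefixed constructor and fall back to the legacy prefixed one.
function resolveSpeechRecognition(scope: SpeechGlobals): RecognitionCtor | undefined {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

// Stand-ins for the real constructors so the sketch runs outside a browser.
class FakeUnprefixed {}
class FakePrefixed {}

// Both names exposed: the unprefixed one wins.
const whenBoth = resolveSpeechRecognition({
  SpeechRecognition: FakeUnprefixed,
  webkitSpeechRecognition: FakePrefixed,
});

// Only the prefixed name exposed (the situation described above).
const whenPrefixedOnly = resolveSpeechRecognition({
  webkitSpeechRecognition: FakePrefixed,
});

// Neither exposed: callers must handle undefined.
const whenNeither = resolveSpeechRecognition({});
```

In a page, the same helper would be called with the global object in place of the fake scopes, so existing sites keep working while new code can use the unprefixed name.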

On Tue, Jan 7, 2025 at 10:20 PM Yoav Weiss (@Shopify) <
yoavwe...@chromium.org> wrote:

>
>
> On Tue, Jan 7, 2025 at 9:50 PM Evan Liu <ev...@google.com> wrote:
>
>>> * Are the resources downloaded partitioned per top-level site? What
>>> should typical download sizes be?
>>
>> This depends on the browser--for Chrome on Windows/Mac/Linux, there's
>> only one instance of each on-device speech recognition language pack and
>> each language pack is ~60MB. The spec doesn't necessarily dictate how the
>> downloads are handled, only that websites should be allowed to trigger a
>> download (or request a download) of a language.
>>
>
> This seems like it'd require at very least some extra considerations as
> part of the Privacy & Security section of the spec.
> It would also be good to have that be explicitly an implementation-defined
> decision.
>
> +Domenic Denicola <dome...@chromium.org> who's been working on similar
> privacy models related to translations, and can potentially advise you on
> the best path there.
>
>
>>> Links to the minutes would be helpful. Filing official positions would be
>>> even better.
>>
>> I've filed official positions for Mozilla
>> <https://github.com/mozilla/standards-positions/issues/1157> and WebKit
>> <https://github.com/WebKit/standards-positions/issues/443>.
>>
>>> Why not? Is it tested otherwise?
>>
>> Oops, I forgot to check that box. This feature is testable by
>> web-platform-tests.
>>
>
Have you written web platform tests for it? Do you have a link?

>
>>> It’s implied that installOnDeviceSpeechRecognition() happens
>>> synchronously. Making this a blocking call seems problematic since it could
>>> involve a fetch and a download. I’d expect it to return a Promise (
>>> https://www.w3.org/TR/design-principles/#promises). And
>>> onDeviceWebSpeechAvailable should probably also be async since it could
>>> involve reading data from disk.
>>
>> Totally agree--the implementations of those two APIs in Chrome return
>> promises. I'll make sure the spec reflects this.
>>
>>> The SpeechRecognitionMode "ondevice-only" value is only defined by a
>>> comment in the IDL stating that it “Returns an error if on-device speech
>>> recognition is not available”. What specifically returns an error?
>>> SpeechRecognition.start() doesn’t return any value, and in other error
>>> conditions the behavior is to fire SpeechRecognitionErrorEvent. Also, what
>>> should the behavior be if SpeechRecognitionMode is changed after start()
>>> has already been called?
>>
>> Ah yeah, I'll update that comment to clarify that it fires a
>> SpeechRecognitionErrorEvent. Updating the SpeechRecognitionMode after
>> start() has been called has no effect on the existing session. This is
>> consistent with how other SpeechRecognition attributes work (e.g. lang,
>> maxAlternatives). This isn't explicitly stated anywhere in the spec,
>> so I'll file a spec issue to clarify this as well.
>>
>> As for mitigating privacy and fingerprinting risks, we've been
>> collaborating with the team building the Translator API
>> <https://chromestatus.com/feature/5172811302961152> feature which also
>> has the ability to download and detect language packs. Because the risks
>> between these two features are nearly identical, on-device speech
>> recognition language pack downloads will follow the same pattern and use
>> the same permissions UI as on-device translation language packs. Here are
>> some helpful links:
>> Privacy Design Doc
>>
>
> I don't think that's a link.
>
>
>> Translator API Developer Docs
>> <https://developer.chrome.com/docs/ai/translator-api>
>> Github Issue on Preventing Fingerprinting
>> <https://github.com/webmachinelearning/translation-api/issues/3>
>>
>> Thanks,
>> Evan
>>
>>
>> On Tue, Jan 7, 2025 at 10:34 AM Daniel Clark <dan...@microsoft.com>
>> wrote:
>>
>>> Adding to Yoav’s feedback about the spec:
>>>
>>>    - It’s implied that installOnDeviceSpeechRecognition() happens
>>>    synchronously. Making this a blocking call seems problematic since it
>>>    could involve a fetch and a download. I’d expect it to return a Promise (
>>>    https://www.w3.org/TR/design-principles/#promises). And
>>>    onDeviceWebSpeechAvailable should probably also be async since it could
>>>    involve reading data from disk.
>>>    - The SpeechRecognitionMode "ondevice-only" value is only defined by
>>>    a comment in the IDL stating that it “Returns an error if on-device
>>>    speech recognition is not available”. What specifically returns an error?
>>>    SpeechRecognition.start() doesn’t return any value, and in other error
>>>    conditions the behavior is to fire SpeechRecognitionErrorEvent. Also,
>>>    what should the behavior be if SpeechRecognitionMode is changed after
>>>    start() has already been called?
>>>
>>>
>>>
>>> I also wonder if this should have a TAG review, especially given the
>>> privacy/fingerprinting implications of websites being able to query which
>>> on-device models are available.
>>>
>>>
>>>
>>> -- Dan Clark
>>>
>>>
>>>
>>> *From:* Yoav Weiss (@Shopify) <yoavwe...@chromium.org>
>>> *Sent:* Tuesday, January 7, 2025 12:29 AM
>>> *To:* Chromestatus <ad...@cr-status.appspotmail.com>
>>> *Cc:* blink-dev@chromium.org; ev...@google.com
>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to Ship: On-device Web
>>> Speech API
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jan 7, 2025 at 2:10 AM Chromestatus <
>>> ad...@cr-status.appspotmail.com> wrote:
>>>
>>> Contact emails
>>>
>>> ev...@google.com
>>> Explainer
>>>
>>> https://github.com/WebAudio/web-speech-api/pull/122
>>>
>>>
>>>
>>> An actual explainer with usage examples would've been useful.
>>>
>>> Also, the spec is not very detailed:
>>>
>>> * It seems to be triggering resource downloads, but Fetch
>>> <https://fetch.spec.whatwg.org/> integration is not specified.
>>>
>>> * Are the resources downloaded partitioned per top-level site? What
>>> should typical download sizes be?
>>>
>>>
>>>
>>>
>>> Specification
>>>
>>> https://webaudio.github.io/web-speech-api
>>> Summary
>>>
>>> This feature adds on-device speech recognition support to the Web Speech
>>> API, allowing websites to ensure that neither audio nor transcribed speech
>>> is sent to a third-party service for processing. Websites can query the
>>> availability of on-device speech recognition for specific languages, prompt
>>> users to install the necessary resources for on-device speech recognition,
>>> and choose between on-device or cloud-based speech recognition as needed.
>>>
>>>
>>> Blink component
>>>
>>> Blink>Speech
>>> <https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3ESpeech%22>
>>> Search tags
>>>
>>> speech, recognition, local, offline, on-device
>>> TAG review
>>>
>>> None
>>> TAG review status
>>>
>>> Pending
>>> Risks
>>>
>>>
>>> Interoperability and Compatibility
>>>
>>> None
>>>
>>>
>>>
>>> *Gecko*: Positive. Discussed at TPAC 2024 with representatives from
>>> Mozilla including Paul Adenot.
>>>
>>> *WebKit*: Positive. Discussed at TPAC 2024 with representatives from
>>> Apple including Eric Carlson.
>>>
>>>
>>>
>>> Links to the minutes would be helpful. Filing official positions would
>>> be even better.
>>>
>>>
>>>
>>>
>>>
>>> *Web developers*: Positive. Commonly requested feature. Examples:
>>> https://webwewant.fyi/wants/55/
>>> https://github.com/WebAudio/web-speech-api/issues/108
>>> https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
>>> https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/
>>>
>>> *Other signals*:
>>> WebView application risks
>>>
>>> *Does this intent deprecate or change behavior of existing APIs, such
>>> that it has potentially high risk for Android WebView-based applications?*
>>>
>>> None
>>>
>>>
>>> Debuggability
>>>
>>> None
>>>
>>>
>>> Will this feature be supported on all six Blink platforms (Windows, Mac,
>>> Linux, ChromeOS, Android, and Android WebView)?
>>>
>>> No
>>>
>>> Initially supported on Windows, Mac, and Linux with ChromeOS support to
>>> follow.
>>>
>>>
>>> Is this feature fully tested by web-platform-tests
>>> <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>
>>> ?
>>>
>>> No
>>>
>>>
>>>
>>> Why not? Is it tested otherwise?
>>>
>>>
>>> Flag name on about://flags
>>>
>>> None
>>> Finch feature name
>>>
>>> InstallOnDeviceSpeechRecognition,OnDeviceWebSpeechAvailable,OnDeviceWebSpeech
>>>
>>> Requires code in //chrome?
>>>
>>> False
>>> Estimated milestones
>>>
>>> Shipping on desktop
>>>
>>> 135
>>>
>>>
>>> Anticipated spec changes
>>>
>>> *Open questions about a feature may be a source of future web compat or
>>> interop issues. Please list open issues (e.g. links to known github issues
>>> in the project for the feature specification) whose resolution may
>>> introduce web compat/interop risk (e.g., changes to naming or structure of
>>> the API in a non-backward-compatible way).*
>>>
>>> https://github.com/WebAudio/web-speech-api/pull/122
>>> Link to entry on the Chrome Platform Status
>>>
>>> https://chromestatus.com/feature/6090916291674112?gate=4683906480340992
>>>
>>> This intent message was generated by Chrome Platform Status
>>> <https://chromestatus.com/>.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "blink-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to blink-dev+unsubscr...@chromium.org.
>>> To view this discussion visit
>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com
>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/677c7f0e.2b0a0220.2e82a8.01f6.GAE%40google.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>>
>

