Contact emails
a...@chromium.org, m...@chromium.org, btri...@chromium.org, dome...@chromium.org, kenjibah...@chromium.org

Explainer
https://github.com/webmachinelearning/prompt-api/blob/main/README.md

Specification
None yet, although we'll be writing one during the prototyping period.

Summary
A JavaScript API for directly prompting an AI language model, including text, image, and audio inputs. This API is also exposed in Chrome Extensions, currently as an Origin Trial. This Intent tracks the exposure on the web.

Comments
Although this feature exposes an on-device language model, the language model is never trained on, and does not have access to, any local user-specific data.

Blink component
Blink>AI>Prompt <https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3EAI%3EPrompt%22>

Motivation
Although we are already exploring task-based built-in AI APIs (e.g. translator, summarizer, etc.), direct access to a language model can help web developers accomplish tasks beyond the ones we have designed specific APIs for. Compared to bring-your-own-AI approaches, using the built-in language model saves the user's bandwidth and disk space, and has a lower barrier to entry.

Initial public proposal
https://github.com/webmachinelearning/charter/pull/9

TAG review
Because this is early-stage exploratory work, we believe it would be better to wait until we are ready to start an Origin Trial before asking for wider review.

TAG review status
Pending

Risks

Interoperability and Compatibility
This feature has definite interoperability and compatibility risks. Because the output in response to a given prompt varies by language model, it is possible for developers to write brittle code that relies on specific output formats or quality, and does not work across multiple browsers or multiple versions of the same browser.

There are some reasons to be optimistic that web developers won't write such brittle code. Language models are inherently nondeterministic, so creating dependencies on their exact output is difficult.
And many users will not have the hardware necessary to run a language model, so developers will need to code in a way such that the prompt API is always used as an enhancement, or has appropriate fallback to cloud services.

Several parts of the API design help steer developers in the right direction, as well. The API has clear availability testing features for developers to use, and requires developers to state their required capabilities (e.g., modalities and languages) up front. Most importantly, the structured outputs feature [1] can help mitigate against writing brittle code that relies on specific output formats, as illustrated in [2].

[1]: https://github.com/webmachinelearning/prompt-api/blob/main/README.md#structured-output-with-json-schema-or-regexp-constraints
[2]: https://github.com/webmachinelearning/prompt-api/issues/35

Gecko: No signal
Because this is early-stage exploratory work, we believe it would be better to wait until we are ready to start an Origin Trial before asking for wider review.

WebKit: No signal
Because this is early-stage exploratory work, we believe it would be better to wait until we are ready to start an Origin Trial before asking for wider review.

Web developers: Strongly positive (https://github.com/webmachinelearning/prompt-api/issues/74)
Developers working with the version of this API that is exposed for Chrome extensions and in the Early Preview Program have given significant positive feedback. See, e.g., https://docs.google.com/presentation/d/1DhFC2oB4PRrchavxUY3h9U4w4hrX5DAc5LoMqhn5hnk/edit#slide=id.g349a9ada368_1_6327 for some feedback from developer surveys.

Other signals:
We are also working with Microsoft Edge developers on this feature, with them contributing the structured output functionality.

Activation
This feature would definitely benefit from having polyfills, backed by any of: cloud services, lazily-loaded client-side models using WebGPU, or the web developer's own server.
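
As a rough illustration of the enhancement-with-fallback pattern discussed above (up-front capability declaration, availability testing, structured output, and a fallback path), a page might do something like the following. This is only a sketch: the `LanguageModel.availability()`, `LanguageModel.create()`, `expectedInputs`, and `responseConstraint` names follow the explainer as we understand it at the time of writing and may change during prototyping, while `rateReview`, `ratingSchema`, and `cloudFallback` are hypothetical names invented for this example.

```javascript
// Sketch only. API names follow the explainer and may change;
// rateReview, ratingSchema, and cloudFallback are hypothetical.

// JSON Schema constraining the model's output, so the caller parses a known
// shape instead of depending on free-form text.
const ratingSchema = {
  type: "object",
  properties: { rating: { type: "integer", minimum: 1, maximum: 5 } },
  required: ["rating"],
  additionalProperties: false,
};

// Hypothetical fallback. A real one might call a cloud service, a polyfill,
// or the developer's own server; this placeholder returns no rating.
async function cloudFallback(text) {
  return { rating: null, source: "fallback" };
}

async function rateReview(text, fallback = cloudFallback) {
  // Undefined wherever the API is not exposed.
  const LM = globalThis.LanguageModel;

  // Required capabilities (modalities and languages) are stated up front.
  const options = { expectedInputs: [{ type: "text", languages: ["en"] }] };

  // Treat the API as an enhancement: fall back when it is missing or the
  // device cannot run the model.
  if (!LM || (await LM.availability(options)) === "unavailable") {
    return fallback(text);
  }

  const session = await LM.create(options);
  try {
    const raw = await session.prompt(
      `Rate this review from 1 to 5: ${text}`,
      { responseConstraint: ratingSchema },
    );
    return { ...JSON.parse(raw), source: "on-device" };
  } finally {
    // Release the session's resources whether or not prompting succeeded.
    session.destroy();
  }
}
```

Calling `rateReview("Great product")` in an environment without the API resolves through the fallback, so the same call site works whether or not an on-device model is present.
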
We anticipate seeing an ecosystem of such polyfills grow as more developers experiment with this API.

WebView application risks
Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?
None

Debuggability
It is possible that giving DevTools more insight into the nondeterministic states of the model, e.g. random seeds, could help with debugging. See discussion at https://github.com/webmachinelearning/prompt-api/issues/74.

We also have some internal debugging pages which give more detail on the model's status, e.g. chrome://on-device-internals, and parts of these might be suitable to port into DevTools.

Is this feature fully tested by web-platform-tests <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?
No. We plan to write web platform tests for the API surface as much as possible. The core responses from the model will be difficult to test, but some facets are testable, e.g. the adherence to structured output response constraints.

Flag name on about://flags
prompt-api-for-gemini-nano-multimodal-input

Finch feature name
AIPromptAPIMultimodalInput

Requires code in //chrome?
True

Measurement
We have various use counters for the API, e.g. LanguageModel_Create

Non-OSS dependencies
Does the feature depend on any code or APIs outside the Chromium open source repository and its open-source dependencies to function?
Yes: this feature depends on a language model, which is bridged to the open-source parts of the implementation via the interfaces in //services/on_device_model.

Estimated milestones
DevTrial on desktop 137
DevTrial on Android 137

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5134603979063296?gate=6192899053846528

This intent message was generated by Chrome Platform Status <https://chromestatus.com/>.