(Perhaps this has been on our radar at some point in the past, but I
thought I'd try some blue sky brainstorming and see if that takes us good
places.)

I think it's important, to remain competitive, that we have a good,
pluggable way to interact with mobile devices via the spoken word.  What
probably springs to mind immediately is something like Apple's Siri,
Google's equivalent, or perhaps accessibility tools, and I do mean those
things.  But one thing that sets FxOS apart is its openness.

People who are at home on the command line can be powerful and productive,
quickly accessing and controlling their environment from one place.  Voice
input might not be quite as powerful, but if users can tap into nearly the
full power of the apps on their phone, it could help transform mobile
interactions.

One example of this would be an educational app, say a chiropractic
reference for students:
"Open ChiroApp and show me a diagram of the pelvis." (display image)
"Open ChiroApp and show quad stretches." (app plays video)
"Open ChiroApp and define Grostic procedure." (speak and display definition)
"Open ChiroApp to the middle of chapter three."

Other examples include:
Opening a particular book in an ebook reader to the place you left off
"Use Hello to call John." or "Call John using Hello."
"Start a Facebook post" (natural language parsing)

Apps could rely on locally stored content as well as internet-based data.
An app could also allow users to download larger datasets.
I don't know of any similar voice system that allows users to interact with
arbitrary apps in this manner.

Some apps could claim default phrases.  For instance, "Wake me at noon."
uses the alarm app I have set as my default, "Take me home." opens my
default navigation app, and "Define freedom." opens my preferred dictionary.

Mozilla could strongly support a set of killer apps that take advantage of
this pluggability, as well as encourage broad support.

Because of its scope, it may be worth opening this up as a broad project
with strong encouragement of outside participation - with Mozilla creating
the initial structure and helping to guide development.  I expect there are
compelling problems that people would enjoy solving, and potentially other
OSes that would contribute in order to adopt the system.

On the assistant side, we could curate recipes based on submissions and in
conjunction with (consent-based) automated feedback.  This would be a great
opportunity for users in less widely supported locales to get a fuller
experience.  That is, a native Catalan speaker could see a commonly
submitted but unsupported phrase and create the recipe for it, hopefully
with limited technical knowledge required.
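A submitted recipe could be little more than a structured record; something like the shape below, where the field names, the Catalan phrase, and the `alarm.set` action id are all made up for illustration:

```typescript
// Hypothetical shape of a community-submitted recipe: a commonly heard but
// unsupported Catalan phrase mapped onto an action the system already knows.
// All field names and values here are illustrative.
const submittedRecipe = {
  locale: "ca",
  phrase: "desperta'm a les (.+)",   // Catalan: "wake me at (.+)"
  example: "Desperta'm a les vuit",  // "Wake me at eight"
  action: "alarm.set",
  status: "pending-review",          // curated before being shipped
};
```

A web form could emit records like this, so contributing a new phrase needs no more technical skill than filling in a template.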

We will want, of course, to carefully consider how we grant and manage
permissions.  If it is mainly a WebAPI solution, much of this is taken care
of.

This system could also prove useful on devices other than phones, such as
TVs, watches, and even laptops - and perhaps through Android integration.
Anywhere there is Firefox and a microphone, a user could leverage the
assistant and send commands to webapps.

From my experience, Google's and Apple's solutions have been dependent on
servers.  Is this a technical necessity?  Can we at least build a hybrid
solution?
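One possible hybrid shape, sketched with stand-in recognizer callbacks (neither is a real API): try an on-device recognizer first, and only fall back to a server when local confidence is low.

```typescript
// Hypothetical hybrid flow: prefer the on-device recognizer; defer to the
// server only when local confidence falls below a threshold. Both
// recognizers are stand-in callbacks, not real APIs.
type Result = { text: string; confidence: number };
type Recognizer = (audio: ArrayBuffer) => Result;

function recognize(
  audio: ArrayBuffer,
  local: Recognizer,
  remote: Recognizer,
  threshold = 0.8
): Result {
  const onDevice = local(audio);
  if (onDevice.confidence >= threshold) {
    return onDevice; // good enough - no network round-trip needed
  }
  return remote(audio); // hard utterance - defer to the server
}
```

Common, short commands would then work offline and stay on the device, with the server reserved for harder utterances.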

Perhaps this is already in development or in discussions, but I was just
hoping to at least further the conversation and maybe add some ideas.
I'd also like to make my friends jealous when they see my phone doing
backflips (figuratively).  This could be a great differentiator for us.
What are your thoughts?


P.S. I see, after the fact, that with SpeechRTC we are bringing speech into
the platform (in a hybrid way, too), but I think these ideas extend a bit
further than that.
_______________________________________________
dev-b2g mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-b2g
