SpellCheck API?

2011-05-09 Thread 坊野 博典
Greetings,

I'm Hironori Bono, a software engineer for Google Chrome.
We recently received requests from web-application developers (and
extension developers) who would like to use the spellchecker
integrated into Google Chrome and to replace it with their own
spellcheckers implemented in JavaScript, as described in the
following document. To satisfy these requests, I would like to propose
adding an API that controls the spellchecker integrated into a user
agent, if it has one. Even though I'm not sure whether all user agents
need this API, I would appreciate your feedback.

Thank you for your interest in advance.

1. Introduction
HTML5 provides a spellcheck attribute that enables or disables the
user agent's integrated spellchecker for an editable element. This
attribute prevents the spellchecker from checking text in an editable
element that web applications do not want checked, e.g. e-mail
addresses, URLs, etc. Some user agents provide scripting access to
spellcheckers. For example, Internet Explorer allows using the
spellchecker integrated into Microsoft Word via ActiveX, as shown in
the following code snippet.

function CheckText(text) {
  var result = [];
  var app = new ActiveXObject('Word.Application');
  var doc = app.Documents.Add();
  doc.Content.Text = text;
  // Word collections are 1-based, so loop up to and including Count.
  for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
    var spellingError = doc.SpellingErrors.Item(i);
    for (var j = 1; j <= spellingError.Words.Count; j++) {
      var word = spellingError.Words.Item(j);
      var error = {};
      error.word = word.Text;
      error.start = word.Start;
      error.length = word.Text.length;
      error.suggestions = [];
      var suggestions = word.GetSpellingSuggestions();
      for (var k = 1; k <= suggestions.Count; k++) {
        error.suggestions.push(suggestions.Item(k).Name);
      }
      result.push(error);
    }
  }
  // Discard the temporary document and shut down the Word instance.
  doc.Close(0);  // 0 = wdDoNotSaveChanges
  app.Quit();
  return result;
}

On the other hand, it is not easy for web-application developers to
integrate custom spellcheckers (e.g. a spellchecker that uses a
contact list to check e-mail addresses, names, street addresses, etc.)
into their web applications. Several web applications (such as Gmail)
have integrated custom spellcheckers, but they use content-editable
<div> elements to render the misspelling underlines and the 'z-index'
property to show suggestions. Unfortunately, these techniques are hard
to apply when web applications use <textarea> or <input> elements for
user input, because it is hard to identify the position of misspelled
words in these elements. To solve this problem, it would be great if
user agents provided scripting access to their spellchecker framework
so web-application developers could integrate their custom
spellcheckers into their web applications, as shown in the following
code snippet.

function CheckTextOfNode(node) {
  // Remove all the previous spellchecking results.
  window.spellCheckController.removeMarkers(node);

  // Check the text in the specified node.
  var result = CheckText(node.innerText ? node.innerText : node.value);
  for (var i = 0; i < result.length; i++) {
    // Add a misspelled underline and suggestions to the specified word.
    window.spellCheckController.addMarker(
        node, result[i].start, result[i].length, result[i].suggestions);
  }
}

This example uses two new methods.
  * The window.spellCheckController.removeMarkers() method
    Removes all the misspelling underlines and suggestions in the
    specified node.
    The node parameter represents the DOM node from which a web
    application would like to remove all the misspelling underlines
    and suggestions.
  * The window.spellCheckController.addMarker() method
    Attaches a misspelling underline and suggestions to the specified
    range of a node.
    The node parameter represents the DOM node to which the user agent
    adds a misspelling underline.
    The start and length parameters represent a range of text in the
    DOM node specified by the node parameter. (We do not use a Range
    object here because it is hard to specify a range of text in a
    textarea element or an input element with one.)
    The suggestions parameter represents a list of words suggested by
    the custom spellchecker. When a custom spellchecker does not provide
    any suggestions, this parameter should be an empty list.
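To make the contract of the two proposed methods concrete, here is a minimal in-memory stand-in a page might use while testing. It keeps the proposal's names (`spellCheckController`, `removeMarkers`, `addMarker`); `createMockSpellCheckController`, the `getMarkers` helper and the argument checks are hypothetical, and a real user agent would draw the underlines itself rather than store them in a `Map`.

```javascript
// Hypothetical in-memory stand-in for the proposed window.spellCheckController.
// It only records markers; a real user agent would render the underlines and
// offer the suggestions in its own UI.
function createMockSpellCheckController() {
  // Map from a DOM node (or any object standing in for one) to its markers.
  const markersByNode = new Map();

  return {
    // Removes all misspelling markers previously attached to `node`.
    removeMarkers(node) {
      markersByNode.delete(node);
    },

    // Attaches a marker covering `length` characters starting at `start`.
    // `suggestions` must be an array; pass [] when there are none.
    addMarker(node, start, length, suggestions) {
      if (!Number.isInteger(start) || !Number.isInteger(length) ||
          start < 0 || length <= 0) {
        throw new RangeError("start and length must describe a non-empty range");
      }
      if (!Array.isArray(suggestions)) {
        throw new TypeError("suggestions must be an array");
      }
      const list = markersByNode.get(node) || [];
      list.push({ start, length, suggestions: suggestions.slice() });
      markersByNode.set(node, list);
    },

    // Test helper (not part of the proposal): returns the markers for `node`.
    getMarkers(node) {
      return (markersByNode.get(node) || []).slice();
    },
  };
}

// Example use with a plain object standing in for a <textarea>.
const controller = createMockSpellCheckController();
const textarea = { value: "Teh cat" };
controller.addMarker(textarea, 0, 3, ["The", "Tea"]);
controller.removeMarkers(textarea);
```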

Even though these methods are sufficient for web-application
developers who use only their custom spellcheckers, they are not
sufficient for those who want to use both their custom spellcheckers
and the one integrated into the user agent. (For example, a web
application may want to consult the integrated spellchecker only for
words that its custom spellchecker treats as misspelled.)
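The combination described above can be sketched as a pure function. The method names `checkWord` and `getSuggestionsForWord` are the ones the proposal uses for the integrated spellchecker; `combineResults` and passing the checker in as a `builtin` argument are hypothetical choices that keep the logic testable outside a browser.

```javascript
// Hypothetical sketch: keep only words that BOTH the custom spellchecker and
// the integrated one consider misspelled, and merge their suggestion lists.
// `customResults` has the shape returned by CheckText():
//   [{ word, start, length, suggestions }, ...]
// `builtin` stands in for the integrated spellchecker and is assumed to offer
// checkWord(word) -> boolean and getSuggestionsForWord(word) -> string[].
function combineResults(customResults, builtin) {
  const markers = [];
  for (const error of customResults) {
    if (builtin.checkWord(error.word)) {
      continue; // The integrated spellchecker accepts the word: no marker.
    }
    // concat() returns a new array, so its result must be kept.
    const merged = error.suggestions.concat(
        builtin.getSuggestionsForWord(error.word));
    // Drop duplicates while preserving order (custom suggestions first).
    const suggestions = [...new Set(merged)];
    markers.push({ start: error.start, length: error.length, suggestions });
  }
  return markers;
}
```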

function CheckTextOfNode(node) {
  // Reset all the previous spellcheck results.
  window.spellCheckController.removeMarkers(node);

  // Check the text with our custom spellchecker.
  var result = CheckText(node.innerText ? node.innerText : node.value);
  for (var i = 0; i < result.length; i++)

Re: SpellCheck API?

2011-05-09 Thread Olli Pettay

On 05/09/2011 11:58 AM, Hironori Bono (坊野 博典) wrote:

> Greetings,
>
> I'm Hironori Bono, a software engineer for Google Chrome.
> We recently received requests from web-application developers (and
> extension developers) that they would like to use the spellchecker
Quite different targets.


> integrated into Google Chrome and to replace the spellchecker with
> their spellcheckers implemented in JavaScript as written in the
> following document. To satisfy their requests, I would like to propose
> to add an API that controls spellcheckers integrated into a user agent
> if it has. Even though I'm wondering if all user agents need this API,
> it would be great to give me feedback.
>
> Thank you for your interest in advance.
>
> 1. Introduction
> HTML5 provides a spellcheck attribute to enable or disable the
> spellcheckers integrated into user agents in an editable element. This
> attribute prevents the spellcheckers from checking text in an editable
> element where web applications do not like it, e.g. e-mail addresses,
> URLs, etc. Some user agents provide scripting access to spellcheckers.

Providing scripting access to the built-in spellchecker is a privacy
violation (this has been discussed on the whatwg mailing list): a web
page could find out which languages the user spellchecks in and
whether the user has added new words to the known-words list.
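To illustrate the concern: if a page could call a synchronous `checkWord()` on the built-in spellchecker (the method name used elsewhere in this thread, hypothetical here), a handful of probe words would reveal which dictionaries are installed. The probe lists and `inferDictionaries` are made-up examples, not a known attack.

```javascript
// Hypothetical sketch of the fingerprinting risk. `checkWord` stands in for
// a script-accessible built-in spellchecker (true if the word is known).
// The probe words are illustrative: any word found in only one language's
// dictionary would do.
const PROBES = {
  en: ["through", "because"],
  de: ["Straße", "Eichhörnchen"],
  fr: ["écureuil", "aujourd'hui"],
};

function inferDictionaries(checkWord) {
  const detected = [];
  for (const [lang, words] of Object.entries(PROBES)) {
    // Treat a language as installed if every probe word is accepted.
    if (words.every((w) => checkWord(w))) {
      detected.push(lang);
    }
  }
  return detected;
}

// The same call on a rare word (e.g. a surname) would hint that the user
// added it to the personal dictionary, which is even more identifying.
```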




> On the other hand, it is not so easy for web-application developers to
> integrate custom spellcheckers (e.g. a spellchecker that uses a
> contact list to check e-mail addresses, names, street addresses, etc.)
> into their web applications. Even though several web applications
> (such as GMails)

Oh, I didn't know that GMail can't handle it if I teach my browser's
spellchecker the words I commonly use. Interesting.


> have integrated custom spellcheckers, such web
> applications use content-editable <div> elements to render misspelled
> underlines and the ‘z-index’ properties to show suggestions,
> respectively. Unfortunately, it is not so easy to apply these
> techniques when web applications use <textarea> elements or <input>
> elements for user input because it is pretty hard to identify the
> position of misspelled words in these elements. To solve this problem,
> it would be great for user agents to provide scripting access to their
> spell-checker framework so web-application developers can integrate
> their custom spellcheckers

Adding support for custom spellcheckers seems reasonable.
We just need to make sure that the web page doesn't get access to the
native spellcheck data (at least not without permission).




Re: SpellCheck API?

2011-05-09 Thread Oliver Hunt
 
 This is the privacy violation, and not acceptable as such.
 I wonder how to support this use case without exposing native
 spellchecker data to the web page. Or do we need yet another permission,
 which the user has to give to the page before the spellchecker API fully works?
 

In general, permission dialogs are not real solutions to privacy problems:
when users see a dialog, their mindset is not "what is this dialog asking?"
but "how do I make this dialog go away?".

This same problem exists with the geolocation APIs and others, but adding
more examples of it doesn't seem like a future-proof plan.

--Oliver



Re: [WebIDL][Selectors-API] Stringifying null for DOMString attributes and properties

2011-05-09 Thread Lachlan Hunt

On 2011-05-07 16:03, Lachlan Hunt wrote:

(I don't have results for IE yet because the testharness script I used
to write the tests doesn't work in IE.)


I've now tested IE9, which did give me results. The following
properties are all stringified to "".


* BODY .text, .bgColor, .link, .vLink, .aLink
* HR .size
* A, AREA .coords, .shape
* IFRAME .width, .height, .marginHeight, .marginWidth
* TABLE .bgColor, .width
* COLGROUP, COL .width
* TR .bgColor
* TD, TH .bgColor, .height, .width
* INPUT .height, .width
* BUTTON .type
* .textContent on everything

Every other tested property on HTML*Element interfaces stringified to
"null".


(This is testing non-readonly DOMString properties on HTML*Element 
interfaces documented in HTML [1], excluding those I mentioned as 
untested in my previous mail.)


[1] http://www.whatwg.org/C

--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/



Re: [IndexedDB] Closing on bug 9903 (collations)

2011-05-09 Thread Shawn Wilsher

On 5/6/2011 7:07 AM, timeless wrote:

I think that a stored procedure could be considered as a compiled
version of a serialized function. i.e. something which loses its scope
chain, and which loses access to its parent object. If it loses access
to its scope chain which includes the interesting globals, it will no
longer have access to fun things like DOM objects, roughly like
DOMWorkers but with even less exciting objects available. I'd hope
that a jit should be able to do a fairly reasonable job of optimizing
such a function given these constraints.

This may be what we go with, but not in version 1.

Cheers,

Shawn
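The "serialized function" idea quoted above can be illustrated in plain JavaScript: rebuilding a function from its source text with the `Function` constructor drops its closure, so only self-contained functions survive. This only illustrates the constraint; it is not an IndexedDB API, and `rebuild`, `byLength` and `makeReversed` are hypothetical names.

```javascript
// Rebuild a function from its source text. The Function constructor creates
// functions in the global scope, so any closure of the original is lost.
function rebuild(fn) {
  return new Function("return (" + fn.toString() + ");")();
}

// Self-contained comparator: uses only its arguments, so it survives
// serialization. (Sorts by length, then lexicographically.)
const byLength = (a, b) => a.length - b.length || (a < b ? -1 : a > b ? 1 : 0);

// Comparator that captures `compare` from an enclosing scope.
function makeReversed(compare) {
  return (a, b) => compare(b, a);
}
const reversedByLength = makeReversed(byLength);

const storedByLength = rebuild(byLength);
const sorted = ["ccc", "a", "bb"].sort(storedByLength);

const storedReversed = rebuild(reversedByLength);
let lostScope = false;
try {
  storedReversed("a", "bb");
} catch (e) {
  // `compare` is not defined in the global scope the copy now runs in.
  lostScope = e instanceof ReferenceError;
}
```

A stored collation written like `byLength` would keep working after being detached; one written like `reversedByLength` would not.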





Re: SpellCheck API?

2011-05-09 Thread Aryeh Gregor
2011/5/9 Hironori Bono (坊野 博典) hb...@google.com:
 function CheckTextOfNode(node) {
   // Remove all the previous spellchecking results.
   window.spellCheckController.removeMarkers(node);

   // Check the text in the specified node.
   var result = CheckText(node.innerText ? node.innerText : node.value);
   for (var i = 0; i < result.length; i++) {
     // Add a misspelled underline and suggestions to the specified word.
     window.spellCheckController.addMarker(
         node, result[i].start, result[i].length, result[i].suggestions);
   }
 }

 . . .

 function CheckTextOfNode(node) {
   // Reset all the previous spellcheck results.
   window.spellCheckController.removeMarkers(node);

   // Check the text with our custom spellchecker.
   var result = CheckText(node.innerText ? node.innerText : node.value);
   for (var i = 0; i < result.length; i++) {
     // Use the integrated spellchecker to check a misspelled word.
     if (!window.spellCheckController.checkWord(result[i].word)) {
       result[i].suggestions = result[i].suggestions.concat(
           window.spellCheckController.getSuggestionsForWord(result[i].word));
       window.spellCheckController.addMarker(
           node, result[i].start, result[i].length, result[i].suggestions);
     }
   }
 }

It would be much simpler for authors if the UA just fired an event
every time it did a spellcheck.  The event might work like this:

* Every time the UA would normally invoke its spellchecker on a word,
it fires a spellcheck event at the element in question, which bubbles
(so authors can set a handler on the body if they like).  This has to
occur when a spellcheckable element first loads, if an element becomes
spellcheckable when it wasn't before, or whenever the user modifies a
spellcheckable element such that the spellchecker would normally fire
(e.g., when they finish typing a word).
* The event object should provide the text of the word whose spelling
needs to be checked.  It should give the node and start/end offsets,
either of the input/textarea or the text node.  (Not sure what should
happen for a misspelled word that's not all in one text node.)
* The event object should have a member variable that the script can
assign a list of suggestions to, and other members to specify what
behavior the script wants: mark the word as spelled correctly, mark it
as misspelled with only the provided suggestions, mark it as
misspelled with the provided suggestions plus the UA's suggestions, or
let the UA make the decision.
* Nothing should expose what the built-in spellchecker's decision was,
neither whether it was misspelled nor the list of suggestions.
* If authors want to re-spellcheck something that's already been
spellchecked, no special function is needed.  They should set
spellcheck=false on the element and then restore spellcheck=true.
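Under the assumptions in the list above, the user agent's side of the proposal can be simulated in a few lines. Everything here is hypothetical: `runSpellcheck`, the `builtin` checker object, and the event fields (`word`, `start`, `end`, `suggestions`, `misspelled`, `combineSuggestions`, partly borrowed from the sample handler in this message). Word breaking is a naive regular expression only to keep the sketch short.

```javascript
// Hypothetical simulation of the proposed spellcheck event, from the user
// agent's side. `builtin` stands in for the native spellchecker and offers
// isMisspelled(word) and suggest(word); its verdict is never put on the event.
function runSpellcheck(text, handler, builtin) {
  const markers = [];
  const wordPattern = /[A-Za-z']+/g; // Naive word breaking, sketch only.
  let match;
  while ((match = wordPattern.exec(text)) !== null) {
    const event = {
      word: match[0],
      start: match.index,
      end: match.index + match[0].length,
      suggestions: [],          // Script may assign its own list.
      misspelled: null,         // null: UA decides; true/false: override.
      combineSuggestions: true, // Append the UA's suggestions to the script's.
    };
    if (handler) handler(event);

    const misspelled = event.misspelled === null
        ? builtin.isMisspelled(event.word)
        : event.misspelled;
    if (!misspelled) continue;

    const suggestions = event.combineSuggestions
        ? event.suggestions.concat(builtin.suggest(event.word))
        : event.suggestions.slice();
    markers.push({ start: event.start, end: event.end, suggestions });
  }
  return markers;
}
```

Because the handler runs before the built-in verdict is computed and only the rendered markers depend on it, the script never learns what the native dictionary decided.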

This means authors wouldn't have to do word-breaking themselves, which
is a big advantage, since word-breaking can be very complicated.  It
would be *much* simpler to just plug in a spell-checker, without
having to write a lot of scaffolding code to track what text the user
is entering.  The only downside I can see is that it does force you to
use the browser's word-breaking behavior, but I don't think that's a
big disadvantage compared to the advantages.
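As a small illustration of why word breaking is better left to the user agent: a naive regular expression splits the contraction "don't" in two, while `Intl.Segmenter` (a later ECMAScript internationalization API, not something discussed in this thread) keeps it whole. `wordsByRegex` and `wordsBySegmenter` are hypothetical helper names.

```javascript
// Naive word breaking: runs of ASCII "word" characters.
function wordsByRegex(text) {
  return text.match(/\w+/g) || [];
}

// Word breaking with Intl.Segmenter, which follows the Unicode word-boundary
// rules (UAX #29) plus locale data.
function wordsBySegmenter(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    if (isWordLike) words.push({ word: segment, start: index });
  }
  return words;
}

// The regex splits "don't" into "don" and "t";
// the segmenter keeps "don't" as a single word starting at offset 2.
const naive = wordsByRegex("I don't know");
const segmented = wordsBySegmenter("I don't know");
```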

Here's some sample code for how to do roughly the same thing as your
sample code:

function CheckTextOfNode(node) {
  node.onspellcheck = function(event) {
    // Set suggestions.
    event.suggestions = MySpellCheckFunction(event.word);
    // If there were no suggestions, the word is spelled right.  I want to
    // totally ignore the built-in spellchecker.
    // By default, event.misspelled might be null, which means mark it as
    // misspelled only if the built-in spellchecker thinks it's misspelled.
    // True or false overrides the built-in spellchecker.
    event.misspelled = event.suggestions.length > 0;
    // I don't want the built-in suggestions to combine with mine.
    event.combineSuggestions = false;
  };
  // Toggle spellcheck off and on to force a re-check.
  node.spellcheck = false;
  node.spellcheck = true;
}

But this doesn't even capture how much simpler it is, because your
CheckText() function is what would have to contain all the
word-breaking logic, which isn't needed here.  MySpellCheckFunction()
just needs to take a word as input and return a list of suggestions as
output, that's it.



Re: SpellCheck API?

2011-05-09 Thread Boris Zbarsky

On 5/9/11 3:39 PM, Aryeh Gregor wrote:

* Every time the UA would normally invoke its spellchecker on a word,
it fires a spellcheck event at the element in question


This does mean firing tens of thousands of events during load on some
pages (e.g. Wikipedia article edit pages). Maybe that's not a big deal.


-Boris



Re: [WebIDL][Selectors-API] Stringifying null for DOMString attributes and properties

2011-05-09 Thread Jonas Sicking
On Mon, May 9, 2011 at 9:22 AM, Lachlan Hunt lachlan.h...@lachy.id.au wrote:
 On 2011-05-07 16:03, Lachlan Hunt wrote:

 (I don't have results for IE yet because the testharness script I used
 to write the tests doesn't work in IE.)

 I've now tested IE9, which did give me results. The following properties
 are all stringified to "".

 * BODY .text, .bgColor, .link, .vLink, .aLink
 * HR .size
 * A, AREA .coords, .shape
 * IFRAME .width, .height, .marginHeight, .marginWidth
 * TABLE .bgColor, .width
 * COLGROUP, COL .width
 * TR .bgColor,
 * TD, TH .bgColor, .height, .width
 * INPUT .height, .width
 * BUTTON .type
 * .textContent on everything

 Every other tested property on HTML*Element interfaces stringified to
 "null".

What about namespaceURI in various APIs (DOM-Core, DOM-XPath)?

In general, my main priority is that we make things as consistent as
possible. My second priority is that we make things follow JS
behavior. So I'd be very happy if we can get away with making just
the above list stringify to "", and the rest of the DOM stringify to
"null".
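The two behaviours being compared can be written down directly. Plain WebIDL DOMString conversion follows ECMAScript ToString, so `null` becomes "null"; WebIDL's `[TreatNullAs=EmptyString]` extended attribute (called `[LegacyNullToEmptyString]` in current WebIDL) gives the IE-style "" result. The helper names and the `FakeBody` class are hypothetical.

```javascript
// Default WebIDL DOMString conversion: ECMAScript ToString, which a template
// literal applies (so null -> "null", undefined -> "undefined").
function toDOMString(value) {
  return `${value}`;
}

// Conversion for attributes annotated [TreatNullAs=EmptyString]
// (spelled [LegacyNullToEmptyString] in current WebIDL).
function toDOMStringNullAsEmpty(value) {
  return value === null ? "" : toDOMString(value);
}

// Hypothetical element with one attribute of each kind, for illustration.
class FakeBody {
  #bgColor = "";
  #title = "";
  get bgColor() { return this.#bgColor; }
  set bgColor(v) { this.#bgColor = toDOMStringNullAsEmpty(v); }
  get title() { return this.#title; }
  set title(v) { this.#title = toDOMString(v); }
}

const body = new FakeBody();
body.bgColor = null; // Stored as "" (like the IE9 list above).
body.title = null;   // Stored as "null" (plain JS behavior).
```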

/ Jonas



Re: clipboard events

2011-05-09 Thread Ian Hickson

(Sorry for the long delay in responding to this.)

On Wed, 26 Jan 2011, Hallvord R. M. Steen wrote:
 On Fri, 24 Dec 2010 07:21:35 +0900, Paul Libbrecht p...@hoplahup.net wrote:
 
  - this seems to support the insertion in the clipboard's data of other 
  types than what is currently commonly supported by browsers and the 
  minimum quoted there; this is good and important. I think, for 
  example, that such data as the iCal format would fit very well and be 
  very useful here.
 
 It intends to, but this has two open issues:

 * I assume that many OS clipboard implementations have an enumerated
 list of known formats, so I'm not sure if all OSes can handle a request
 to push text/foobar data to the clipboard. Does anyone know if we can
 rely on such functionality being truly cross-platform?
 
 * There is not yet a clear way to push multi-part or alternate formats 
 to the OS clipboard from JS. To use something like iCal, I guess best 
 practise would be to push one human-readable text/plain string for 
 target software without iCal support, and one alternate entry in iCal 
 format. I guess that can be done with
 
 DataTransferItem add(in DOMString data, in DOMString type);
 
 I.e. the spec for the copy event would be:

 * default action: copy any document selection

 * if default action is prevented: push data in drag data store (as 
 manipulated by setData() or items.add()) to clipboard, probably mapping 
 certain known values to native clipboard formats while doing so.
 
 Ian - would that make sense?

I think you'd want to push the script-added data regardless of whether the 
event is canceled or not. Why would the script add the data otherwise?

I would just model the 'copy' (and 'cut') events exactly as a 'dragstart' 
event, ideally so much so that you can literally use the same function for 
both. (Canceling 'cut' would prevent the default deletion of the 
selection, canceling 'copy' has no effect.)
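A sketch of this suggestion, using one listener for `dragstart`, `copy` and `cut`. It reads `event.clipboardData` for clipboard events and `event.dataTransfer` for drag events, and calls only `setData()`. Following Hallvord's iCal example, it stores a human-readable `text/plain` entry plus a `text/calendar` alternate; whether every OS clipboard accepts such a type is the open question quoted above. `getSelectedEntry` and `provideEntryData` are hypothetical.

```javascript
// Hypothetical: returns the calendar entry the user has selected.
function getSelectedEntry() {
  return {
    uid: "sketch-1@example.com",
    summary: "Team meeting",
    start: "20110510T090000Z",
  };
}

// One listener usable for 'dragstart', 'copy' and 'cut'.
function provideEntryData(event) {
  // Clipboard events expose clipboardData; drag events expose dataTransfer.
  const data = event.clipboardData || event.dataTransfer;
  const entry = getSelectedEntry();

  // Human-readable fallback for targets without iCal support.
  data.setData("text/plain", `${entry.summary} at ${entry.start}`);

  // Alternate machine-readable representation (minimal iCalendar object;
  // DTSTAMP reuses the start time as a placeholder).
  data.setData("text/calendar", [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Example//Clipboard sketch//EN",
    "BEGIN:VEVENT",
    `UID:${entry.uid}`,
    `DTSTAMP:${entry.start}`,
    `DTSTART:${entry.start}`,
    `SUMMARY:${entry.summary}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ].join("\r\n") + "\r\n");
}

// In a page, the same function could be registered for all three events:
//   for (const type of ["dragstart", "copy", "cut"]) {
//     document.addEventListener(type, provideEntryData);
//   }
```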


 However, what about items.add() called during a paste event listener?
 Currently I do not allow paste event listeners to update the clipboard with
 setData(), it seems strange. Should we just disallow this too?

I'd model 'paste' on 'drop', including having the drag data store in 
read-only mode, yes.
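The HTML drag-and-drop model gives the drag data store a mode: read/write while `dragstart` listeners run, read-only during `drop`, and protected otherwise; in that model `setData()` does nothing outside read/write mode and `getData()` returns "" in protected mode. A minimal sketch of modelling `paste` on `drop` that way; the `DragDataStore` class itself is hypothetical.

```javascript
// Hypothetical model of a drag data store with the three HTML modes.
class DragDataStore {
  constructor(mode) {
    this.mode = mode; // "read/write", "read-only" or "protected"
    this.items = new Map();
  }
  setData(type, data) {
    if (this.mode !== "read/write") return; // Nothing happens.
    this.items.set(type.toLowerCase(), String(data));
  }
  getData(type) {
    if (this.mode === "protected") return "";
    return this.items.get(type.toLowerCase()) ?? "";
  }
}

// 'paste' modelled like 'drop': the UA fills the store, then switches it to
// read-only before listeners run, so they can read but not modify it.
const pasteStore = new DragDataStore("read/write");
pasteStore.setData("text/plain", "pasted text");  // Filled in by the UA.
pasteStore.mode = "read-only";                    // Mode while listeners run.
pasteStore.setData("text/plain", "overwritten");  // Ignored.
```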


 On Fri, 07 Jan 2011 04:31:01 +0900, Ian Hickson i...@hixie.ch wrote:
 
  Is it intended to also cover cut, copy and paste? The current spec 
  draft seems very vague about when the events fire and what their 
  default actions are, but I can't tell if that's intentional or not.
 
 Better now?

Not really. There's no processing model. IMHO we need a list of steps 
somewhere that defines how the events fire with respect to the event loop, 
which task source is used, what mode the drag data store is in, what the 
default actions are, how they interact with mutation events, etc.

Basically, the cut/copy/paste equivalent of:

http://www.whatwg.org/specs/web-apps/current-work/complete.html#drag-and-drop-processing-model


 On Sat, 08 Jan 2011 05:02:02 +0900, Ian Hickson i...@hixie.ch wrote:
 
Is it intended to also cover cut, copy and paste?
   
   Sorry, I don't understand the question.
  
  Well, for example, the 'cut' operation involves removing or mutating DOM
  nodes (for contentEditable) or editing the control value (for input) or
  raw value (for textarea), and modifying the selection accordingly.
 
 The timing is in scope, how to do the actual modifications is not (i.e. 
 I'm not trying to decide how the implementation should figure out what 
 DOM nodes to remove when the user selects something and cuts.)

We probably should define that.


  Sure. For example, when should the paste event fire relative to when 
  keydown/keyup events fire? When should the paste event fire relative 
  to when the 'input' event fires?
 
 Fixed.

Not really. It says it's asynchronous, but what event source should it 
use? Should it be queued on the event loop, or should it fire at some 
other step? Should it fire as part of the same task that mutates the DOM? 
Should it fire as part of the same task that fires the keydown event, or 
some different task? This would define, e.g., the relative order of tasks 
queued during handlers of the keydown event and those queued during the 
cut/copy/paste events.
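To see why this choice is observable, here is a toy event loop (not any browser's actual implementation; `runModel` and the log labels are hypothetical). A `keydown` listener queues a task; whether that task runs before or after the `paste` listener depends on whether `paste` is dispatched inside the same task as `keydown` or in a task queued afterwards.

```javascript
// Toy event loop: a FIFO queue of tasks, each running to completion.
function runModel(pasteInSameTask) {
  const log = [];
  const queue = [];
  const queueTask = (fn) => queue.push(fn);

  const onKeydown = () => {
    log.push("keydown listener");
    queueTask(() => log.push("task queued by keydown listener"));
  };
  const onPaste = () => log.push("paste listener");

  // The UA task that handles the key press.
  queueTask(() => {
    onKeydown();
    if (pasteInSameTask) {
      onPaste();          // Model A: paste dispatched within the keydown task.
    } else {
      queueTask(onPaste); // Model B: paste dispatched in a later task.
    }
  });

  while (queue.length > 0) queue.shift()();
  return log;
}

const sameTask = runModel(true);
const laterTask = runModel(false);
```

In model A the log reads keydown listener, paste listener, queued task; in model B the queued task runs before the paste listener.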


  Where do these events fire relative to mutation events, for 
  contentEditable? What should the default action of 'paste' be, in 
  terms of DOM mutations when the cursor is in a contentEditable 
  section?
 
 It is fixed to the extent that I consider these things in scope. I'm
 trying to keep this spec short and sweet :)

I think it'd be better to keep it precise. :-)


  Assuming RFC2119 semantics, the spec is lacking detail. For example, 
  nothing normatively says whether the events bubble or not, it's just 
  left up to the reader to assume that the table implies that it does.
 
 I've tried to sharpen the use of "must", "must not" and the like. I'm
 not entirely sure why 'Bubbles: yes' in a table isn't normative or 
 

Re: clipboard events

2011-05-09 Thread João Eiras



> I think you'd want to push the script-added data regardless of whether the
> event is canceled or not. Why would the script add the data otherwise?
>
> I would just model the 'copy' (and 'cut') events exactly as a 'dragstart'
> event, ideally so much so that you can literally use the same function for
> both. (Canceling 'cut' would prevent the default deletion of the
> selection, canceling 'copy' has no effect.)



Shouldn't canceling 'copy' prevent the data from being placed in the
clipboard? That way a script can instead explicitly set the contents of
the clipboard, if some sanitization needs to be done.
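A sketch of the use case described here, under the proposed semantics where canceling `copy` suppresses the default copy and the script supplies the clipboard contents itself. `sanitizeForClipboard`, the particular clean-up (stripping zero-width characters and collapsing whitespace) and the `getText` parameter are hypothetical; `getText` exists only so the sketch can be exercised with a mock event instead of a real `ClipboardEvent`.

```javascript
// Hypothetical clean-up applied before text reaches the clipboard.
function sanitizeForClipboard(text) {
  return text
    .replace(/[\u200B-\u200D\uFEFF]/g, "") // Strip zero-width characters.
    .replace(/\s+/g, " ")                  // Collapse runs of whitespace.
    .trim();
}

// Copy listener: cancel the default copy, then supply the sanitized text.
// When registered with addEventListener only `event` is passed, so `getText`
// falls back to the current selection.
function onCopy(event, getText = () => String(document.getSelection())) {
  event.preventDefault();
  event.clipboardData.setData("text/plain", sanitizeForClipboard(getText()));
}

// In a page: document.addEventListener("copy", onCopy);
```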



Re: SpellCheck API?

2011-05-09 Thread Aryeh Gregor
On Mon, May 9, 2011 at 3:49 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 This does mean firing tens of thousands of events during load on some pages
 (e.g. Wikipedia article edit pages). Maybe that's not a big deal.

If that's too many events, couldn't the browser optimize by not
spellchecking words until they scroll into view?  I imagine that might
not be terribly simple, depending on how the browser is designed, but
maybe tens of thousands of events aren't too expensive anyway.  I
don't know, up to implementers whether it's doable.

I'm assuming here that there's effectively no cost if no one's
registered a spellcheck handler, so it won't penalize authors who
don't use the feature.
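Both mitigations mentioned here (do nothing when no `spellcheck` handler is registered, and dispatch only for words that have scrolled into view) can be sketched as a filter in front of the dispatch loop. Representing the visible area as a pair of character offsets, and the names `wordsToDispatch`, `hasHandler` and `alreadyChecked`, are assumptions for illustration; a real engine would derive visibility from layout.

```javascript
// Hypothetical filter in front of the spellcheck-event dispatch loop.
// words:          [{ word, start, end }, ...] from the UA's word breaker
// visible:        { start, end }, the character range currently on screen
// hasHandler:     whether any spellcheck listener is registered
// alreadyChecked: Set of start offsets that have been dispatched before
function wordsToDispatch(words, visible, hasHandler, alreadyChecked) {
  // No listener means no events at all, so pages that don't use the
  // feature pay nothing.
  if (!hasHandler) return [];
  return words.filter((w) =>
      w.end > visible.start &&  // Ends after the visible range starts.
      w.start < visible.end &&  // Starts before the visible range ends.
      !alreadyChecked.has(w.start));
}
```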