SpellCheck API?
Greetings,

I'm Hironori Bono, a software engineer for Google Chrome. We recently received requests from web-application developers (and extension developers) who would like to use the spellchecker integrated into Google Chrome, or to replace it with their own spellcheckers implemented in JavaScript, as described in the following document. To satisfy their requests, I would like to propose an API that controls the spellchecker integrated into a user agent, if it has one. I am not sure whether all user agents need this API, so it would be great to get your feedback. Thank you in advance for your interest.

1. Introduction

HTML5 provides a spellcheck attribute to enable or disable the spellcheckers integrated into user agents in an editable element. This attribute prevents the spellcheckers from checking text in an editable element where web applications do not want it, e.g. e-mail addresses, URLs, etc.

Some user agents provide scripting access to spellcheckers. For example, Internet Explorer allows using the spellchecker integrated into Microsoft Word via ActiveX, as listed in the following code snippet.

  function CheckText(text) {
    var result = new Array;
    var app = new ActiveXObject('Word.Application');
    var doc = app.Documents.Add();
    doc.Content = text;
    for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
      var spellingError = doc.SpellingErrors.Item(i);
      for (var j = 1; j <= spellingError.Words.Count; j++) {
        var word = spellingError.Words.Item(j);
        var error = {};
        error.word = word.Text;
        error.start = word.Start;
        error.length = word.Text.length;
        error.suggestions = new Array;
        var suggestions = word.GetSpellingSuggestions();
        for (var k = 1; k <= suggestions.Count; k++) {
          error.suggestions.push(suggestions.Item(k).Name);
        }
        result.push(error);
      }
    }
    return result;
  }

On the other hand, it is not so easy for web-application developers to integrate custom spellcheckers (e.g. a spellchecker that uses a contact list to check e-mail addresses, names, street addresses, etc.) into their web applications. Even though several web applications (such as Gmail) have integrated custom spellcheckers, such web applications use content-editable div elements to render misspelling underlines and the ‘z-index’ property to show suggestions. Unfortunately, it is not so easy to apply these techniques when web applications use textarea elements or input elements for user input, because it is pretty hard to identify the position of misspelled words in these elements.

To solve this problem, it would be great for user agents to provide scripting access to their spellchecker framework so that web-application developers can integrate their custom spellcheckers into their web applications, as listed in the following code snippet.

  function CheckTextOfNode(node) {
    // Remove all the previous spellchecking results.
    window.spellCheckController.removeMarkers(node);
    // Check the text in the specified node.
    var result = CheckText(node.innerText ? node.innerText : node.value);
    for (var i = 0; i < result.length; i++) {
      // Add a misspelled underline and suggestions to the specified word.
      window.spellCheckController.addMarker(
          node, result[i].start, result[i].length, result[i].suggestions);
    }
  }

This example adds two methods.

* The window.spellCheckController.removeMarkers() method
  Removes all the misspelling underlines and suggestions in the specified node. The node parameter represents the DOM node from which a web application would like to remove all the misspelling underlines and suggestions.

* The window.spellCheckController.addMarker() method
  Attaches a misspelling underline and suggestions to the specified range of a node. The node parameter represents a DOM node in which a user agent adds a misspelling underline. The start and length parameters represent a range of text in the DOM node specified by the node parameter.
  (We do not use a Range object here because it is hard to specify a range of text in a textarea element or an input element with it.) The suggestions parameter represents a list of words suggested by the custom spellchecker. When a custom spellchecker does not provide any suggestions, this parameter should be an empty list.

Even though these functions are sufficient for web-application developers who use only their custom spellcheckers, they are not sufficient for those who use both their custom spellcheckers and the one integrated into the user agent (for example, web applications that use the integrated spellchecker only for words which their custom spellcheckers treat as misspelled).

  function CheckTextOfNode(node) {
    // Reset all the previous spellcheck results.
    window.spellCheckController.removeMarkers(node);
    // Check the text with our custom spellchecker.
    var result = CheckText(node.innerText ? node.innerText : node.value);
    for (var i = 0; i < result.length; i++)
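
To make the start/length semantics of the two proposed methods concrete, here is a minimal sketch that runs outside a browser. Everything in it is an assumption for illustration: `MockSpellCheckController`, the plain object standing in for a textarea, and the toy dictionary-based `checkText` are hypothetical, and no user agent is known to ship `window.spellCheckController`.

```javascript
// Hypothetical stand-in for the proposed window.spellCheckController.
// It only records markers; a real user agent would draw underlines.
class MockSpellCheckController {
  constructor() {
    this.markers = new Map(); // node -> array of {start, length, suggestions}
  }
  removeMarkers(node) {
    this.markers.delete(node);
  }
  addMarker(node, start, length, suggestions) {
    if (!this.markers.has(node)) this.markers.set(node, []);
    this.markers.get(node).push({ start, length, suggestions });
  }
}

// Toy custom spellchecker (an assumption, not part of the proposal):
// every word missing from the dictionary is reported as misspelled.
function checkText(text, dictionary) {
  const errors = [];
  const wordPattern = /[A-Za-z']+/g;
  let match;
  while ((match = wordPattern.exec(text)) !== null) {
    const word = match[0];
    if (!dictionary.has(word.toLowerCase())) {
      errors.push({ word, start: match.index, length: word.length, suggestions: [] });
    }
  }
  return errors;
}

// Same shape as CheckTextOfNode in the proposal, with the controller and
// dictionary passed in explicitly so the sketch is self-contained.
function checkTextOfNode(controller, node, dictionary) {
  controller.removeMarkers(node);
  const text = node.innerText ? node.innerText : node.value;
  for (const error of checkText(text, dictionary)) {
    controller.addMarker(node, error.start, error.length, error.suggestions);
  }
}

const controller = new MockSpellCheckController();
const textarea = { value: "helo world" }; // stands in for a <textarea>
checkTextOfNode(controller, textarea, new Set(["hello", "world"]));
console.log(controller.markers.get(textarea));
```

Because the range is a plain offset and length into the node's text, the same call works for a textarea value and for innerText, which is the reason the proposal gives for not using a Range object.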
Re: SpellCheck API?
On 05/09/2011 11:58 AM, Hironori Bono (坊野 博典) wrote:
> Greetings, I'm Hironori Bono, a software engineer for Google Chrome. We recently received requests from web-application developers (and extension developers) that they would like to use the spellchecker

Quite different targets.

> integrated into Google Chrome and to replace the spellchecker with their spellcheckers implemented in JavaScript as written in the following document. To satisfy their requests, I would like to propose to add an API that controls spellcheckers integrated into a user agent if it has. Even though I'm wondering if all user agents need this API, it would be great to give me feedback. Thank you for your interest in advance.
>
> 1. Introduction
>
> HTML5 provides a spellcheck attribute to enable or disable the spellcheckers integrated into user agents in an editable element. This attribute prevents the spellcheckers from checking text in an editable element where web applications do not like it, e.g. e-mail addresses, URLs, etc. Some user agents provide scripting access to spellcheckers.

Providing scripting access to the built-in spellchecker is a privacy violation (this has been discussed in @whatwg mailing list): a web page could learn which language the user uses/has for spellchecking, and whether the user has added new words to the known-words list.

> For example, Internet Explorer allows using the spellchecker integrated into Microsoft Word via ActiveX as listed in the following code snippet.
> function CheckText(text) {
>   var result = new Array;
>   var app = new ActiveXObject('Word.Application');
>   var doc = app.Documents.Add();
>   doc.Content = text;
>   for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
>     var spellingError = doc.SpellingErrors.Item(i);
>     for (var j = 1; j <= spellingError.Words.Count; j++) {
>       var word = spellingError.Words.Item(j);
>       var error = {};
>       error.word = word.Text;
>       error.start = word.Start;
>       error.length = word.Text.length;
>       error.suggestions = new Array;
>       var suggestions = word.GetSpellingSuggestions();
>       for (var k = 1; k <= suggestions.Count; k++) {
>         error.suggestions.push(suggestions.Item(k).Name);
>       }
>       result.push(error);
>     }
>   }
>   return result;
> }
>
> On the other hand, it is not so easy for web-application developers to integrate custom spellcheckers (e.g. a spellchecker that uses a contact list to check e-mail addresses, names, street addresses, etc.) into their web applications. Even though several web applications (such as GMails)

Oh, I didn't know that if I teach my browser's spellchecker to know the words I use commonly, GMail can't handle that. Interesting.

> have integrated custom spellcheckers, such web applications use content-editable div elements to render misspelled underlines and the ‘z-index’ properties to show suggestions, respectively. Unfortunately, it is not so easy to apply these techniques when web applications use textarea elements or input elements for user input because it is pretty hard to identify the position of misspelled words in these elements. To solve this problem, it would be great for user agents to provide scripting access to their spell-checker framework so web-application developers can integrate their custom spellcheckers

Adding support for custom spellcheckers seems reasonable. Need to just make sure that web page doesn't get access to the native spellcheck data (at least not without permission).

> to their web applications as listed in the following code snippet.
> function CheckTextOfNode(node) {
>   // Remove all the previous spellchecking results.
>   window.spellCheckController.removeMarkers(node);
>   // Check the text in the specified node.
>   var result = CheckText(node.innerText ? node.innerText : node.value);
>   for (var i = 0; i < result.length; i++) {
>     // Add a misspelled underline and suggestions to the specified word.
>     window.spellCheckController.addMarker(
>         node, result[i].start, result[i].length, result[i].suggestions);
>   }
> }
>
> This example adds two methods.
>
> * The window.spellCheckController.removeMarkers() method
>   Removes the all misspelled underlines and suggestions in the specified node. The node parameter represents the DOM node in which a web application like to remove all the misspelling underlines and suggestions.
>
> * The window.spellCheckController.addMarker() method
>   Attaches a misspelled underline and suggestions to the specified range of a node. The node parameter represents a DOM node in which a user agent adds a misspelled underline. The start and length parameters represent a range of text in the DOM node specified by the node parameter. (We do not use a Range object here because it is hard to specify a range of text in a textarea element or an input element with it.) The suggestions parameter represents a list of words suggested by the custom spellchecker. When a custom spellchecker does not provide any suggestions, this
Re: SpellCheck API?
> This is the privacy violation, and not acceptable as such. I wonder how to not expose native spellchecker data to web page, yet support this use case. Or do we need yet another permission, which user has to give to the page before the spellchecker API fully working.

In general, permission dialogs are not real solutions to privacy problems: when a user sees a dialog, their mindset is not "what is this dialog asking?" but "how do I make this dialog go away?". This same problem exists with the geolocation APIs and others, but adding more examples of this doesn't seem like a future-proof plan.

--Oliver
Re: [WebIDL][Selectors-API] Stringifying null for DOMString attributes and properties
On 2011-05-07 16:03, Lachlan Hunt wrote:
> (I don't have results for IE yet because the testharness script I used to write the tests doesn't work in IE.)

I've now tested IE9, which did give me results. The following properties are all stringified to "".

* BODY .text, .bgColor, .link, .vLink, .aLink
* HR .size
* A, AREA .coords, .shape
* IFRAME .width, .height, .marginHeight, .marginWidth
* TABLE .bgColor, .width
* COLGROUP, COL .width
* TR .bgColor
* TD, TH .bgColor, .height, .width
* INPUT .height, .width
* BUTTON .type
* .textContent on everything

Every other tested property on HTML*Element interfaces stringified to "null".

(This is testing non-readonly DOMString properties on HTML*Element interfaces documented in HTML [1], excluding those I mentioned as untested in my previous mail.)

[1] http://www.whatwg.org/C

--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Re: [IndexedDB] Closing on bug 9903 (collations)
On 5/6/2011 7:07 AM, timeless wrote:
> I think that a stored procedure could be considered as a compiled version of a serialized function. i.e. something which loses its scope chain, and which loses access to its parent object. If it loses access to its scope chain which includes the interesting globals, it will no longer have access to fun things like DOM objects, roughly like DOMWorkers but with even less exciting objects available. I'd hope that a jit should be able to do a fairly reasonable job of optimizing such a function given these constraints.

This may be what we go with, but not in version 1.

Cheers,

Shawn
Re: SpellCheck API?
2011/5/9 Hironori Bono (坊野 博典) hb...@google.com:
> function CheckTextOfNode(node) {
>   // Remove all the previous spellchecking results.
>   window.spellCheckController.removeMarkers(node);
>   // Check the text in the specified node.
>   var result = CheckText(node.innerText ? node.innerText : node.value);
>   for (var i = 0; i < result.length; i++) {
>     // Add a misspelled underline and suggestions to the specified word.
>     window.spellCheckController.addMarker(
>         node, result[i].start, result[i].length, result[i].suggestions);
>   }
> }
> . . .
> function CheckTextOfNode(node) {
>   // Reset all the previous spellcheck results.
>   window.spellCheckController.removeMarkers(node);
>   // Check the text with our custom spellchecker.
>   var result = CheckText(node.innerText ? node.innerText : node.value);
>   for (var i = 0; i < result.length; i++) {
>     // Use the integrated spellchecker to check a misspelled word.
>     if (!window.spellCheckController.checkWord(result[i].word)) {
>       result[i].suggestions = result[i].suggestions.concat(
>           window.spellCheckController.getSuggestionsForWord(result[i].word));
>       window.spellCheckController.addMarker(
>           node, result[i].start, result[i].length, result[i].suggestions);
>     }
>   }
> }

It would be much simpler for authors if the UA just fired an event every time it did a spellcheck. The event might work like this:

* Every time the UA would normally invoke its spellchecker on a word, it fires a spellcheck event at the element in question, which bubbles (so authors can set a handler on the body if they like). This has to occur when a spellcheckable element first loads, if an element becomes spellcheckable when it wasn't before, or whenever the user modifies a spellcheckable element such that the spellchecker would normally fire (e.g., when they finish typing a word).

* The event object should provide the text of the word whose spelling needs to be checked. It should give the node and start/end offsets, either of the input/textarea or the text node. (Not sure what should happen for a misspelled word that's not all in one text node.)

* The event object should have a member variable that the script can assign a list of suggestions to, and other members to specify what behavior the script wants: mark the word as spelled correctly, mark it as misspelled with only the provided suggestions, mark it as misspelled with the provided suggestions plus the UA's suggestions, or let the UA make the decision.

* Nothing should expose what the built-in spellchecker's decision was, neither whether it was misspelled nor the list of suggestions.

* If authors want to re-spellcheck something that's already been spellchecked, no special function is needed. They should set spellcheck=false on the element and then restore spellcheck=true.

This means authors wouldn't have to do word-breaking themselves, which is a big advantage, since word-breaking can be very complicated. It would be *much* simpler to just plug in a spell-checker, without having to write a lot of scaffolding code to track what text the user is entering. The only downside I can see is that it does force you to use the browser's word-breaking behavior, but I don't think that's a big disadvantage compared to the advantages.

Here's some sample code for how to do roughly the same thing as your sample code:

  function CheckTextOfNode(node) {
    node.onspellcheck = function(event) {
      // Set suggestions
      event.suggestions = MySpellCheckFunction(event.word);
      // If there were no suggestions, the word is spelled right. I want to
      // totally ignore the built-in spellchecker.
      // By default, event.misspelled might be null, which means mark it as
      // misspelled only if the built-in spellchecker thinks it's misspelled.
      // True or false overrides the built-in spellchecker.
      event.misspelled = event.suggestions.length > 0;
      // I don't want the built-in suggestions to combine with mine.
      event.combineSuggestions = false;
    };
    node.spellcheck = false;
    node.spellcheck = true;
  }

But this doesn't even capture how much simpler it is, because your CheckText() function is what would have to contain all the word-breaking logic, which isn't needed here. MySpellCheckFunction() just needs to take a word as input and return a list of suggestions as output, that's it.
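
The decision rules in the bullet list above can be simulated without a browser. The following is only a sketch under stated assumptions: `builtInCheck`, `dispatchSpellcheck`, and `mySpellCheckFunction` are hypothetical names, the built-in word list is invented, and the default of `combineSuggestions = true` is a guess, since the message does not say what the default would be.

```javascript
// Stand-in for the user agent's private spellchecker (invented data).
const builtInCorrections = new Map([
  ["teh", ["the"]],
  ["recieve", ["receive"]],
]);

function builtInCheck(word) {
  return {
    misspelled: builtInCorrections.has(word),
    suggestions: builtInCorrections.get(word) || [],
  };
}

// Hypothetical dispatch for one word: runs the page's handler, then
// combines its answer with the built-in result inside the "UA".
function dispatchSpellcheck(word, handler) {
  const event = { word, suggestions: [], misspelled: null, combineSuggestions: true };
  if (handler) handler(event); // the handler never sees builtInCheck's result
  const builtIn = builtInCheck(word);
  // null means "let the UA decide"; true or false overrides it.
  const misspelled = event.misspelled === null ? builtIn.misspelled : event.misspelled;
  if (!misspelled) return { misspelled: false, suggestions: [] };
  const suggestions = event.combineSuggestions
    ? event.suggestions.concat(builtIn.suggestions)
    : event.suggestions.slice();
  return { misspelled: true, suggestions };
}

// Toy custom checker that knows a single name correction.
function mySpellCheckFunction(word) {
  return word === "Bonno" ? ["Bono"] : [];
}

// Mirrors the sample handler: custom suggestions only, built-in ignored.
const overrideHandler = (event) => {
  event.suggestions = mySpellCheckFunction(event.word);
  event.misspelled = event.suggestions.length > 0;
  event.combineSuggestions = false;
};

console.log(dispatchSpellcheck("Bonno", overrideHandler));
console.log(dispatchSpellcheck("teh", overrideHandler));
console.log(dispatchSpellcheck("teh", null));
```

Note that the page-supplied handler only writes to the event object; the merge with the built-in result happens after the handler returns, which is how this model keeps the native spellchecker's decision hidden from script.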
Re: SpellCheck API?
On 5/9/11 3:39 PM, Aryeh Gregor wrote:
> * Every time the UA would normally invoke its spellchecker on a word, it fires a spellcheck event at the element in question

This does mean firing tens of thousands of events during load on some pages (e.g. Wikipedia article edit pages).

Maybe that's not a big deal.

-Boris
Re: [WebIDL][Selectors-API] Stringifying null for DOMString attributes and properties
On Mon, May 9, 2011 at 9:22 AM, Lachlan Hunt lachlan.h...@lachy.id.au wrote:
> On 2011-05-07 16:03, Lachlan Hunt wrote:
>> (I don't have results for IE yet because the testharness script I used to write the tests doesn't work in IE.)
>
> I've now tested IE9, which did give me results. The following properties are all stringified to "".
>
> * BODY .text, .bgColor, .link, .vLink, .aLink
> * HR .size
> * A, AREA .coords, .shape
> * IFRAME .width, .height, .marginHeight, .marginWidth
> * TABLE .bgColor, .width
> * COLGROUP, COL .width
> * TR .bgColor
> * TD, TH .bgColor, .height, .width
> * INPUT .height, .width
> * BUTTON .type
> * .textContent on everything
>
> Every other tested property on HTML*Element interfaces stringified to "null".

What about namespaceURI, in various APIs (DOM-Core, DOM-XPath)?

In general, my main priority is that we make things as consistent as possible. My second priority is that we make things follow JS behavior. So I'd be very happy if we can get away with making just the above list stringify to "", and the rest of the DOM stringify to "null".

/ Jonas
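
For readers less familiar with the two behaviors being compared, here is a small sketch of the difference. The function names are hypothetical; only `String(null)` yielding the text "null" is standard JavaScript behavior, and the second function simply models the alternative of mapping null to the empty string.

```javascript
// Plain JavaScript string conversion turns null into the text "null".
function toDOMStringDefault(value) {
  return String(value);
}

// The alternative conversion: null becomes the empty string, and every
// other value is converted as usual.
function toDOMStringNullAsEmpty(value) {
  return value === null ? "" : String(value);
}

console.log(JSON.stringify(toDOMStringDefault(null)));
console.log(JSON.stringify(toDOMStringNullAsEmpty(null)));
console.log(JSON.stringify(toDOMStringNullAsEmpty(42)));
```

The consistency concern raised above amounts to deciding, per attribute, which of these two conversions a setter applies when script assigns null.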
Re: clipboard events
(Sorry for the long delay in responding to this.)

On Wed, 26 Jan 2011, Hallvord R. M. Steen wrote:

On Fri, 24 Dec 2010 07:21:35 +0900, Paul Libbrecht p...@hoplahup.net wrote:

- this seems to support the insertion in the clipboard's data of other types than what is currently commonly supported by browsers and the minimum quoted there; this is good and important. I think, for example, that such data as the iCal format would fit very well and be very useful here.

It intends to, but this has two open issues:

* I assume that many OS clipboard implementations have an enumerated list of known formats, I'm not sure if all OSes can handle a request to push text/foobar data to the clipboard. Does anyone know if we can rely on such functionality being truly cross-platform?

* There is not yet a clear way to push multi-part or alternate formats to the OS clipboard from JS. To use something like iCal, I guess best practise would be to push one human-readable text/plain string for target software without iCal support, and one alternate entry in iCal format. I guess that can be done with DataTransferItem add(in DOMString data, in DOMString type);

I.e. spec for copy event would be

* default action: copy any document selection
* if default action is prevented: push data in drag data store (as manipulated by setData() or items.add()) to clipboard, probably mapping certain known values to native clipboard formats while doing so.

Ian - would that make sense?

I think you'd want to push the script-added data regardless of whether the event is canceled or not. Why would the script add the data otherwise? I would just model the 'copy' (and 'cut') events exactly as a 'dragstart' event, ideally so much so that you can literally use the same function for both. (Canceling 'cut' would prevent the default deletion of the selection, canceling 'copy' has no effect.)

However, what about items.add() called during a paste event listener? Currently I do not allow paste event listeners to update the clipboard with setData(), it seems strange. Should we just disallow this too?

I'd model 'paste' on 'drop', including having the drag data store in read-only mode, yes.

On Fri, 07 Jan 2011 04:31:01 +0900, Ian Hickson i...@hixie.ch wrote:

Is it intended to also cover cut, copy and paste? The current spec draft seems very vague about when the events fire and what their default actions are, but I can't tell if that's intentional or not.

Better now?

Not really. There's no processing model. IMHO we need a list of steps somewhere that defines how the events fire with respect to the event loop, which task source is used, what mode the drag data store is in, what the default actions are, how they interact with mutation events, etc. Basically, the cut/copy/paste equivalent of:

http://www.whatwg.org/specs/web-apps/current-work/complete.html#drag-and-drop-processing-model

On Sat, 08 Jan 2011 05:02:02 +0900, Ian Hickson i...@hixie.ch wrote:

Is it intended to also cover cut, copy and paste?

Sorry, I don't understand the question.

Well, for example, the 'cut' operation involves removing or mutating DOM nodes (for contentEditable) or editing the control value (for input) or raw value (for textarea), and modifying the selection accordingly.

The timing is in scope, how to do the actual modifications is not (i.e. I'm not trying to decide how the implementation should figure out what DOM nodes to remove when the user selects something and cuts.)

We probably should define that.

Sure.

For example, when should the paste event fire relative to when keydown/keyup events fire? When should the paste event fire relative to when the 'input' event fires?

Fixed.

Not really. It says it's asynchronous, but what event source should it use? Should it be queued on the event loop, or should it fire at some other step? Should it fire as part of the same task that mutates the DOM? Should it fire as part of the same task that fires the keydown event, or some different task? This would define, e.g., the relative order of tasks queued during handlers of the keydown event and those queued during the cut/copy/paste events. Where do these events fire relative to mutation events, for contentEditable? What should the default action of 'paste' be, in terms of DOM mutations when the cursor is in a contentEditable section?

It is fixed to the extent that I consider these things in scope.. I'm trying to keep this spec short and sweet :)

I think it'd be better to keep it precise. :-)

Assuming RFC2119 semantics, the spec is lacking detail. For example, nothing normatively says whether the events bubble or not, it's just left up to the reader to assume that the table implies that it does.

I've tried to sharpen the use of must, must not and the like. I'm not entirely sure why 'Bubbles: yes' in a table isn't normative or
Re: clipboard events
> I think you'd want to push the script-added data regardless of whether the event is canceled or not. Why would the script add the data otherwise? I would just model the 'copy' (and 'cut') events exactly as a 'dragstart' event, ideally so much so that you can literally use the same function for both. (Canceling 'cut' would prevent the default deletion of the selection, canceling 'copy' has no effect.)

Shouldn't canceling 'copy' prevent the data from being placed in the clipboard? That way a script can instead explicitly set the contents of the clipboard, if some sanitization needs to be done.
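
As a rough illustration of the rule being argued for here (the default action copies the selection; canceling the event means only script-provided data reaches the clipboard), the following sketch models a 'copy' event in plain JavaScript. `runCopy` and the event object it builds are hypothetical stand-ins, not a browser API, and the sketch does not model the alternative in which script-added data is pushed regardless of cancellation.

```javascript
// Hypothetical model of one 'copy' operation. Returns a Map of
// format -> data representing what would be placed on the clipboard.
function runCopy(selectionText, handler) {
  const scriptData = new Map(); // filled by the handler through setData()
  let canceled = false;
  const event = {
    clipboardData: {
      setData: (format, data) => { scriptData.set(format, data); },
    },
    preventDefault: () => { canceled = true; },
  };
  if (handler) handler(event);
  if (canceled) {
    // Default action prevented: only the script's data is pushed.
    return scriptData;
  }
  // Default action: copy the document selection as plain text.
  return new Map([["text/plain", selectionText]]);
}

// A sanitizing handler that also adds an alternate calendar entry,
// following the text/plain plus iCal idea from earlier in the thread.
const sanitizingHandler = (event) => {
  event.clipboardData.setData("text/plain", "Meeting at 10:00");
  event.clipboardData.setData("text/calendar", "BEGIN:VCALENDAR\nEND:VCALENDAR");
  event.preventDefault();
};

console.log(runCopy("Meeting at 10:00 <script>", null));
console.log(runCopy("Meeting at 10:00 <script>", sanitizingHandler));
```

Under this rule a handler that wants sanitized output must both set its data and cancel the event; a handler that sets data without canceling has no effect in this model, which is the case the quoted objection is about.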
Re: SpellCheck API?
On Mon, May 9, 2011 at 3:49 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> This does mean firing tens of thousands of events during load on some pages (e.g. wikipedia article edit pages) Maybe that's not a big deal.

If that's too many events, couldn't the browser optimize by not spellchecking words until they scroll into view? I imagine that might not be terribly simple, depending on how the browser is designed, but maybe tens of thousands of events aren't too expensive anyway. I don't know; it's up to implementers whether it's doable.

I'm assuming here that there's effectively no cost if no one's registered a spellcheck handler, so it won't penalize authors who don't use the feature.
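
The two optimizations mentioned above (skip dispatch entirely when no handler is registered, and only check words that are in view) can be sketched as follows. This is a toy model under assumptions: `fireSpellcheckEvents` and the `visibleRange` object are hypothetical, and a real user agent would track visibility through its layout engine rather than word indices.

```javascript
// Hypothetical sketch: fire one event per word, but only for words in
// the visible range and only when a handler is registered.
// Returns the number of events fired.
function fireSpellcheckEvents(words, visibleRange, handler) {
  if (!handler) return 0; // no listener: nothing is dispatched
  const end = Math.min(visibleRange.end, words.length);
  let fired = 0;
  for (let i = visibleRange.start; i < end; i++) {
    handler({ word: words[i] });
    fired++;
  }
  return fired;
}

// A page with 50,000 words of which the first 300 are on screen.
const pageWords = new Array(50000).fill("word");
const onScreen = { start: 0, end: 300 };

console.log(fireSpellcheckEvents(pageWords, onScreen, null));
console.log(fireSpellcheckEvents(pageWords, onScreen, () => {}));
```

In this model the per-load cost scales with the number of visible words rather than the document length, and pages that never register a handler pay nothing for event dispatch.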