[IndexedDB] Detailed comments for the current draft

Pablo Castro Tue, 26 Jan 2010 12:48:24 -0800

These are notes that we collected both from reviewing the spec (editor's draft 
up to Jan 24th) and from a prototype implementation that we are working on. I 
didn't realize we had this many notes, otherwise I would have sent 
intermediate notes earlier. Will do so next round.



1. Keys and sorting

a.       3.1.1: it would seem that also having date/time values as keys would 
be important, as they are a common sorting criterion (e.g. as part of a 
composite primary key or in general as an index key).
b.      3.1.1: similarly, sorting on number in general (not just 
integers/longs) would be important (e.g. price lists, scores, etc.)
c.       3.1.1: cross type sorting and sorting of long values are clear. 
Sorting of strings however needs more elaboration. In particular, which 
collation do we use? Does the user or developer get to choose a collation? If 
we pick up a collation from the environment (e.g. the OS), if the collation 
changes we'd have to re-index all the databases.
d.      3.1.3: spec reads "…key path must be the name of an enumerated 
property…"; how about composite keys (would make the related APIs take a 
DOMString or DOMStringList) 
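
To make the sorting questions in 1.a through 1.d concrete, here is a minimal sketch of a key comparator that handles numbers (not just integers), dates, strings and composite keys. The names typeRank and compareKeys are hypothetical, and the cross-type ranking is an assumption for illustration, not taken from the draft. Strings are compared by UTF-16 code unit, which is one way to avoid depending on an environment collation.

```javascript
// Hypothetical ranking across types: numbers < dates < strings < composite keys.
// This order is an assumption for the sketch, not the draft's definition.
function typeRank(key) {
  if (typeof key === "number") return 0;
  if (key instanceof Date) return 1;
  if (typeof key === "string") return 2;
  if (Array.isArray(key)) return 3;
  throw new TypeError("unsupported key type");
}

function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra < rb ? -1 : 1;
  if (ra === 3) {
    // Composite keys: compare element by element; a shorter prefix sorts first.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length === b.length ? 0 : a.length < b.length ? -1 : 1;
  }
  // Dates compare by their millisecond value.
  const x = ra === 1 ? a.getTime() : a;
  const y = ra === 1 ? b.getTime() : b;
  // For strings, < and > compare UTF-16 code units, independent of any locale.
  return x < y ? -1 : x > y ? 1 : 0;
}

const names = ["b", "a", "B"];
const sortedNames = names.slice().sort(compareKeys); // ["B", "a", "b"]
```

Code-unit order is stable across environments but puts "B" before "a", which is why the collation question matters for user-visible sorting.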


2. Values

a.       3.1.2: isn't the requirement for "structured clones" too much? It 
would mean implementations would have to be able to store and retrieve File 
objects and such. Would it be more appropriate to say it's just graphs of 
JavaScript primitive objects/values (object, string, number, date, arrays, 
null)? 
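
As a rough illustration of the narrower rule suggested in 2.a, the check below accepts only graphs built from plain objects, arrays, strings, numbers, dates and null. The function name isPlainValue is hypothetical. Booleans are not in the list above; accepting them here is an assumption.

```javascript
// Returns true when `value` is a graph of plain objects, arrays, strings,
// numbers, dates and null. Booleans are accepted as an assumption.
function isPlainValue(value, seen = new Set()) {
  if (value === null) return true;
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean") return true;
  if (t !== "object") return false; // undefined, function, symbol, bigint
  if (seen.has(value)) return true; // shared node or cycle, already visited
  seen.add(value);
  if (value instanceof Date) return true;
  if (Array.isArray(value)) return value.every((v) => isPlainValue(v, seen));
  // Reject host objects and class instances (a File object would fail here).
  const proto = Object.getPrototypeOf(value);
  if (proto !== Object.prototype && proto !== null) return false;
  return Object.keys(value).every((k) => isPlainValue(value[k], seen));
}
```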


3. Object store

a.       3.1.3: do we really need in-line + out-of-line keys? Besides the 
concept-count increase, we wonder whether out-of-line keys would cause trouble 
to generic libraries, as the values for the keys wouldn't be part of the values 
iterated when doing a "foreach" over the table.
b.      Query processing libraries will need temporary stores, which need 
temporary names. Should we introduce an API for the creation of temporary 
stores with transaction lifetime and no name?
c.      It would be nice to have an estimate row count on each store. This 
comes at an implementation and runtime cost. Strong opinions? Lacking 
everything else, this would be the only statistic to base decisions on for a 
query processor. 
d.      The draft does not touch on how applications would do optimistic 
concurrency. A common way of doing this is to use a timestamp value that's 
automatically updated by the system every time someone touches the row. While 
we don't feel it's a must have, it certainly supports common scenarios.
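
A minimal sketch of the optimistic concurrency pattern mentioned in 3.d, using an in-memory stand-in for a store. VersionedStore and putIfUnchanged are hypothetical names, and a monotonic counter is used in place of a real system timestamp.

```javascript
// Hypothetical in-memory store; `version` stands in for the timestamp that
// the system would update every time a row is touched.
class VersionedStore {
  constructor() {
    this.rows = new Map();
    this.clock = 0;
  }
  put(key, value) {
    const version = ++this.clock;
    this.rows.set(key, { value, version });
    return version;
  }
  get(key) {
    return this.rows.get(key); // { value, version } or undefined
  }
  // Write only if the row has not changed since expectedVersion was read.
  putIfUnchanged(key, value, expectedVersion) {
    const row = this.rows.get(key);
    if (!row || row.version !== expectedVersion) return false;
    this.put(key, value);
    return true;
  }
}

const store = new VersionedStore();
store.put("k", "v1");
const readByA = store.get("k").version;
const readByB = store.get("k").version;
const writeA = store.putIfUnchanged("k", "from A", readByA); // true
const writeB = store.putIfUnchanged("k", "from B", readByB); // false, row changed
```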


4. Indexes

a.       3.1.4 mentions "auto-populated" indexes, but then there is no mention 
of other types. We suggest that we remove this and in the algorithms section 
describe side-effecting operations as always updating the indexes as well.
b.      If during insert/update the value of the key is not present (i.e. 
undefined as opposed to null or a value), is that a failure, does the row not 
get indexed, or is it indexed as null? Failure would probably cause a lot of 
trouble to users; the other two have correctness problems. An option is to 
index them as undefined, but now we have undefined and null as indexable keys. 
We lean toward this last option. 
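
To illustrate the option we lean toward in 4.b, the sketch below builds an index in which a missing property is indexed under undefined, separately from an explicit null. The names evaluateKeyPath and buildIndex are hypothetical, and the key path is assumed to be a single property name.

```javascript
// A missing own property yields undefined; an explicit null stays null.
function evaluateKeyPath(value, keyPath) {
  return Object.prototype.hasOwnProperty.call(value, keyPath)
    ? value[keyPath]
    : undefined;
}

function buildIndex(rows, keyPath) {
  const index = new Map(); // a Map keeps undefined and null as distinct keys
  for (const row of rows) {
    const key = evaluateKeyPath(row, keyPath);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  }
  return index;
}

const byEmail = buildIndex(
  [{ id: 1, email: "a@example.com" }, { id: 2, email: null }, { id: 3 }],
  "email"
);
```

Here row 2 is found under null and row 3 under undefined, so neither is dropped from the index and neither is confused with the other.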


5. Databases

a.       Not being able to enumerate databases gets in the way of creating good 
tools and frameworks such as database explorers. What was the motivation for 
this? Is it security related?
b.      Clarification on transactions: all database operations that affect the 
schema (create/remove store/index, setVersion, etc.) as well as data 
modification operations are assumed to be auto-commit by default, correct? 
Furthermore, all those operations (both schema and data) can happen within a 
transaction, including mixing schema and data changes. Does that line up with 
others' expectations? If so we should find a spot to articulate this explicitly.
c.       No way to delete a database? It would be reasonable for applications 
to want to do that and let go of the user data (e.g. a "forget me" feature in a 
web site)


6. Transactions

a.       While we understand the goal of simplifying developers' life with an 
error-free transactional model, we're not sure if we're doing more harm by 
introducing more concepts into this space. Wouldn't it be better to use regular 
transactions with a well-known failure mode (e.g. either deadlocks or 
optimistic concurrency failure on commit)?
b.    In auto-commit mode, if two cursors are opened at the same time (e.g. 
to scan them in an interleaved way), are they in independent transactions 
simultaneously active in the same connection?


7. Algorithms

a.       3.2.2: steps 4 and 5 are inverted in order.
b.      3.2.2: when there is a key generator and the store uses in-line keys, 
should the generated key value be propagated to the original object (in 
addition to the clone), such that both are in sync after the put operation?
c.       3.2.3: step 2, probably editorial mistake? Wouldn't all indexes have a 
key path?
d.      3.2.4.2: in our experiments writing application code, the fact that 
this method throws an exception when an item is not found is quite 
inconvenient. It would be much more natural to just return undefined, as this can be 
a primary code path (to not find something) and not an exceptional situation. 
Same for 3.2.5, step 2 and 3.2.6 step 2.
e.      The algorithm to put a new object into a store currently indicates that 
the key of the object should be returned. How about other values that may be 
generated by the store? For example, if the store generates timestamps (not 
currently in the draft, but may be needed for optimistic concurrency control), 
how would we return them? Should we update the actual object that was passed as 
a parameter with keys and other server-generated values? 
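
The behavior suggested in 7.b and 7.d can be sketched with an in-memory stand-in. SketchStore is a hypothetical name, a single-property key path is assumed, and a JSON round trip stands in for the structured clone.

```javascript
// Hypothetical store with in-line keys and a key generator.
class SketchStore {
  constructor(keyPath) {
    this.keyPath = keyPath;
    this.rows = new Map();
    this.nextKey = 1;
  }
  put(value) {
    // Stand-in for a structured clone; good enough for plain data.
    const clone = JSON.parse(JSON.stringify(value));
    if (clone[this.keyPath] === undefined) {
      clone[this.keyPath] = this.nextKey++;
      // 7.b: propagate the generated key to the caller's object as well,
      // so the original and the stored clone stay in sync.
      value[this.keyPath] = clone[this.keyPath];
    }
    this.rows.set(clone[this.keyPath], clone);
    return clone[this.keyPath];
  }
  get(key) {
    // 7.d: a miss returns undefined instead of throwing.
    return this.rows.get(key);
  }
}

const people = new SketchStore("id");
const person = { name: "x" };
const generatedKey = people.put(person); // person.id is now set too
const missing = people.get(99); // undefined, no exception
```

With this shape, "not found" is an ordinary branch in application code rather than a try/catch.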


8. Performance and API style

a.       The async nature of the API makes regular scans very heavy on 
callbacks (one per row plus completion/error callbacks). This slows down scans 
a lot, so when doing multiple scans (e.g. a reasonably complicated query that 
has joins, sorts and filters) performance will be bound by this even if 
everything else happens really fast. It would be interesting to support a 
block-fetch mode where the callback gets called for a number of buffered rows 
(indicated when the scan is initiated) instead of being called for a single 
row. This would be either a configuration option on openCursor or a new method 
on the cursor for 
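
A rough sketch of the callback-count difference described in 8.a. Both functions are synchronous stand-ins with hypothetical names (scanPerRow, scanInBlocks); a real implementation would deliver the callbacks asynchronously.

```javascript
// One callback per row, as in the current cursor model.
function scanPerRow(rows, onRow) {
  let callbacks = 0;
  for (const row of rows) {
    onRow(row);
    callbacks++;
  }
  return callbacks;
}

// One callback per block of up to blockSize buffered rows.
function scanInBlocks(rows, blockSize, onBlock) {
  if (!(blockSize > 0)) throw new RangeError("blockSize must be positive");
  let callbacks = 0;
  for (let i = 0; i < rows.length; i += blockSize) {
    onBlock(rows.slice(i, i + blockSize));
    callbacks++;
  }
  return callbacks;
}

const sampleRows = Array.from({ length: 1000 }, (_, i) => ({ id: i }));
const perRowCalls = scanPerRow(sampleRows, () => {}); // 1000
const perBlockCalls = scanInBlocks(sampleRows, 100, () => {}); // 10
```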

      
9. API

a.       DatabaseSync.createIndex: what's the default for the unique argument?
b.      DatabaseSync.createObjectStore: what's the default for autoIncrement?
c.       DatabaseSync.openObjectStore: what's the default for mode?
d.      DatabaseSync.transaction: what are the units for the timeout value? 
Seconds? Is there a value that means "infinite"?
e.      ObjectStoreSync.get: see 7.d (return undefined instead of throwing an 
exception)
f.        ObjectStoreSync: what happens to the reference if the underlying 
store is deleted through another connection? We propose it's ok to alter 
underlying objects in general and "visible" objects should be ready and start 
failing when the objects they surface go away or are altered.
g.       CursorSync.openCursor: does the cursor start on the first record or 
before the first record? Should probably be before the first record so the 
first call to continue() can return false for empty stores, moving straight 
from BOF to EOF.
h.      CursorSync.count: what scenario does this enable? Also, name is 
misleading; should be sameKeyCount or something that indicates it's the count 
only of the rows that share the current key. 
i.         CursorSync.value: when the cursor is over an index, shouldn't the 
value be read-only as changing it would make it inconsistent with the object 
store this index is for?
j.        CursorSync.continue(): does it return false when it reaches the last 
record or when it's called *on* the last record and moves to EOF (effectively 
moved past the last record)? If it's sitting in EOF, does it "see" new inserts? 
(we assume not)
k.       CursorSync.delete(): "delete" causes trouble, should be "remove"
l.         CursorSync.delete(): what happens to the cursor position after this 
function returns? One option would be to leave the cursor on the deleted row, 
and fail all access attempts so only continue() can be called.
m.    IndexSync: the put/delete methods seem to enable users to modify the 
index independently of the store, making them inconsistent. Given that the only 
kind of index described is auto-populated, it doesn't seem appropriate to have 
these.
n.    Should we consider introducing an API that given an object and a store 
returns the key to that object? That would avoid the need for knowing the exact 
algorithm used to obtain the key from an object + path.
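
The cursor positioning we suggest in 9.g and 9.j can be sketched as follows. SketchCursor is a hypothetical name; the cursor works on a snapshot of the rows, which matches our assumption that a cursor sitting at EOF does not see new inserts.

```javascript
// Hypothetical cursor that starts before the first record (BOF).
class SketchCursor {
  constructor(rows) {
    this.rows = rows.slice(); // snapshot: later inserts are not seen
    this.position = -1; // -1 means BOF
  }
  // Moves forward one record; returns false once the cursor is at EOF.
  continue() {
    if (this.position < this.rows.length) this.position++;
    return this.position < this.rows.length;
  }
  get value() {
    if (this.position < 0 || this.position >= this.rows.length) {
      throw new Error("cursor is not positioned on a row");
    }
    return this.rows[this.position];
  }
}

// Empty store: the first continue() goes straight from BOF to EOF.
const emptyFirstMove = new SketchCursor([]).continue(); // false

// The usual loop needs no special case for the first record.
const visited = [];
const cursor = new SketchCursor(["a", "b"]);
while (cursor.continue()) visited.push(cursor.value);
```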


10. API (async specifics)

a.       Currently the async API is only available on the window object and not 
to workers. Libraries are likely to target only one mode, in particular async, 
to work across all scenarios. So it would be important to have async also in 
workers.
b.      DBRequest.abort(): it may not be possible to guarantee abort in all 
phases of execution, so this should be described as a "best effort" method; 
onsuccess would be called if the system decided to proceed and complete the 
operation, and onerror if abort succeeded at stopping the operation (with 
proper code indicating the error is due to an explicit abort request). In any 
case ready state should go to done.
c.       The pattern where there is a single request object (e.g. 
indexedDB.request) prevents user code from having multiple outstanding requests 
against the same object (e.g. multiple 'open' or multiple 'openCursor' 
requests). An alternate pattern that does not have this problem would be to 
return the request object from the method (e.g. from 'open'). 
d.      CursorRequest.continue(): this seems to break the pattern where 
request.result has the result of the operation; for continue the operation (in 
the sync version) is true/false depending on whether the cursor reached EOF. So 
in async request.result should be the true/false value, the value itself would 
be available in the cursor's "value" property,  and the success callback would 
be called instead of the error one.
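
The alternate pattern from 10.c, where each call returns its own request object, might look roughly like this. SketchRequest, openSketch and drainPending are hypothetical names, the readyState strings are assumptions, and a manually drained queue stands in for the event loop.

```javascript
const pending = []; // stand-in for the event loop; drained manually below

class SketchRequest {
  constructor() {
    this.readyState = "loading"; // hypothetical state names
    this.result = undefined;
    this.onsuccess = null;
    this.onerror = null;
  }
}

// Each call returns a fresh request, so several can be outstanding at once.
function openSketch(name) {
  const request = new SketchRequest();
  pending.push(() => {
    request.readyState = "done";
    request.result = { name };
    if (request.onsuccess) request.onsuccess(request);
  });
  return request;
}

function drainPending() {
  while (pending.length > 0) pending.shift()();
}

const opened = [];
const first = openSketch("contacts");
const second = openSketch("orders");
first.onsuccess = (r) => opened.push(r.result.name);
second.onsuccess = (r) => opened.push(r.result.name);
drainPending();
```

Because the handlers hang off separate objects, the two opens cannot overwrite each other's callbacks or results.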


11. API Names

a.       "transaction" is really non-intuitive (particularly given the 
existence of currentTransaction in the same class). "beginTransaction" would 
capture semantics more accurately.
b.      ObjectStoreSync.delete: delete is a JavaScript keyword, can we use 
"remove" instead?
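
To show why the name matters: in ECMAScript 3 engines a reserved word cannot follow the dot in a property access, so a method named delete has to be called with bracket notation there. The store object below is a hypothetical stand-in, not the draft's interface.

```javascript
const sketchStore = {
  rows: { 1: "a", 2: "b" },
  // "remove" can be called with plain dot notation in every engine.
  remove(key) {
    const had = key in this.rows;
    delete this.rows[key]; // the delete operator is why the word is reserved
    return had;
  },
};

// Bracket notation works even where `sketchStore.delete(...)` does not parse.
sketchStore["delete"] = sketchStore.remove;

const removedViaRemove = sketchStore.remove(1); // true
const removedViaDelete = sketchStore["delete"](2); // true
```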


12. Object names in general

a.       For database, store, index and other names in general, the current 
description in various places says "case sensitive". It would be good to be 
more specific and indicate "exact match" of all constructs (e.g. accents, kana 
width). Binary match would be very restrictive but a safe target. Alternatively 
we could just leave this up to each implementation, and indicate 
non-normatively what would be a safe pattern of strings to use.
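
A small illustration of why "case sensitive" alone is underspecified for names. It uses the standard String.prototype.normalize method; the variable names are only for the example.

```javascript
// The same visible name, precomposed versus base letter plus combining accent.
const composed = "caf\u00e9";
const decomposed = "cafe\u0301";
const binaryEqual = composed === decomposed; // false
const nfcEqual = composed.normalize("NFC") === decomposed.normalize("NFC"); // true

// Kana width: halfwidth versus fullwidth katakana KA.
const halfWidth = "\uff76";
const fullWidth = "\u30ab";
const widthEqualNfc = halfWidth.normalize("NFC") === fullWidth.normalize("NFC"); // false
const widthEqualNfkc = halfWidth.normalize("NFKC") === fullWidth; // true
```

A binary match treats all of these as different names, which is restrictive but unambiguous; any looser rule has to say which normalization it applies.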


13. Editorial notes

a.      Ranges: left-right versus start-end. "bound" versus "closed" for 
intervals.
b.      Ranges: bound, "Create a new right-bound key range." -> right & left 
bound
c.       3.2.7 obejct -> object
d.      The current draft fails to format in IE, the script that comes with the 
page fails with an error


Thanks
-pablo
