These are notes that we collected both from reviewing the spec (editor's draft
up to Jan 24th) and from a prototype implementation that we are working on. I
didn't realize we had this many notes; otherwise I would have sent
intermediate notes earlier. Will do so next round.
1. Keys and sorting
a. 3.1.1: it would seem important to also allow date/time values as keys,
since they are a common sorting criterion (e.g. as part of a composite
primary key or in general as an index key).
b. 3.1.1: similarly, sorting on number in general (not just
integers/longs) would be important (e.g. price lists, scores, etc.)
c. 3.1.1: cross-type sorting and sorting of long values are clear.
Sorting of strings, however, needs more elaboration. In particular, which
collation do we use? Does the user or developer get to choose a collation? If
we pick up a collation from the environment (e.g. the OS) and that collation
later changes, we'd have to re-index all the databases.
d. 3.1.3: spec reads "…key path must be the name of an enumerated
property…"; how about composite keys? (That would make the related APIs take a
DOMString or DOMStringList.)
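
To make 1.a through 1.c concrete, here is a minimal sketch of one possible total order over keys. It admits dates and arbitrary numbers, and it compares strings by UTF-16 code unit so the result does not depend on any environment collation. The cross-type ranking (numbers, then dates, then strings) and the names Key, typeRank and compareKeys are assumptions made for illustration; the draft does not specify any of this.

```typescript
// Hypothetical key type: the draft does not currently allow Date or non-integer numbers.
type Key = number | Date | string;

// Assumed cross-type ranking for this sketch: numbers, then dates, then strings.
function typeRank(k: Key): number {
  if (typeof k === "number") return 0;
  if (k instanceof Date) return 1;
  return 2;
}

// Returns a negative number, zero, or a positive number, as Array.prototype.sort expects.
function compareKeys(a: Key, b: Key): number {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (typeof a === "number" && typeof b === "number") {
    return a < b ? -1 : a > b ? 1 : 0;
  }
  if (a instanceof Date && b instanceof Date) {
    return a.getTime() - b.getTime();
  }
  // Both are strings here: the < operator compares UTF-16 code units, not locale order.
  const sa = a as string;
  const sb = b as string;
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

const keys: Key[] = ["b", 2.5, new Date(0), "B", 1, "a"];
const sorted = [...keys].sort(compareKeys);
```

A code-unit comparison puts "B" before "a", which is rarely what end users expect, but it never requires re-indexing when the OS collation changes.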
2. Values
a. 3.1.2: isn't the requirement for "structured clones" too much? It
would mean implementations would have to be able to store and retrieve File
objects and such. Would it be more appropriate to say it's just graphs of
Javascript primitive objects/values (object, string, number, date, arrays,
null)?
3. Object store
a. 3.1.3: do we really need in-line + out-of-line keys? Besides the
concept-count increase, we wonder whether out-of-line keys would cause trouble
to generic libraries, as the values for the keys wouldn't be part of the values
iterated when doing a "foreach" over the table.
b. Query processing libraries will need temporary stores, which need
temporary names. Should we introduce an API for the creation of temporary
stores with transaction lifetime and no name?
c. It would be nice to have an estimated row count on each store. This
comes at an implementation and runtime cost. Strong opinions? Lacking
everything else, this would be the only statistic to base decisions on for a
query processor.
d. The draft does not touch on how applications would do optimistic
concurrency. A common way of doing this is to use a timestamp value that's
automatically updated by the system every time someone touches the row. While
we don't feel it's a must have, it certainly supports common scenarios.
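
As an illustration of 3.d, the following sketch shows the optimistic concurrency pattern we have in mind, using an in-memory map in place of a real object store. The class StampedStore, its methods, and the use of a counter instead of a wall-clock timestamp are all assumptions for the example and are not part of the draft.

```typescript
// Hypothetical row shape; `stamp` stands in for a system-maintained timestamp.
interface Row<T> {
  value: T;
  stamp: number;
}

class StampedStore<T> {
  private rows = new Map<string, Row<T>>();
  private clock = 0; // monotonically increasing counter used instead of wall-clock time

  // Unconditional write; the store assigns a fresh stamp and returns it.
  put(key: string, value: T): number {
    const stamp = ++this.clock;
    this.rows.set(key, { value, stamp });
    return stamp;
  }

  get(key: string): Row<T> | undefined {
    return this.rows.get(key);
  }

  // Writes only if the row still carries the stamp the caller read earlier.
  putIfUnchanged(key: string, value: T, expectedStamp: number): boolean {
    const current = this.rows.get(key);
    if (current === undefined || current.stamp !== expectedStamp) return false;
    this.put(key, value);
    return true;
  }
}

const stamped = new StampedStore<string>();
const readStamp = stamped.put("k", "v1");
// Two writers both read the row at readStamp; only the first update goes through.
const firstWriter = stamped.putIfUnchanged("k", "v2", readStamp);
const secondWriter = stamped.putIfUnchanged("k", "v3", readStamp);
```

Without a system-maintained stamp, applications have to store and bump such a value themselves on every write.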
4. Indexes
a. 3.1.4 mentions "auto-populated" indexes, but then there is no mention
of other types. We suggest that we remove this and in the algorithms section
describe side-effecting operations as always updating the indexes as well.
b. If during insert/update the value of the key is not present (i.e.
undefined as opposed to null or a value), is that a failure, does the row not
get indexed, or is it indexed as null? Failure would probably cause a lot of
trouble to users; the other two have correctness problems. An option is to
index them as undefined, but now we have undefined and null as indexable keys.
We lean toward this last option.
5. Databases
a. Not being able to enumerate databases gets in the way of creating good
tools and frameworks such as database explorers. What was the motivation for
this? Is it security related?
b. Clarification on transactions: all database operations that affect the
schema (create/remove store/index, setVersion, etc.) as well as data
modification operations are assumed to be auto-commit by default, correct?
Furthermore, all those operations (both schema and data) can happen within a
transaction, including mixing schema and data changes. Does that line up with
others' expectations? If so we should find a spot to articulate this explicitly.
c. No way to delete a database? It would be reasonable for applications
to want to do that and let go of the user data (e.g. a "forget me" feature in a
web site).
6. Transactions
a. While we understand the goal of simplifying developers' lives with an
error-free transactional model, we're not sure whether we're doing more harm
than good by introducing more concepts into this space. Wouldn't it be better to use regular
transactions with a well-known failure mode (e.g. either deadlocks or
optimistic concurrency failure on commit)?
b. In auto-commit mode, if two cursors are opened at the same time (e.g.
to scan them in an interleaved way), are they in independent transactions
simultaneously active in the same connection?
7. Algorithms
a. 3.2.2: steps 4 and 5 are inverted in order.
b. 3.2.2: when there is a key generator and the store uses in-line keys,
should the generated key value be propagated to the original object (in
addition to the clone), such that both are in sync after the put operation?
c. 3.2.3: step 2, probably editorial mistake? Wouldn't all indexes have a
key path?
d. 3.2.4.2: in our experiments writing application code, the fact that
this method throws an exception when an item is not found is quite
inconvenient. It would be much more natural to just return undefined, as this can be
a primary code path (to not find something) and not an exceptional situation.
Same for 3.2.5, step 2 and 3.2.6 step 2.
e. The algorithm to put a new object into a store currently indicates that
the key of the object should be returned. How about other values that may be
generated by the store? For example, if the store generates timestamps (not
currently in the draft, but may be needed for optimistic concurrency control),
how would we return them? Should we update the actual object that was passed as
a parameter with keys and other server-generated values?
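
To show what 7.b and 7.d would look like from application code, here is a small in-memory sketch. The class SketchStore, the Item shape and the in-line key path "id" are made up for the example; the shallow copy stands in for the structured clone, and none of this is the draft's API.

```typescript
// Hypothetical value shape with an in-line key at property `id`.
interface Item {
  id?: number;
  name: string;
}

class SketchStore {
  private rows = new Map<number, Item>();
  private nextKey = 1; // simple key generator

  // Stores a shallow copy and also writes the key back into the caller's object,
  // so the original and the stored copy stay in sync (the question in 7.b).
  put(item: Item): number {
    const key = item.id ?? this.nextKey;
    if (key >= this.nextKey) this.nextKey = key + 1;
    item.id = key;
    this.rows.set(key, { ...item });
    return key;
  }

  // Returns undefined for a missing key instead of throwing (the suggestion in 7.d).
  get(key: number): Item | undefined {
    return this.rows.get(key);
  }
}

const items = new SketchStore();
const original: Item = { name: "first" };
const generatedKey = items.put(original);
const missing = items.get(42);
```

With this shape, "not found" is an ordinary branch on undefined rather than a try/catch around every lookup.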
8. Performance and API style
a. The async nature of the API makes regular scans very heavy on
callbacks (one per row plus completion/error callbacks). This slows down scans
a lot, so when doing multiple scans (e.g. a reasonably complicated query that
has joins, sorts and filters) performance will be bound by this even if
everything else happens really fast. It would be interesting to support a
block-fetch mode where the callback gets called for a number of buffered rows
(indicated when the scan is initiated) instead of being called for a single
row. This would be either a configuration option on openCursor or a new method
on the cursor for
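
The callback shape of such a block-fetch mode could look like the sketch below. It is written as a wrapper over a per-row scan purely to show the shape; a real implementation would buffer inside the engine, since a wrapper like this does not save any per-row callbacks. RowScan, scanInBlocks and the blockSize parameter are hypothetical names, not part of the draft.

```typescript
// Hypothetical row source: calls onRow once per row, then onDone. Stands in for a cursor scan.
type RowScan<T> = (onRow: (row: T) => void, onDone: () => void) => void;

// Buffers rows and invokes onBlock once per `blockSize` rows, plus once for any remainder.
function scanInBlocks<T>(
  scan: RowScan<T>,
  blockSize: number,
  onBlock: (rows: T[]) => void,
  onDone: () => void
): void {
  if (blockSize < 1) throw new RangeError("blockSize must be at least 1");
  let buffer: T[] = [];
  scan(
    (row) => {
      buffer.push(row);
      if (buffer.length === blockSize) {
        onBlock(buffer);
        buffer = [];
      }
    },
    () => {
      if (buffer.length > 0) onBlock(buffer);
      onDone();
    }
  );
}

// Synchronous stand-in for a scan over a seven-row store.
const sevenRows: RowScan<number> = (onRow, onDone) => {
  for (let i = 1; i <= 7; i++) onRow(i);
  onDone();
};

const blocks: number[][] = [];
let doneCalls = 0;
scanInBlocks(sevenRows, 3, (rows) => { blocks.push(rows); }, () => { doneCalls++; });
```

Here seven rows with a block size of three produce three block callbacks instead of seven row callbacks.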
9. API
a. DatabaseSync.createIndex: what's the default for the unique argument?
b. DatabaseSync.createObjectStore: what's the default for autoIncrement?
c. DatabaseSync.openObjectStore: what's the default for mode?
d. DatabaseSync.transaction: what are the units for the timeout value?
Seconds? Is there a value that means "infinite"?
e. ObjectStoreSync.get: see 7.d (return undefined instead of throwing an
exception)
f. ObjectStoreSync: what happens to the reference if the underlying
store is deleted through another connection? We propose it's ok to alter
underlying objects in general and "visible" objects should be ready and start
failing when the objects they surface go away or are altered.
g. CursorSync.openCursor: does the cursor start on the first record or
before the first record? Should probably be before the first record so the
first call to continue() can return false for empty stores, moving straight
from BOF to EOF.
h. CursorSync.count: what scenario does this enable? Also, name is
misleading; should be sameKeyCount or something that indicates it's the count
only of the rows that share the current key.
i. CursorSync.value: when the cursor is over an index, shouldn't the
value be read-only as changing it would make it inconsistent with the object
store this index is for?
j. CursorSync.continue(): does it return false when it reaches the last
record or when it's called *on* the last record and moves to EOF (effectively
moved past the last record)? If it's sitting in EOF, does it "see" new inserts?
(we assume not)
k. CursorSync.delete(): "delete" causes trouble as a name (see 11.b); should be "remove".
l. CursorSync.delete(): what happens to the cursor position after this
function returns? One option would be to leave the cursor on the deleted row,
and fail all access attempts so only continue() can be called.
m. IndexSync: the put/delete methods seem to enable users to modify the
index independently of the store, making them inconsistent. Given that the only
kind of index described is auto-populated, it doesn't seem appropriate to have
these.
n. Should we consider introducing an API that given an object and a store
returns the key to that object? That would avoid the need for knowing the exact
algorithm used to obtain the key from an object + path.
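
The cursor positioning we suggest in 9.g and assume in 9.j can be sketched as follows, over an in-memory array. SketchCursor is a hypothetical class written for this note; it starts before the first record, and continue() returns false only when it moves past the last record.

```typescript
// Hypothetical synchronous cursor over an in-memory array, for illustration only.
class SketchCursor<T> {
  private rows: readonly T[];
  private position = -1; // -1 means "before the first record" (BOF)

  constructor(rows: readonly T[]) {
    this.rows = rows;
  }

  // Moves forward one record; returns false once the cursor has moved past the last one (EOF).
  continue(): boolean {
    if (this.position < this.rows.length) this.position++;
    return this.position < this.rows.length;
  }

  // Fails while the cursor is at BOF or EOF.
  get value(): T {
    if (this.position < 0 || this.position >= this.rows.length) {
      throw new Error("cursor is not positioned on a record");
    }
    return this.rows[this.position];
  }
}

// Empty store: the first continue() moves straight from BOF to EOF.
const emptyCursor = new SketchCursor<string>([]);
const emptyFirst = emptyCursor.continue();

// Non-empty store: the usual loop visits every record exactly once.
const twoRows = new SketchCursor<string>(["a", "b"]);
const seen: string[] = [];
while (twoRows.continue()) seen.push(twoRows.value);
```

Starting at BOF means the same while loop works for empty and non-empty stores without a separate emptiness check.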
10. API (async specifics)
a. Currently the async API is only available on the window object and not
to workers. Libraries are likely to target only one mode, in particular async,
to work across all scenarios. So it would be important to have async also in
workers.
b. DBRequest.abort(): it may not be possible to guarantee abort in all
phases of execution, so this should be described as a "best effort" method;
onsuccess would be called if the system decided to proceed and complete the
operation, and onerror if abort succeeded at stopping the operation (with
proper code indicating the error is due to an explicit abort request). In any
case ready state should go to done.
c. The pattern where there is a single request object (e.g.
indexedDB.request) prevents user code from having multiple outstanding requests
against the same object (e.g. multiple 'open' or multiple 'openCursor'
requests). An alternate pattern that does not have this problem would be to
return the request object from the method (e.g. from 'open').
d. CursorRequest.continue(): this seems to break the pattern where
request.result has the result of the operation; for continue the operation (in
the sync version) is true/false depending on whether the cursor reached EOF. So
in async request.result should be the true/false value, the value itself would
be available in the cursor's "value" property, and the success callback would
be called instead of the error one.
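
The alternate pattern from 10.c, where each method call returns its own request object, could look roughly like this. SketchRequest, openSketch and the pending queue that simulates the engine are all hypothetical; the names only loosely mirror the draft.

```typescript
// Hypothetical request object, one per operation.
class SketchRequest<T> {
  onsuccess: ((result: T) => void) | null = null;
  result: T | undefined = undefined;

  // Called by the (simulated) engine when the operation completes.
  succeed(result: T): void {
    this.result = result;
    if (this.onsuccess) this.onsuccess(result);
  }
}

// Each call returns a fresh request, so several can be outstanding at once.
// `queue` simulates the engine's list of operations still to be completed.
function openSketch(name: string, queue: Array<() => void>): SketchRequest<string> {
  const request = new SketchRequest<string>();
  queue.push(() => request.succeed(`db:${name}`));
  return request;
}

const pending: Array<() => void> = [];
const opened: string[] = [];
const r1 = openSketch("one", pending);
const r2 = openSketch("two", pending);
r1.onsuccess = (db) => { opened.push(db); };
r2.onsuccess = (db) => { opened.push(db); };
// Complete the requests in reverse order to show they are independent of each other.
pending.reverse().forEach((complete) => complete());
```

With a single shared request object, the second open would have overwritten the first request's handlers.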
11. API Names
a. "transaction" is really non-intuitive (particularly given the
existence of currentTransaction in the same class). "beginTransaction" would
capture semantics more accurately.
b. ObjectStoreSync.delete: delete is a JavaScript keyword; can we use
"remove" instead?
12. Object names in general
a. For database, store, index and other names in general, the current
description in various places says "case sensitive". It would be good to be
more specific and indicate "exact match" of all constructs (e.g. accents, kana
width). Binary match would be very restrictive but a safe target. Alternatively
we could just leave this up to each implementation, and indicate
non-normatively what would be a safe pattern of strings to use.
13. Editorial notes
a. Ranges: left-right versus start-end. "bound" versus "closed" for
intervals.
b. Ranges: bound, "Create a new right-bound key range." -> right & left
bound
c. 3.2.7 obejct -> object
d. The current draft fails to format in IE; the script that comes with the
page fails with an error.
Thanks
-pablo