On Wed, Jun 9, 2010 at 11:39 AM, Laxmi Narsimha Rao Oruganti
<laxmi.oruga...@microsoft.com> wrote:
> Inline...
>
> -----Original Message-----
> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] On 
> Behalf Of Jonas Sicking
> Sent: Wednesday, June 09, 2010 11:55 PM
> To: Jeremy Orlow
> Cc: Shawn Wilsher; Webapps WG
> Subject: Re: [IndexDB] Proposal for async API changes
>
> On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jor...@chromium.org> wrote:
>> On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jo...@sicking.cc> wrote:
>>>
>>> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jor...@chromium.org>
>>> wrote:
>>> > I'm not sure I like the idea of offering sync cursors either, since
>>> > the UA will either need to load everything into memory before
>>> > starting or risk blocking on disk IO for large data sets.  Thus I'm
>>> > not sure I support the idea of synchronous cursors.  But, at the same
>>> > time, I'm concerned about the overhead of firing one event per value
>>> > with async cursors.  Which is why I was suggesting an interface where
>>> > the common case (the data is in memory) is done synchronously, but
>>> > the uncommon case (we'd block if we had to respond synchronously)
>>> > still has to be handled, since we guarantee that the first result is
>>> > forced to be asynchronous.
>>> > Like I said, I'm not super happy with what I proposed, but I think
>>> > some hybrid async/sync interface is really what we need.  Have you
>>> > guys spent any time thinking about something like this?  How dead-set
>>> > are you on synchronous cursors?
>>>
>>> The idea is that synchronous cursors load all the required data into
>>> memory, yes. I think it would help authors a lot to be able to load
>>> small chunks of data into memory and read and write to it
>>> synchronously. Dealing with asynchronous operations constantly is
>>> certainly possible, but a bit of a pain for authors.
>>>
>>> I don't think we should obsess too much about not keeping things in
>>> memory; we already have things like canvas and the DOM, which add up
>>> to non-trivial amounts of memory.
>>>
>>> Just because data is loaded from a database doesn't mean it's huge.
>>>
>>> I do note that you're not as concerned about getAll(), which actually
>>> has worse memory characteristics than synchronous cursors, since you
>>> need to create the full JS object graph in memory.
>>
>> I've been thinking about this off and on since the original proposal
>> was made, and I just don't feel right about getAll() or synchronous
>> cursors.  You make some good points about there already being many ways
>> to overwhelm RAM with web APIs, but is there any place where we make it
>> so easy?  You're right that just because it's a database doesn't mean
>> it needs to be huge, but often they can get quite big.  And if a
>> developer doesn't spend time making sure they test their app with the
>> upper end of what users may possibly see, it just seems like a recipe
>> for problems.
>> Here's a concrete example: structured clone allows you to store image
>> data.  Let's say I'm building an image hosting site and that I cache
>> all the images along with their thumbnails locally in an IndexedDB
>> entity store.  Let's say each thumbnail is a trivial amount of data,
>> but each image is 1MB.  I have an album with 1000 images.  I do |var
>> photos = albumIndex.getAllObjects(albumName);| and then iterate over
>> that to get the thumbnails.  But I've just loaded over 1GB of stuff
>> into RAM (assuming no additional inefficiency/blowup).  I suppose it's
>> possible JavaScript engines could build mechanisms to fetch this stuff
>> lazily (like you could even with a synchronous cursor), but that will
>> take time/effort and introduce lag in the page (while fetching
>> additional info from disk).
>>
>> I'm not completely against the idea of getAll/sync cursors, but I do
>> think they should be decoupled from this proposed API.  I would also
>> suggest that we reconsider them only after at least one implementation
>> has normal cursors working and there's been some experimentation with
>> them.  Until then, we're basing most of our arguments on intuition and
>> assumptions.
>
> I'm not married to the concept of sync cursors. However, I pretty
> strongly feel that getAll is something we need. If we just allow
> cursors for getting multiple results, I think we'll see an extremely
> common pattern of people using a cursor to loop through a result set
> and put values into an array.
>
> Yes, it can be misused, but I don't see a reason why people wouldn't
> misuse a cursor just as much. If they don't think about the fact that
> a range contains lots of data when using getAll, why would they think
> about it when using cursors?
>
> [Laxmi] A cursor is a streaming operator, meaning only the current row
> or page is held in memory while the rest sits on disk.  As the program
> moves the cursor through the result, old pages are thrown away and new
> pages are loaded from the result set.  Whereas with getAll, everything
> has to come into memory before returning to the caller.  If there is
> not enough memory to hold the whole result at once, we would run out of
> memory.  In short, getAll suits small results/ranges well, but not big
> databases.  That is, with getAll we expect people to think about the
> volume/size of the result, whereas with cursors we don't.

I'm well aware of this. My argument is that I think we'll see people
write code like this:

var results = [];
db.objectStore("foo").openCursor(range).onsuccess = function(e) {
  var cursor = e.result;
  if (!cursor) {
    // No more rows: the entire result set is now in |results|.
    weAreDone(results);
    return;
  }
  results.push(cursor.value);
  cursor.continue();
};

While the IndexedDB implementation doesn't hold much data in memory at
a time, the webpage will hold just as much as if we had had a getAll
function. Thus we haven't actually improved anything, only forced the
author to write more code.
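
For comparison, here's a sketch of the getAll version (assuming
getAll(range) follows the same request/onsuccess pattern as the rest
of this proposal):

// Sketch only: assumes getAll(range) fires onsuccess once, with the
// full array of values available as e.result.
db.objectStore("foo").getAll(range).onsuccess = function(e) {
  weAreDone(e.result);  // same array, same memory footprint, less code
};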


Put it another way: the concern raised is that people won't think
about the fact that getAll can load a lot of data into memory, and the
proposed solution is to remove the getAll function and tell people to
use openCursor. However, if they weren't thinking about the fact that
a lot of data will be in memory at one time, why wouldn't they write
code like the above, which results in just as much data being in memory?
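
To be clear, nothing stops an author who *is* thinking about memory
from using a cursor in a streaming fashion, along these lines
(processValue is a hypothetical per-row handler, not part of the
proposal):

// Streaming use of a cursor: each row is handled and discarded, so
// memory stays flat regardless of the size of the result set.
db.objectStore("foo").openCursor(range).onsuccess = function(e) {
  var cursor = e.result;
  if (!cursor) {
    weAreDone();
    return;
  }
  processValue(cursor.value);  // handle one row, keep nothing
  cursor.continue();
};

But writing that requires exactly the same awareness of data volume
that using getAll safely does.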

/ Jonas
