Re: [webkit-dev] Notifications for Blob serialization/deserialization

Michael Nordman Tue, 10 Jul 2012 14:33:01 -0700

On Tue, Jul 3, 2012 at 4:08 PM, Greg Billock <[email protected]> wrote:


> Do you think the way my patch does this, by exposing two new methods on
> the BlobController for notifying serialization and deserialization, will be
> sufficient?



For IndexedDB, it looks to me that notifications like this (or on the
> BlobData ID, as you propose) will be critical -- the IndexedDB will need to
> read the blob values to store them, at which time it can release the refs
> it holds. The math won't work out unless it can know that the ref it just
> deleted was given to it in serialization, which my notifier would
> accomplish. Similarly for history -- if history supports Blob, then
> packaging that serialization needs to hold the refcount.
>
> All these use cases amount to the same thing: use of a serialized Blob
> outside the "normal" usage of creation/deletion within a single context. I
> think we're all in agreement that Blob semantics according to the spec
> should support this use case, even if Blob URLs explicitly don't.
>
> Do you want to revamp the management API to use BlobData IDs before adding
> a pair of de/serialization APIs? If we do, we can skip some of the
> nastiness here, like managing an extra serialization internal URL. It'll
> definitely be easier to see. If that's going to take a while, though, I'd
> like to get a preliminary solution in place so web intents can use blobs.
>

I'd vote to decouple proximate fixes from general revampage if possible so
we don't hold worthy capabilities hostage to housecleaning. Being able
to utilize blobs in webintent payloads seems pretty valuable.


>
> Who's a good person to ask about the history API's support (or lack
> thereof) of Blobs?
>

Thnx for pointing out the the pushstate / history use-case, i didn't know
about that one. I'm not sure who all knows about that?


>
>
> On Tue, Jun 26, 2012 at 2:54 PM, Michael Nordman <[email protected]>wrote:
>
>> I think revamping our Blob handling is one of the projects that I should
>> be working on next. What we have now is proving to be too difficult to work
>> with in a variety of ways.
>>
>> The more i look at this particular difficulty with sending blobs within
>> serialized values (within chromium), the more convinced i am that we should
>> switch to identifying the 'data underlying a blob' rather than the 'blob
>> object instance'. Currently our 'ids' identify a WebCore::Blob instance.
>> The trouble with putting ids on blob object instances is that they dont
>> cross process boundaries, so we end up either trying to create the id in
>> advance of the instance in some other process, or relying on the
>> originating instance still being around when the new instance in the other
>> process gets coined (based on the originals ids). We have to juggle 2 ids
>> instead of one and conceptually its just a mess.
>>
>> What I'm thinking about is changes along these lines...
>>
>> * WebCore::Blobs have an 'id' that refers to the 'data'. At ctor and dtor
>> time, they increment/decrement a refcount, via ipcs, on the underlying
>> 'data' in the main browser. A clone or copy of that WebCore::Blob instance
>> anywhere in the system will carry that same 'id' and do that same
>> refcounting of the underlying data.
>>
>> * When a IPC message containing a serialized blob (the id value) is in
>> transit, extra refs are applied to the underlying blob data. There are
>> several cases where we need to consider a blob as 'in transit'. Being
>> referred to in an WebCore::SSV is one place. Being referred to in a
>> content::SSV is another. As those objects come and go, they should inc/dec
>> any blobs they refer to accordingly. A content::SSV has to distinguish
>> between running in the main browser process and a child process to do that
>> properly.
>>
>> * A challenging case is when sending SSVs containing blobs from the
>> browser process to a child (renderer|worker) process. After a message is
>> put on the wire going that direction, its not safe to deref the blob data
>> until the receiving side has picked it up and added a ref of its own. I
>> think we'll need ACKS for messages flowing in that direction so the
>> browser-side knows when its safe to drop the extra in transit ref.
>>
>> * Its easier going the other way, from child to browser, since if the
>> child has a ref when the 'id' is put on the wire, that ref cant possibly be
>> removed prior to that msg reaching the browser.
>>
>> Not sure how much of this disccusion affects webkit's native blob
>> handling really? This whole chromium multi-process problem doesn't exist
>> there. I think we can put high level abstractions in webcore that we then
>> fork at that high level... and the forked chromium impl deals with the
>> complexities in our multi-process case w/o impatcing webcore's native
>> single process impl.
>>
>> Another source of difficulty stems from how the ThreadableBlobRegistry
>> posts all operations to the 'main' webkit thread. That's not going to work
>> very well with how we handle IDB operations entirely from the 'context'
>> thread even if its a background worker context. Message sent on behalf of
>> blobs take a longer slower route to reach our main browser process than
>> those sent on behalf of IDB, so an IDB message that contains a blob id may
>> reach the main browser prior to the blob messages that create that blob
>> reach the main browser. I think we have to avoid going thru the 'main'
>> webkit thread in the chromium port for the blob handling messages to avoid
>> odd races.
>>
>>
>>
>> On Tue, Jun 26, 2012 at 11:03 AM, Joshua Bell <[email protected]>wrote:
>>
>>> On Tue, Jun 26, 2012 at 10:19 AM, Greg Billock <[email protected]>wrote:
>>>
>>>> I've been working with Michael Nordman on a problem related to Blob
>>>> serialization. Currently, if we serialize a Blob, there's no
>>>> notification to the BlobRegistry, so the serialized Blob will be
>>>> garbage collected at an indeterminate time if the context in which it
>>>> was created is closed. This lifetime scoping is spelled out clearly
>>>> for Blob URLs created using the URL API. I see nothing in the spec
>>>> imposing the same lifetime scoping on Blobs themselves, so I think it
>>>> is an implementation flaw. To correctly handle this serialization
>>>> case, however, the BlobRegistry needs to know when serialization and
>>>> deserialization operations happen for Blob objects.
>>>>
>>>> I've created a patch that adds that logic and plumbs it through to the
>>>> Chromium API. Webkit patch:
>>>> https://bugs.webkit.org/show_bug.cgi?id=89921 The corresponding
>>>> Chromium-side patch is here: http://codereview.chromium.org/10662024/
>>>>
>>>>
>>>> The strategy I used is to create a new internal URL and use that for
>>>> serialization:
>>>>
>>>>  KURL cloneURL = BlobURL::createInternalURL();
>>>>  blobRegistry().didSerializeBlob(cloneURL, blob->url());
>>>>  m_writer.writeBlob(cloneURL.string(), blob->type(), blob->size());
>>>>  m_blobURLs.append(cloneURL.string());
>>>>
>>>> Then upon deserialization, there's a corresponding didDeserializeBlob
>>>> call.
>>>>
>>>> There are a couple alternatives I considered. One is to not instrument
>>>> the SerializedScriptValue this way, but require callers to use the
>>>> blobURLs() accessor and do the right thing then. I'm more in favor of
>>>> inserting this logic directly into serialization, since it removes an
>>>> implementation gotcha and I think this more closely follows the spec.
>>>>
>>>
>>> The blobURLs() accessor was put in as a stop-gap to let us detect
>>> Blobs-in-SSVs and fail early when the full plumbing was missing. Don't read
>>> too much into its presence. (In practical terms, its current use could be
>>> replaced by a "does this SSV contain a Blob?" flag, but at the time
>>> exposing the list seemed like it might be useful.)
>>>
>>>
>>>> Since the primary interaction with the BlobRegistry is via URL, I
>>>> maintained that as well. The other is to use BlobData directly here.
>>>> Michael is planning on doing some maintenance work to the ID system
>>>> used internally anyway, and that may end up being the right path to
>>>> take in that case.
>>>>
>>>> The open-ended implication of this change is that throwing away
>>>> serialized values is now not acceptable -- they need to be checked for
>>>> blob refs and then have some to-be-written code invoked to trigger the
>>>> deref of the BlobData. We don't know where all that may be happening,
>>>> currently, so it's still something of a science project.
>>>
>>>
>>> Please coordinate with ericu@ (I'm sure you're already in contact) who
>>> is looking at how to enable Blob support for IndexedDB. It's going to be
>>> one of the more exciting cases, which  includes SSV data flowing across
>>> process boundaries (in the Chromium port), storage to disk, and deletion
>>> operations occurring in processes that do not have script contexts
>>> available.
>>>
>>>
>>>> We already
>>>> have a SerializedScriptValue object we could interrogate upon
>>>> destruction, but the whole point is that the wire format created there
>>>> could show up anywhere, so we need to either restrict that ability and
>>>> maintain object-level control, or trust callers of the serialized wire
>>>> string accessor to handle them correctly. Neither is particularly
>>>> exciting. Any better ideas?
>>>>
>>>> -Greg
>>>> _______________________________________________
>>>> webkit-dev mailing list
>>>> [email protected]
>>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>>>
>>>
>>>
>>
>

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

Re: [webkit-dev] Notifications for Blob serialization/deserialization

Reply via email to