On Tue, Jul 3, 2012 at 4:08 PM, Greg Billock <gbill...@google.com> wrote:
> Do you think the way my patch does this, by exposing two new methods on > the BlobController for notifying serialization and deserialization, will be > sufficient? For IndexedDB, it looks to me that notifications like this (or on the > BlobData ID, as you propose) will be critical -- the IndexedDB will need to > read the blob values to store them, at which time it can release the refs > it holds. The math won't work out unless it can know that the ref it just > deleted was given to it in serialization, which my notifier would > accomplish. Similarly for history -- if history supports Blob, then > packaging that serialization needs to hold the refcount. > > All these use cases amount to the same thing: use of a serialized Blob > outside the "normal" usage of creation/deletion within a single context. I > think we're all in agreement that Blob semantics according to the spec > should support this use case, even if Blob URLs explicitly don't. > > Do you want to revamp the management API to use BlobData IDs before adding > a pair of de/serialization APIs? If we do, we can skip some of the > nastiness here, like managing an extra serialization internal URL. It'll > definitely be easier to see. If that's going to take a while, though, I'd > like to get a preliminary solution in place so web intents can use blobs. > I'd vote to decouple proximate fixes from general revampage if possible so we don't hold worthy capabilities hostage to housecleaning. Being able to utilize blobs in webintent payloads seems pretty valuable. > > Who's a good person to ask about the history API's support (or lack > thereof) of Blobs? > Thnx for pointing out the the pushstate / history use-case, i didn't know about that one. I'm not sure who all knows about that? > > > On Tue, Jun 26, 2012 at 2:54 PM, Michael Nordman <micha...@google.com>wrote: > >> I think revamping our Blob handling is one of the projects that I should >> be working on next. What we have now is proving to be too difficult to work >> with in a variety of ways. >> >> The more i look at this particular difficulty with sending blobs within >> serialized values (within chromium), the more convinced i am that we should >> switch to identifying the 'data underlying a blob' rather than the 'blob >> object instance'. Currently our 'ids' identify a WebCore::Blob instance. >> The trouble with putting ids on blob object instances is that they dont >> cross process boundaries, so we end up either trying to create the id in >> advance of the instance in some other process, or relying on the >> originating instance still being around when the new instance in the other >> process gets coined (based on the originals ids). We have to juggle 2 ids >> instead of one and conceptually its just a mess. >> >> What I'm thinking about is changes along these lines... >> >> * WebCore::Blobs have an 'id' that refers to the 'data'. At ctor and dtor >> time, they increment/decrement a refcount, via ipcs, on the underlying >> 'data' in the main browser. A clone or copy of that WebCore::Blob instance >> anywhere in the system will carry that same 'id' and do that same >> refcounting of the underlying data. >> >> * When a IPC message containing a serialized blob (the id value) is in >> transit, extra refs are applied to the underlying blob data. There are >> several cases where we need to consider a blob as 'in transit'. Being >> referred to in an WebCore::SSV is one place. Being referred to in a >> content::SSV is another. As those objects come and go, they should inc/dec >> any blobs they refer to accordingly. A content::SSV has to distinguish >> between running in the main browser process and a child process to do that >> properly. >> >> * A challenging case is when sending SSVs containing blobs from the >> browser process to a child (renderer|worker) process. After a message is >> put on the wire going that direction, its not safe to deref the blob data >> until the receiving side has picked it up and added a ref of its own. I >> think we'll need ACKS for messages flowing in that direction so the >> browser-side knows when its safe to drop the extra in transit ref. >> >> * Its easier going the other way, from child to browser, since if the >> child has a ref when the 'id' is put on the wire, that ref cant possibly be >> removed prior to that msg reaching the browser. >> >> Not sure how much of this disccusion affects webkit's native blob >> handling really? This whole chromium multi-process problem doesn't exist >> there. I think we can put high level abstractions in webcore that we then >> fork at that high level... and the forked chromium impl deals with the >> complexities in our multi-process case w/o impatcing webcore's native >> single process impl. >> >> Another source of difficulty stems from how the ThreadableBlobRegistry >> posts all operations to the 'main' webkit thread. That's not going to work >> very well with how we handle IDB operations entirely from the 'context' >> thread even if its a background worker context. Message sent on behalf of >> blobs take a longer slower route to reach our main browser process than >> those sent on behalf of IDB, so an IDB message that contains a blob id may >> reach the main browser prior to the blob messages that create that blob >> reach the main browser. I think we have to avoid going thru the 'main' >> webkit thread in the chromium port for the blob handling messages to avoid >> odd races. >> >> >> >> On Tue, Jun 26, 2012 at 11:03 AM, Joshua Bell <jsb...@chromium.org>wrote: >> >>> On Tue, Jun 26, 2012 at 10:19 AM, Greg Billock <gbill...@google.com>wrote: >>> >>>> I've been working with Michael Nordman on a problem related to Blob >>>> serialization. Currently, if we serialize a Blob, there's no >>>> notification to the BlobRegistry, so the serialized Blob will be >>>> garbage collected at an indeterminate time if the context in which it >>>> was created is closed. This lifetime scoping is spelled out clearly >>>> for Blob URLs created using the URL API. I see nothing in the spec >>>> imposing the same lifetime scoping on Blobs themselves, so I think it >>>> is an implementation flaw. To correctly handle this serialization >>>> case, however, the BlobRegistry needs to know when serialization and >>>> deserialization operations happen for Blob objects. >>>> >>>> I've created a patch that adds that logic and plumbs it through to the >>>> Chromium API. Webkit patch: >>>> https://bugs.webkit.org/show_bug.cgi?id=89921 The corresponding >>>> Chromium-side patch is here: http://codereview.chromium.org/10662024/ >>>> >>>> >>>> The strategy I used is to create a new internal URL and use that for >>>> serialization: >>>> >>>> KURL cloneURL = BlobURL::createInternalURL(); >>>> blobRegistry().didSerializeBlob(cloneURL, blob->url()); >>>> m_writer.writeBlob(cloneURL.string(), blob->type(), blob->size()); >>>> m_blobURLs.append(cloneURL.string()); >>>> >>>> Then upon deserialization, there's a corresponding didDeserializeBlob >>>> call. >>>> >>>> There are a couple alternatives I considered. One is to not instrument >>>> the SerializedScriptValue this way, but require callers to use the >>>> blobURLs() accessor and do the right thing then. I'm more in favor of >>>> inserting this logic directly into serialization, since it removes an >>>> implementation gotcha and I think this more closely follows the spec. >>>> >>> >>> The blobURLs() accessor was put in as a stop-gap to let us detect >>> Blobs-in-SSVs and fail early when the full plumbing was missing. Don't read >>> too much into its presence. (In practical terms, its current use could be >>> replaced by a "does this SSV contain a Blob?" flag, but at the time >>> exposing the list seemed like it might be useful.) >>> >>> >>>> Since the primary interaction with the BlobRegistry is via URL, I >>>> maintained that as well. The other is to use BlobData directly here. >>>> Michael is planning on doing some maintenance work to the ID system >>>> used internally anyway, and that may end up being the right path to >>>> take in that case. >>>> >>>> The open-ended implication of this change is that throwing away >>>> serialized values is now not acceptable -- they need to be checked for >>>> blob refs and then have some to-be-written code invoked to trigger the >>>> deref of the BlobData. We don't know where all that may be happening, >>>> currently, so it's still something of a science project. >>> >>> >>> Please coordinate with ericu@ (I'm sure you're already in contact) who >>> is looking at how to enable Blob support for IndexedDB. It's going to be >>> one of the more exciting cases, which includes SSV data flowing across >>> process boundaries (in the Chromium port), storage to disk, and deletion >>> operations occurring in processes that do not have script contexts >>> available. >>> >>> >>>> We already >>>> have a SerializedScriptValue object we could interrogate upon >>>> destruction, but the whole point is that the wire format created there >>>> could show up anywhere, so we need to either restrict that ability and >>>> maintain object-level control, or trust callers of the serialized wire >>>> string accessor to handle them correctly. Neither is particularly >>>> exciting. Any better ideas? >>>> >>>> -Greg >>>> _______________________________________________ >>>> webkit-dev mailing list >>>> webkit-dev@lists.webkit.org >>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev >>>> >>> >>> >> >
_______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev