Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Mon, 12 May 2008 07:40:44 +0200, Chris Prince [EMAIL PROTECTED] wrote:

> On Sun, May 11, 2008 at 9:22 PM, Aaron Boodman [EMAIL PROTECTED] wrote:
> > On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak
> > > Open question: can a File be stored in a SQL database? If so, does the database store the data or a reference (such as a path or Mac OS X Alias)?
> >
> > There definitely needs to be a way to store Files locally. I don't have a strong opinion as to whether this should be in the database, or in DOMStorage, or in something new just for files.

Which seems to me to bring us back to the fileIO idea. In the meantime, Arve (who is one of the people who did a lot of the thinking behind it) has been thinking about useful ways that it can be sliced and diced, making some functionalities more readily available to general applications, although I am not sure where his thoughts are at the moment.

> A reference has the problem that the underlying file could be modified by an external program. I think once you save data into the SQL database, you should be able to count on it staying constant, and valid.

Well, that depends on whether you are saving a copy(-on-write) or a reference. There are cases where it is actually useful to be able to operate on the file beside what the application does in the database, despite the increased complexity this brings... So maybe it should be possible to store both, and at least to consciously separate the two as ideas in any proposal. They have (IMHO) slightly different use cases.

Cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera 9.5: http://snapshot.opera.com
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, 11 May 2008 05:10:57 +0200, Aaron Boodman [EMAIL PROTECTED] wrote:

> On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
> > ... I'm not really clear on why Blobs must be distinct from ByteArrays.

As I read it, the Blob proposal also explicitly ties in a bit of file interaction (which is why it is related to the fileIO proposal).

> > The only explanation is: The primary difference is that Blobs are immutable*, and can therefore represent large objects. But I am not sure why immutability is necessary to have the ability to represent large objects.

Reading through the rest of the discussion, I don't think it is - in general it would seem useful to have a ByteArray, IMHO.

> > ... I also notice that you used int64 in many of the APIs. JavaScript cannot represent 64-bit integers in its number type. ...
>
> I think our assumption is that 2^53 is large enough to represent the length of all the blobs going in and out of web apps for the foreseeable future. We would just throw when we receive a number that is larger than that saying that it is out of range. Is there a better way to notate this in specs?

Well, you at least have to be pretty explicit about it I think. Better would be to find a type that JavaScript can do, though.

(I suspect that if we are still relying on a thing called 'blob' because we still don't have real file system access with some sense of security by the time we want to hand around a Terabyte in a web application, that we will have seriously failed somewhere. Although it isn't impossible that we end up there).

cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera 9.5: http://snapshot.opera.com
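The 2^53 limit and the "throw when out of range" idea discussed in this message can be seen directly in JavaScript. What follows is only a minimal sketch under stated assumptions: `checkLength` is a hypothetical validator invented for illustration and is not taken from the Blob proposal.

```javascript
// 2^53: above this value, not every integer has an exact double representation.
const LIMIT = Math.pow(2, 53);

// Hypothetical validator for a blob length or offset argument (an assumption,
// not an API from the proposal). Rejects anything a double cannot hold exactly.
function checkLength(n) {
  if (typeof n !== "number" || !Number.isInteger(n) || n < 0 || n > LIMIT) {
    throw new RangeError("length out of range: " + n);
  }
  return n;
}

// 2^53 + 1 is not representable and rounds back to 2^53.
console.log(LIMIT + 1 === LIMIT); // true

// 2^40 bytes (one binary terabyte) is far below the limit.
console.log(checkLength(Math.pow(2, 40))); // 1099511627776
```

A check of this kind keeps the failure explicit: a caller passing a value beyond the exact range gets an exception rather than a silently rounded length.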
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 13, 2008, at 5:08 AM, Charles McCathieNevile wrote:

> On Sun, 11 May 2008 05:10:57 +0200, Aaron Boodman [EMAIL PROTECTED] wrote:
> > On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
> > > ... I'm not really clear on why Blobs must be distinct from ByteArrays.
>
> As I read it, the Blob proposal also explicitly ties in a bit of file interaction (which is why it is related to the fileIO proposal).

That seems to be where things are evolving, but in the original proposal Blobs were also to be used for such things as the binary image data in a canvas, or binary data retrieved by XMLHttpRequest, or binary data dynamically generated by script. (I proposed renaming Blob to File because I think the non-file uses are better served via ByteArray).

> > > ... I also notice that you used int64 in many of the APIs. JavaScript cannot represent 64-bit integers in its number type. ...
> >
> > I think our assumption is that 2^53 is large enough to represent the length of all the blobs going in and out of web apps for the foreseeable future. We would just throw when we receive a number that is larger than that saying that it is out of range. Is there a better way to notate this in specs?
>
> Well, you at least have to be pretty explicit about it I think. Better would be to find a type that JavaScript can do, though.
>
> (I suspect that if we are still relying on a thing called 'blob' because we still don't have real file system access with some sense of security by the time we want to hand around a Terabyte in a web application, that we will have seriously failed somewhere. Although it isn't impossible that we end up there).

I see no reason the Blob proposal couldn't handle uploading a Terabyte of data. 2^53 > 10^4. Indeed, for data that large you really do want a filesystem reference that you can hand directly to a network API so it can be sent without having to load the whole thing into memory via script.

Regards,
Maciej
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Tue, 13 May 2008, Maciej Stachowiak wrote:

> On May 13, 2008, at 5:08 AM, Charles McCathieNevile wrote:
> > (I suspect that if we are still relying on a thing called 'blob' because we still don't have real file system access with some sense of security by the time we want to hand around a Terabyte in a web application, that we will have seriously failed somewhere. Although it isn't impossible that we end up there).
>
> I see no reason the Blob proposal couldn't handle uploading a Terabyte of data. 2^53 > 10^4. Indeed, for data that large you really do want a filesystem reference that you can hand directly to a network API so it can be sent without having to load the whole thing into memory via script.

Indeed. I should add that while many users aren't necessarily there yet, there are environments (e.g. within Google) where dealing with multi-terabyte files is a pretty regular occurrence. We certainly have a vested interest in making sure that the Web APIs can handle this amount of data -- this is the kind of thing we'd be using now, if we could.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: Blobs: An alternate (complementary?) binary data proposal
On Mon, 12 May 2008 01:08:48 +0200, Aaron Boodman [EMAIL PROTECTED] wrote:

> Ok, so just so I'm clear, does the following example snippet accurately reflect how you propose that things work?
>
>   var req = new XMLHttpRequest();
>   req.open("GET", "example", true);
>   req.onreadystatechange = handleResult;
>   req.send(null);
>
>   function handleResult() {
>     if (req.readyState != 4) return;
>     var b1 = req.responseByteArray;

FWIW, XMLHttpRequest Level 2 already has this functionality in the form of responseBody: http://dev.w3.org/2006/webapi/XMLHttpRequest-2/ (send() also accepts a ByteArray now.)

--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/
Re: Blobs: An alternate (complementary?) binary data proposal
On Mon, May 12, 2008 at 12:17 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:

> FWIW, XMLHttpRequest Level 2 already has this functionality in the form of responseBody: http://dev.w3.org/2006/webapi/XMLHttpRequest-2/ (send() also accepts a ByteArray now.)

Thanks, I wasn't aware of that.

- a
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
Responses to several of the comments so far: On Fri, May 9, 2008 at 9:15 PM, Ian Hickson [EMAIL PROTECTED] wrote: I'm not sure I like the way that the bytes are made accessible, but that's a minor detail really. I tend to agree. The 'Creating Blobs' section and the readAs*() methods were added last-minute. We don't know of apps that need that functionality at present. And binary manipulation seems unlikely to be satisfying until ES4 anyway. Personally, I think it's reasonable to remove these sections from the Blob spec for now. On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote: I'm not really clear on why Blobs must be distinct from ByteArrays. The only explanation is: The primary difference is that Blobs are immutable*, and can therefore represent large objects. But I am not sure why immutability is necessary to have the ability to represent large objects. If you are thinking immutability is necessary to be able to have large objects memory mapped from disk, then mmap with a private copy-on-write mapping should solve that problem just fine. Making Blobs immutable simplifies a number of problems: (1) Asynchronous APIs. Large Blobs can be passed to XmlHttpRequest for an asynchronous POST, or to Database for an asynchronous INSERT. If Blobs are mutable, the caller can modify the contents at any time. The XmlHttpRequest or Database operation will be undefined. Careful callers could wait for the operation to finish (at least in these two examples; I'm not sure about all possible scenarios). But this is starting to put quite a burden on developers. (2) HTML5 Workers. There are cases where apps will get a Blob on the UI thread, and then want to operate on it in a Worker. Note that the Blob may be file-backed or memory-backed. Worker threads are isolated execution environments. 
If Blobs are mutable, it seems like tricky (or impossible) gymnastics would be required to ensure one thread's file writes aren't seen by another thread's reads, unless you create a copy. And that is doubly true for memory-backed blobs. (I'm not even considering older mobile operating systems, which may not have all the file and memory capabilities of modern OSes.)

~~~

There is another, slightly different issue around mutability, which hasn't really been called out yet. It affects whether Blobs should be directly readable.

One of the biggest motivations for Blobs was to do interesting things with local files. But these files may be modified by programs outside the browser. It would be pretty crazy if developers had to guard against the 'length' field changing between any two lines of JavaScript.

Locking files appears to be impossible on some platforms. (Even when it is possible, the experience can be unsatisfying; anybody who has seen a "file is locked" error -- with no additional info -- knows this feeling.)

So our plan has been to check the file modification time whenever a Blob's contents are read. If Blobs are directly accessible, an exception could occur any time the web app reads the contents. If Blobs are not directly accessible (as I would propose), then to developers, it only means methods that accept Blob arguments may throw.

--Chris
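The plan described above (check the file's modification time whenever a Blob's contents are read, and throw if it has changed) might look roughly like the following. This is a sketch under loud assumptions: it uses Node.js `fs` calls rather than any browser API, `snapshotFile` and `readSnapshot` are hypothetical names invented here, and the size check is an addition of this sketch alongside the modification-time check.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: remember the file's size and modification time
// at the moment the "blob" is created.
function snapshotFile(filePath) {
  const st = fs.statSync(filePath);
  return { filePath: filePath, length: st.size, mtimeMs: st.mtimeMs };
}

// Hypothetical consumer (stands in for any API that accepts a blob).
// It re-checks the file before reading and throws if it has changed.
function readSnapshot(snap) {
  const st = fs.statSync(snap.filePath);
  if (st.size !== snap.length || st.mtimeMs !== snap.mtimeMs) {
    throw new Error("file was modified after the snapshot was taken");
  }
  return fs.readFileSync(snap.filePath);
}

// Demo on a throwaway temp file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "blob-sketch-"));
const file = path.join(dir, "data.bin");
fs.writeFileSync(file, "abc");
const snap = snapshotFile(file);
console.log(readSnapshot(snap).length); // 3

fs.appendFileSync(file, "d"); // an "external program" modifies the file
try {
  readSnapshot(snap);
} catch (e) {
  console.log(e.message); // file was modified after the snapshot was taken
}
```

Note that a check followed by a read still leaves a small window in which the file can change, so this only narrows the problem; it also shows why the exception surfaces in the method that consumes the snapshot rather than at arbitrary points in script.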
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 10, 2008, at 11:39 PM, Chris Prince wrote: On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote: I'm not really clear on why Blobs must be distinct from ByteArrays. The only explanation is: The primary difference is that Blobs are immutable*, and can therefore represent large objects. But I am not sure why immutability is necessary to have the ability to represent large objects. If you are thinking immutability is necessary to be able to have large objects memory mapped from disk, then mmap with a private copy-on-write mapping should solve that problem just fine. Making Blobs immutable simplifies a number of problems: (1) Asynchronous APIs. Large Blobs can be passed to XmlHttpRequest for an asynchronous POST, or to Database for an asynchronous INSERT. If Blobs are mutable, the caller can modify the contents at any time. The XmlHttpRequest or Database operation will be undefined. Careful callers could wait for the operation to finish (at least in these two examples; I'm not sure about all possible scenarios). But this is starting to put quite a burden on developers. (2) HTML5 Workers. There are cases where apps will get a Blob on the UI thread, and then want to operate on it in a Worker. Note that the Blob may be file-backed or memory-backed. Worker threads are isolated execution environments. If Blobs are mutable, it seems like tricky (or impossible) gymnastics would be required to ensure one thread's file writes aren't seen by another thread's reads, unless you create a copy. And that is doubly true for memory-backed blobs. (I'm not even considering older mobile operating systems, which may not have all the file and memory capabilities of modern OSes.) Both of these can be addressed by the APIs (including the worker transfer mechanism) making a copy, which can use a copy-on-write mechanism to avoid actually making a copy in the common case. It seems like immutability creates its own problems. 
If you have a large piece of binary data, say retrieved over the network from XHR, and the only way to change it is to make a copy, and you have multiple pieces of your code that want to change it, you are going to be allocating memory for many copies.

(I should add that I also find the name Blob distasteful in an API, but that is a minor point).

I'm still not convinced that immutability is good, or that the ECMAScript ByteArray proposal can't handle the required use cases.

Regards,
Maciej
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 11, 2008, at 4:08 PM, Aaron Boodman wrote:

> On Sun, May 11, 2008 at 3:02 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
> > Both of these can be addressed by the APIs (including the worker transfer mechanism) making a copy, which can use a copy-on-write mechanism to avoid actually making a copy in the common case.
>
> Ok, so just so I'm clear, does the following example snippet accurately reflect how you propose that things work?

I'm not sure I am following it all exactly, but I think yes. Variable assignment would not trigger any copy-on-write behavior, since it is still the same object. Passing to an API (including sendMessage to a worker) would make a copy-on-write virtual copy.

>   var req = new XMLHttpRequest();
>   req.open("GET", "example", true);
>   req.onreadystatechange = handleResult;
>   req.send(null);
>
>   function handleResult() {
>     if (req.readyState != 4) return;
>
>     var b1 = req.responseByteArray;
>     var b2 = b1;
>     assert(b1 === b2); // they refer to the same object
>
>     // print the contents of the array
>     for (var i = 0; i < b1.length; i++) {
>       print(b1[i]);
>     }
>
>     b1[0] = 42;
>     assert(b2[0] == 42);
>
>     var worker = window.createWorker("worker.js");
>     worker.sendMessage(b1); // branches b1
>     b1[0] = 43; // modification does not affect what got sent to worker
>   }
>
>   // worker.js
>   worker.onmessage = function(b) {
>     assert(b[0] == 42);
>   };
>
> > I'm still not convinced that immutability is good, or that the ECMAScript ByteArray proposal can't handle the required use cases.
>
> Here's one additional question on how this would work with ByteArray. The read API for ByteArray is currently synchronous. Doesn't this mean that with large files accessing bytearray[n] could block?

If the ByteArray were in fact backed by a file, then accessing bytearray[n] could lead to part of the file being paged in. However, the same is true if it is backed by RAM that is swapped out. Even accessing uninitialized zero-fill memory could trap to the kernel, though that's in general not as bad as hitting disk (whether for swap or file bytes).
I can see how you may want to have an object to represent a file that can be handed to APIs directly, but that has only an async read interface for JS. However, I am pretty sure you would not want to use such an object to represent binary data returned from an XHR, or the pixel contents of a canvas. After all, the data is already in memory. So perhaps files need a distinct object from other forms of binary data, if we wanted to enforce such a restriction. Regards, Maciej
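The copy-on-write semantics agreed on in this message (plain assignment shares the object; handing the data to an API creates a virtual copy that is only materialized on the next write) can be modelled in a few lines. This is a sketch, not the ECMAScript ByteArray proposal: the class `CowBytes` and its methods `get`, `set`, and `branch` are hypothetical names, with `branch` standing in for what an API such as sendMessage would do internally.

```javascript
// Hypothetical copy-on-write byte container.
class CowBytes {
  constructor(bytes, shared) {
    this._bytes = bytes;                   // Uint8Array backing store
    this._shared = shared || { count: 1 }; // how many holders share this store
  }

  static from(values) {
    return new CowBytes(Uint8Array.from(values));
  }

  get length() {
    return this._bytes.length;
  }

  get(i) {
    return this._bytes[i];
  }

  // Virtual copy: shares the backing store until either side writes.
  branch() {
    this._shared.count += 1;
    return new CowBytes(this._bytes, this._shared);
  }

  // Write: take a private copy first if the store is still shared.
  set(i, value) {
    if (this._shared.count > 1) {
      this._shared.count -= 1;
      this._bytes = this._bytes.slice();
      this._shared = { count: 1 };
    }
    this._bytes[i] = value;
  }
}

// Mirrors the snippet in the thread.
const b1 = CowBytes.from([1, 2, 3]);
const b2 = b1;                 // same object, no copy
b1.set(0, 42);
console.log(b2.get(0));        // 42

const sent = b1.branch();      // what "sendMessage(b1)" would hand over
b1.set(0, 43);                 // forces b1 to copy; sent is unaffected
console.log(sent.get(0));      // 42
console.log(b1.get(0));        // 43
```

In the common case where the sender never writes again, no bytes are copied at all, which is the point made above about avoiding an actual copy.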
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, May 11, 2008 at 4:22 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote: Here's one additional question on how this would work with ByteArray. The read API for ByteArray is currently synchronous. Doesn't this mean that with large files accessing bytearray[n] could block? If the ByteArray were in fact backed by a file, then accessing bytearray[n] could lead to part of the file being paged in. However, the same is true if it is backed by RAM that is swapped out. Even accessing uninitialized zero-fill memory could trap to the kernel, though that's in general not as bad as hitting disk (whether for swap or file bytes). But expressing the API as an array makes it seem like access is always cheap, encouraging people to just burn through the file in a tight loop. Such loops would actually hit the disk many times, right? I can see how you may want to have an object to represent a file that can be handed to APIs directly, but that has only an async read interface for JS. However, I am pretty sure you would not want to use such an object to represent binary data returned from an XHR, or the pixel contents of a canvas. After all, the data is already in memory. So perhaps files need a distinct object from other forms of binary data, if we wanted to enforce such a restriction. I see what you mean for canvas, but not so much for XHR. It seems like a valid use case to want to be able to use XHR to download very large files. In that case, the thing you get back seems like it should have an async API for reading. - a
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 11, 2008, at 4:40 PM, Aaron Boodman wrote: On Sun, May 11, 2008 at 4:22 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote: Here's one additional question on how this would work with ByteArray. The read API for ByteArray is currently synchronous. Doesn't this mean that with large files accessing bytearray[n] could block? If the ByteArray were in fact backed by a file, then accessing bytearray[n] could lead to part of the file being paged in. However, the same is true if it is backed by RAM that is swapped out. Even accessing uninitialized zero-fill memory could trap to the kernel, though that's in general not as bad as hitting disk (whether for swap or file bytes). But expressing the API as an array makes it seem like access is always cheap, encouraging people to just burn through the file in a tight loop. Such loops would actually hit the disk many times, right? Well, that depends on how good the OS buffer cache is at prefetching. But in general, there would be some disk access. I can see how you may want to have an object to represent a file that can be handed to APIs directly, but that has only an async read interface for JS. However, I am pretty sure you would not want to use such an object to represent binary data returned from an XHR, or the pixel contents of a canvas. After all, the data is already in memory. So perhaps files need a distinct object from other forms of binary data, if we wanted to enforce such a restriction. I see what you mean for canvas, but not so much for XHR. It seems like a valid use case to want to be able to use XHR to download very large files. In that case, the thing you get back seems like it should have an async API for reading. Hmm? If you get the data over the network it goes into RAM. Why would you want an async API to in-memory data? Or are you suggesting XHR should be changed to spool its data to disk? 
I do not think that is practical to do for all requests, so this would have to be a special API mode for responses that are expected to be too big to fit in memory. Regards, Maciej
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote: Well, that depends on how good the OS buffer cache is at prefetching. But in general, there would be some disk access. It seems better if the read API is just async for this case to prevent the problem. I see what you mean for canvas, but not so much for XHR. It seems like a valid use case to want to be able to use XHR to download very large files. In that case, the thing you get back seems like it should have an async API for reading. Hmm? If you get the data over the network it goes into RAM. Why would you want an async API to in-memory data? Or are you suggesting XHR should be changed to spool its data to disk? I do not think that is practical to do for all requests, so this would have to be a special API mode for responses that are expected to be too big to fit in memory. Whether XHR spools to disk is an implementation detail, right? Right now XHR is not practical to use for downloading large files because the only way to access the result is as a string. Also because of this, XHR implementations don't bother spooling to disk. But if this API were added, then XHR implementations could be modified to start spooling to disk if the response got large. If the caller requests responseText, then the implementation just does the best it can to read the whole thing into a string and reply. But if the caller uses responseBlob (or whatever we call it) then it becomes practical to, for example, download movie files, modify them, then re-upload them. - a
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 11, 2008, at 6:01 PM, Aaron Boodman wrote: On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote: Well, that depends on how good the OS buffer cache is at prefetching. But in general, there would be some disk access. It seems better if the read API is just async for this case to prevent the problem. It can't entirely prevent the problem. If you read a big enough chunk, it will cause swapping which hits the disk just as much as file reads. Possibly more, because real file access will trigger OS prefetch heuristics for linear access. I see what you mean for canvas, but not so much for XHR. It seems like a valid use case to want to be able to use XHR to download very large files. In that case, the thing you get back seems like it should have an async API for reading. Hmm? If you get the data over the network it goes into RAM. Why would you want an async API to in-memory data? Or are you suggesting XHR should be changed to spool its data to disk? I do not think that is practical to do for all requests, so this would have to be a special API mode for responses that are expected to be too big to fit in memory. Whether XHR spools to disk is an implementation detail, right? Right now XHR is not practical to use for downloading large files because the only way to access the result is as a string. Also because of this, XHR implementations don't bother spooling to disk. But if this API were added, then XHR implementations could be modified to start spooling to disk if the response got large. If the caller requests responseText, then the implementation just does the best it can to read the whole thing into a string and reply. But if the caller uses responseBlob (or whatever we call it) then it becomes practical to, for example, download movie files, modify them, then re-upload them. That sounds reasonable for very large files like movies. 
However, audio and image files are similar in size to the kinds of text or XML resources that are currently processed synchronously. In such cases they are likely to remain in memory. In general it is sounding like it might be desirable to have at least two kinds of objects for representing binary data: 1) An in-memory, mutable representation with synchronous access. There should also be a copying API which is possibly copy-on-write for the backing store. 2) A possibly disk-backed representation that offers only asynchronous read (possibly in the form of representation #1). Both representations could be used with APIs that can accept binary data. In most cases such APIs only take strings currently. The name of representation #2 may wish to tie it to being a file, since for anything already in memory you'd want representation #1. Perhaps they could be called ByteArray and File respectively. Open question: can a File be stored in a SQL database? If so, does the database store the data or a reference (such as a path or Mac OS X Alias)? Regards, Maciej
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak It seems better if the read API is just async for this case to prevent the problem. It can't entirely prevent the problem. If you read a big enough chunk, it will cause swapping which hits the disk just as much as file reads. Possibly more, because real file access will trigger OS prefetch heuristics for linear access. Right, I think the UA has to have ultimate control over the chunk size to prevent this. The length parameters on the read apis I suggested would have to be what the caller desires, but the implementation doesn't necessarily have to honor it. I've changed the parameter names on our wiki page to 'desiredLength' to reflect this. Whether XHR spools to disk is an implementation detail, right? Right now XHR is not practical to use for downloading large files because the only way to access the result is as a string. Also because of this, XHR implementations don't bother spooling to disk. But if this API were added, then XHR implementations could be modified to start spooling to disk if the response got large. If the caller requests responseText, then the implementation just does the best it can to read the whole thing into a string and reply. But if the caller uses responseBlob (or whatever we call it) then it becomes practical to, for example, download movie files, modify them, then re-upload them. That sounds reasonable for very large files like movies. However, audio and image files are similar in size to the kinds of text or XML resources that are currently processed synchronously. In such cases they are likely to remain in memory. In general it is sounding like it might be desirable to have at least two kinds of objects for representing binary data: 1) An in-memory, mutable representation with synchronous access. There should also be a copying API which is possibly copy-on-write for the backing store. 
2) A possibly disk-backed representation that offers only asynchronous read (possibly in the form of representation #1). I agree with this, but I think using Blob/File whatever as the default representation is convenient because you don't need to add multiple getter APIs to things such as XHR (responseBytes and responseBlob). And you probably remove some potential confusion over which getter is correct to use for a given situation. Both representations could be used with APIs that can accept binary data. In most cases such APIs only take strings currently. The name of representation #2 may wish to tie it to being a file, since for anything already in memory you'd want representation #1. Perhaps they could be called ByteArray and File respectively. Calling it File seems a little weird to me, particularly in the case of XMLHttpRequest. Open question: can a File be stored in a SQL database? If so, does the database store the data or a reference (such as a path or Mac OS X Alias)? There definitely needs to be a way to store Files locally. I don't have a strong opinion as to whether this should be in the database, or in DOMStorage, or in something new just for files. - a
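The 'desiredLength' idea mentioned in this message (the caller states how much it would like, and the implementation stays free to return less) could be sketched as below. Everything here is an assumption made for illustration: `MAX_CHUNK`, `clampRead`, `readAsync`, and `readAll` are hypothetical names, the data is an in-memory Uint8Array rather than a file, and `setTimeout` merely simulates asynchronous completion.

```javascript
// Hypothetical cap chosen by the implementation, not by the caller.
const MAX_CHUNK = 4;

// Decide which byte range a read actually covers.
function clampRead(offset, desiredLength, totalLength) {
  const start = Math.min(Math.max(offset, 0), totalLength);
  const granted = Math.min(Math.max(desiredLength, 0), MAX_CHUNK);
  return { start: start, end: Math.min(start + granted, totalLength) };
}

// Hypothetical async read: the callback receives at most MAX_CHUNK bytes.
function readAsync(bytes, offset, desiredLength, callback) {
  const r = clampRead(offset, desiredLength, bytes.length);
  setTimeout(function () {
    callback(bytes.slice(r.start, r.end));
  }, 0);
}

// A caller loops until it has everything, never assuming a full chunk.
function readAll(bytes, desiredLength, done) {
  const out = [];
  (function next(offset) {
    if (offset >= bytes.length) return done(out);
    readAsync(bytes, offset, desiredLength, function (chunk) {
      if (chunk.length === 0) return done(out); // nothing granted; stop
      for (const b of chunk) out.push(b);
      next(offset + chunk.length);
    });
  })(0);
}

readAll(Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), 1000, function (all) {
  console.log(all.length); // 10
});
```

The design consequence for callers is that the returned chunk length, not the requested length, drives the next offset.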
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, May 11, 2008 at 9:22 PM, Aaron Boodman [EMAIL PROTECTED] wrote: On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak Open question: can a File be stored in a SQL database? If so, does the database store the data or a reference (such as a path or Mac OS X Alias)? There definitely needs to be a way to store Files locally. I don't have a strong opinion as to whether this should be in the database, or in DOMStorage, or in something new just for files. A reference has the problem that the underlying file could be modified by an external program. I think once you save data into the SQL database, you should be able to count on it staying constant, and valid.
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On May 7, 2008, at 10:08 PM, Aaron Boodman wrote: Hi everyone, Opera has a proposal for a specification that would revive (and supersede) the file upload API that has been lingering so long as a work item. The Gears team has also been putting together a proposal for file access which overlaps in some ways with Opera's, but is also orthogonal in some ways: http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal I really like the idea of adding consistent APIs for binary data in the many places in the Web platform that need them. However, I'm not really clear on why Blobs must be distinct from ByteArrays. The only explanation is: The primary difference is that Blobs are immutable*, and can therefore represent large objects. But I am not sure why immutability is necessary to have the ability to represent large objects. If you are thinking immutability is necessary to be able to have large objects memory mapped from disk, then mmap with a private copy-on-write mapping should solve that problem just fine. In fact, immutability seems clearly undesirable for many of these APIs. Seems like you will want to modify such things and create them from scratch. I also notice that you used int64 in many of the APIs. JavaScript cannot represent 64-bit integers in its number type. All JavaScript numbers are IEEE floating point doubles. This will lose precision at the high end of the int64 range, which is likely unacceptable for many of these APIs. Thus, if 64-bit is really needed, then a primitive type will not do. You either need two int32s or an object type encapsulating this. Regards, Maciej I would summarize the key differences this way: * We introduce a new interface - a blob - that represents an immutable chunk of (potentially large) binary data * We propose adding the ability to get and send blobs to many different APIs, including XHR, the input type=file element, database, canvas, etc. 
* We attempt far less interaction with the filesystem (just extending the input element and allowing exporting a blob to a file). To answer one of Maciej's questions from the other thread, we intend this for use on the open web and do not intend for it to require any particular security authorization. We would also love feedback, and would like to work with any interested vendors to iterate this to something others would implement. Thanks, - a
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sat, 10 May 2008 06:15:01 +0200, Ian Hickson [EMAIL PROTECTED] wrote: On Wed, 7 May 2008, Aaron Boodman wrote: The Gears team has also been putting together a proposal for file access which overlaps in some ways with Opera's, but is also orthogonal in some ways: http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal This seems like it would be something we may want defined in a small spec and then used by many others (like progress events) -- one spec to define Blob, and then the other specs to use it (like XHR and HTML5/WF2). That would seem like the way to go if we decide to take this on. The proposal is currently a small spec for one thing, after all. Do we have the resources to have someone champion this spec? Are you asking the WG, or Google? cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera 9.5: http://snapshot.opera.com
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sun, 11 May 2008, Charles McCathieNevile wrote:

Do we have the resources to have someone champion this spec?

Are you asking the WG, or Google?

The Web community as a whole. I don't care which working group (if any) owns it, and I don't have any reason to prefer that Google work on this rather than anyone else -- my interest is in forwarding the platform as a whole. :-)

--
Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak wrote:

I really like the idea of adding consistent APIs for binary data in the many places in the Web platform that need them. However, I'm not really clear on why Blobs must be distinct from ByteArrays. The only explanation is: "The primary difference is that Blobs are immutable*, and can therefore represent large objects." But I am not sure why immutability is necessary to have the ability to represent large objects. If you are thinking immutability is necessary to be able to have large objects memory mapped from disk, then mmap with a private copy-on-write mapping should solve that problem just fine.

I'm going to defer to Chris Prince on this question, as he is the real man behind this proposal on our team and has thought the most about it. Not knowing much about mmap, it sounds like a fair enough idea to me. It seems like you'd eventually still want to push things back to disk after enough mutation, right? Would writes be synchronous? Would reads?

In fact, immutability seems clearly undesirable for many of these APIs. Seems like you will want to modify such things and create them from scratch.

We agree that you'd want to modify and create blobs, and have a TODO for that in our proposal. However, we were thinking of handling this similarly to how strings are handled in JavaScript. When you modify a blob, you get a new blob instance. But our ideas on this are pretty poorly formed as we don't have as pressing a need as we do for the things we've proposed so far.

I also notice that you used int64 in many of the APIs. JavaScript cannot represent 64-bit integers in its number type. All JavaScript numbers are IEEE floating point doubles. This will lose precision at the high end of the int64 range, which is likely unacceptable for many of these APIs. Thus, if 64-bit is really needed, then a primitive type will not do. You either need two int32s or an object type encapsulating this.

I think our assumption is that 2^53 is large enough to represent the length of all the blobs going in and out of web apps for the foreseeable future. We would just throw when we receive a number that is larger than that, saying that it is out of range. Is there a better way to notate this in specs?

On Sat, May 10, 2008 at 7:51 PM, Ian Hickson wrote:

On Sun, 11 May 2008, Charles McCathieNevile wrote:

Do we have the resources to have someone champion this spec?

Are you asking the WG, or Google?

The Web community as a whole. I don't care which working group (if any) owns it, and I don't have any reason to prefer that Google work on this rather than anyone else -- my interest is in forwarding the platform as a whole. :-)

I think I might like to do this. What does it involve? Should we take this part offline, Ian?

- a
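To make the 2^53 point concrete, here is a small TypeScript sketch of the "throw when out of range" approach, plus the "two int32s" alternative Maciej mentions. The names `checkBlobLength`, `splitLength` and `SplitLength` are hypothetical and appear in no proposal. One assumption of the sketch is worth calling out: it rejects 2^53 itself and accepts only values up to 2^53 - 1, because 2^53 + 1 rounds to 2^53 as an IEEE double, so a received value of exactly 2^53 cannot be told apart from its neighbor.

```typescript
const TWO_POW_32 = 4294967296;       // 2^32
const TWO_POW_53 = 9007199254740992; // 2^53

// Returns the value unchanged if it is a whole, non-negative number that a
// double represents exactly; otherwise throws a RangeError.
function checkBlobLength(value: number): number {
  const isWholeNumber = isFinite(value) && Math.floor(value) === value;
  if (!isWholeNumber || value < 0 || value >= TWO_POW_53) {
    throw new RangeError("blob length out of range: " + value);
  }
  return value;
}

// The "two int32s" idea: carry a length as a high and a low 32-bit word.
// Both words are non-negative here, so this is really two unsigned words.
interface SplitLength {
  high: number;
  low: number;
}

function splitLength(value: number): SplitLength {
  const checked = checkBlobLength(value);
  const high = Math.floor(checked / TWO_POW_32);
  const low = checked - high * TWO_POW_32;
  return { high, low };
}

// The precision loss itself: adding 1 to 2^53 does not change the value.
const precisionLost = TWO_POW_53 + 1 === TWO_POW_53; // true

const fiveBillion = splitLength(5000000000); // { high: 1, low: 705032704 }
```

The range check keeps lengths as ordinary numbers, which is convenient for script authors; the split form would only matter if an API really needed the full 64-bit range.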
Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)
On Wed, 7 May 2008, Aaron Boodman wrote:

Charles wrote:

Opera has a proposal for a specification that would revive (and supersede) the file upload API that has been lingering so long as a work item.

I would echo the other comments people have made regarding the security model being the important aspect of this kind of area.

The Gears team has also been putting together a proposal for file access which overlaps in some ways with Opera's, but is also orthogonal in some ways: http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal

I do like the general approach here. I especially like the way that it combines the Mozilla extensions with a generic mechanism that applies across multiple APIs. I'm not sure I like the way that the bytes are made accessible, but that's a minor detail really. (In particular, I'd like to see plain text APIs to treat a blob according to a particular encoding.)

This seems like it would be something we may want defined in a small spec and then used by many others (like progress events) -- one spec to define Blob, and then the other specs to use it (like XHR and HTML5/WF2).

Do we have the resources to have someone champion this spec?

--
Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.