Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-13 Thread Charles McCathieNevile


On Mon, 12 May 2008 07:40:44 +0200, Chris Prince [EMAIL PROTECTED]  
wrote:



On Sun, May 11, 2008 at 9:22 PM, Aaron Boodman [EMAIL PROTECTED] wrote:

On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak

  Open question: can a File be stored in a SQL database? If
  so, does the database store the data or a reference (such as a path  
or Mac

  OS X Alias)?

 There definitely needs to be a way to store Files locally. I don't
 have a strong opinion as to whether this should be in the database, or
 in DOMStorage, or in something new just for files.


Which seems to me to bring us back to the fileIO idea. In the meantime,  
Arve (who is one of the people who did a lot of the thinking behind it)  
has been thinking about useful ways that it can be sliced and diced,  
making some functionalities more readily available to general  
applications, although I am not sure where his thoughts are at the moment.



A reference has the problem that the underlying file could be modified
by an external program.  I think once you save data into the SQL
database, you should be able to count on it staying constant, and
valid.


Well, that depends on whether you are saving a copy(-on-write) or a  
reference. There are cases where it is actually useful to be able to  
operate on the file alongside what the application does in the database,  
despite the increased complexity this brings... So maybe it should be  
possible to store both, and at least to consciously separate the two as  
ideas in any proposal. They have (IMHO) slightly different use cases.


Cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera 9.5: http://snapshot.opera.com



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-13 Thread Charles McCathieNevile


On Sun, 11 May 2008 05:10:57 +0200, Aaron Boodman [EMAIL PROTECTED] wrote:


On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote:

... I'm not really clear on why Blobs must be distinct from ByteArrays.


As I read it, the Blob proposal also explicitly ties in a bit of file  
interaction (which is why it is related to the fileIO proposal).


The only explanation is: The primary difference is that Blobs are  
immutable*, and can therefore represent large objects. But I am not  
sure why immutability is

necessary to have the ability to represent large objects.


Reading through the rest of the discussion, I don't think it is - in  
general it would seem useful to have a ByteArray, IMHO.


...

I also notice that you used int64 in many of the APIs. JavaScript cannot
represent 64-bit integers in its number type. ...


I think our assumption is that 2^53 is large enough to represent the
length of all the blobs going in and out of web apps for the
foreseeable future. We would just throw when we receive a number that
is larger than that saying that it is out of range. Is there a better
way to notate this in specs?


Well, you at least have to be pretty explicit about it, I think. Better  
would be to find a type that JavaScript can do, though.


(I suspect that if we are still relying on a thing called 'blob' because  
we still don't have real file system access with some sense of security by  
the time we want to hand around a Terabyte in a web application, we will  
have seriously failed somewhere. Although it isn't impossible that we  
end up there).


cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera 9.5: http://snapshot.opera.com



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-13 Thread Maciej Stachowiak



On May 13, 2008, at 5:08 AM, Charles McCathieNevile wrote:



On Sun, 11 May 2008 05:10:57 +0200, Aaron Boodman [EMAIL PROTECTED]  
wrote:


On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED]  
wrote:
... I'm not really clear on why Blobs must be distinct from  
ByteArrays.


As I read it, the Blob proposal also explicitly ties in a bit of  
file interaction (which is why it is related to the fileIO proposal).


That seems to be where things are evolving, but in the original  
proposal Blobs were also to be used for such things as the binary  
image data in a canvas, or binary data retrieved by XMLHttpRequest,  
or binary data dynamically generated by script. (I proposed renaming  
Blob to File because I think the non-file uses are better served via  
ByteArray).



...
I also notice that you used int64 in many of the APIs. JavaScript  
cannot

represent 64-bit integers in its number type. ...


I think our assumption is that 2^53 is large enough to represent the
length of all the blobs going in and out of web apps for the
foreseeable future. We would just throw when we receive a number that
is larger than that saying that it is out of range. Is there a better
way to notate this in specs?


Well, you at least have to be pretty explicit about it, I think.  
Better would be to find a type that JavaScript can do, though.


(I suspect that if we are still relying on a thing called 'blob'  
because we still don't have real file system access with some sense  
of security by the time we want to hand around a Terabyte in a web  
application, that we will have seriously failed somewhere. Although  
it isn't impossible that we end up there).


I see no reason the Blob proposal couldn't handle uploading a Terabyte  
of data. 2^53 > 10^4. Indeed, for data that large you really do want a  
filesystem reference that you can hand directly to a network API so it  
can be sent without having to load the whole thing into memory via  
script.


Regards,
Maciej






Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-13 Thread Ian Hickson

On Tue, 13 May 2008, Maciej Stachowiak wrote:
 On May 13, 2008, at 5:08 AM, Charles McCathieNevile wrote:
  
  (I suspect that if we are still relying on a thing called 'blob' 
  because we still don't have real file system access with some sense of 
  security by the time we want to hand around a Terabyte in a web 
  application, that we will have seriously failed somewhere. Although it 
  isn't impossible that we end up there).
 
 I see no reason the Blob proposal couldn't handle uploading a Terabyte 
 of data. 2^53 > 10^4. Indeed, for data that large you really do want a 
 filesystem reference that you can hand directly to a network API so it 
 can be sent without having to load the whole thing into memory via 
 script.

Indeed. I should add that while many users aren't necessarily there yet, 
there are environments (e.g. within Google) where dealing with 
multi-terabyte files is a pretty regular occurrence. We certainly have a 
vested interest in making sure that the Web APIs can handle this amount of 
data -- this is the kind of thing we'd be using now, if we could.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Blobs: An alternate (complementary?) binary data proposal

2008-05-12 Thread Anne van Kesteren


On Mon, 12 May 2008 01:08:48 +0200, Aaron Boodman [EMAIL PROTECTED] wrote:

Ok, so just so I'm clear, does the following example snippet
accurately reflect how you propose that things work?

var req = new XMLHttpRequest();
req.open("GET", "example", true);
req.onreadystatechange = handleResult;
req.send(null);

function handleResult() {
  if (req.readyState != 4) return;

  var b1 = req.responseByteArray;


FWIW, XMLHttpRequest Level 2 already has this functionality in the form of  
responseBody:


  http://dev.w3.org/2006/webapi/XMLHttpRequest-2/

(send() also accepts a ByteArray now.)


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/



Re: Blobs: An alternate (complementary?) binary data proposal

2008-05-12 Thread Aaron Boodman

On Mon, May 12, 2008 at 12:17 AM, Anne van Kesteren [EMAIL PROTECTED] wrote:
 FWIW, XMLHttpRequest Level 2 already has this functionality in the form of
 responseBody:

  http://dev.w3.org/2006/webapi/XMLHttpRequest-2/

 (send() also accepts a ByteArray now.)

Thanks, I wasn't aware of that.

- a



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Chris Prince

Responses to several of the comments so far:

On Fri, May 9, 2008 at 9:15 PM, Ian Hickson [EMAIL PROTECTED] wrote:
 I'm not sure I like the way that the bytes are made accessible, but
 that's a minor detail really.

I tend to agree.  The 'Creating Blobs' section and the readAs*()
methods were added last-minute.  We don't know of apps that need that
functionality at present.  And binary manipulation seems unlikely to
be satisfying until ES4 anyway.  Personally, I think it's reasonable
to remove these sections from the Blob spec for now.


On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
 I'm not really clear on why Blobs must be distinct from ByteArrays.
 The only explanation is: The primary difference is that Blobs are
 immutable*, and can therefore represent large objects. But I am not
 sure why immutability is necessary to have the ability to represent
 large objects. If you are thinking immutability is necessary to be
 able to have large objects memory mapped from disk, then mmap with a
 private copy-on-write mapping should solve that problem just fine.

Making Blobs immutable simplifies a number of problems:

(1) Asynchronous APIs.

Large Blobs can be passed to XmlHttpRequest for an asynchronous POST,
or to Database for an asynchronous INSERT.  If Blobs are mutable, the
caller can modify the contents at any time.  The XmlHttpRequest or
Database operation will be undefined.

Careful callers could wait for the operation to finish (at least in
these two examples; I'm not sure about all possible scenarios).  But
this is starting to put quite a burden on developers.
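As a quick illustration of the hazard in (1) -- a simulation only, with a hand-rolled task queue standing in for the browser's event loop; none of these names come from the Blob proposal:

```javascript
// Toy "async" API that keeps a *reference* to the caller's mutable
// buffer and reads it later, via a manually flushed task queue.
var pending = [];
function asyncSend(buffer, onDone) {
  pending.push(function () { onDone(buffer[0]); });
}
function flushEventLoop() {
  pending.splice(0).forEach(function (task) { task(); });
}

var bytes = [1, 2, 3];
var sentFirstByte;
asyncSend(bytes, function (b) { sentFirstByte = b; });
bytes[0] = 99;        // caller mutates while the "send" is in flight
flushEventLoop();
// sentFirstByte is 99, not 1: the operation observed the mutation, so
// its result is undefined from the caller's point of view. An immutable
// (or copied-on-send) buffer would have preserved the value 1.
```

With immutable Blobs, the value captured at send() time is the value transmitted, which is the whole point of the argument above.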

(2) HTML5 Workers.

There are cases where apps will get a Blob on the UI thread, and then
want to operate on it in a Worker.  Note that the Blob may be
file-backed or memory-backed.

Worker threads are isolated execution environments.  If Blobs are
mutable, it seems like tricky (or impossible) gymnastics would be
required to ensure one thread's file writes aren't seen by another
thread's reads, unless you create a copy.  And that is doubly true for
memory-backed blobs.

(I'm not even considering older mobile operating systems, which may
not have all the file and memory capabilities of modern OSes.)


~~~

There is another, slightly different issue around mutability, which
hasn't really been called out yet.  It affects whether Blobs should be
directly readable.

One of the biggest motivations for Blobs was to do interesting things
with local files.  But these files may be modified by programs outside
the browser.

It would be pretty crazy if developers had to guard against the
'length' field changing between any two lines of JavaScript.

Locking files appears to be impossible on some platforms. (Even when
it is possible, the experience can be unsatisfying; anybody who has
seen a "file is locked" error -- with no additional info -- knows this
feeling.)  So our plan has been to check the file modification time
whenever a Blob's contents are read.

If Blobs are directly accessible, an exception could occur any time
the web app reads the contents.  If Blobs are not directly accessible
(as I would propose), then to developers, it only means methods that
accept Blob arguments may throw.
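That modification-time check can be sketched roughly as follows -- a plain-object simulation, where `file`, `mtime`, and the `read` method are all invented for illustration:

```javascript
// 'file' stands in for something on disk that an external program can
// rewrite. The blob captures the modification time at creation and
// refuses to read if the file has changed since.
function makeFileBlob(file) {
  var mtimeAtCreation = file.mtime;
  return {
    length: file.bytes.length,
    read: function () {
      if (file.mtime !== mtimeAtCreation) {
        throw new Error("file modified since Blob was created");
      }
      return file.bytes.slice();
    }
  };
}

var file = { bytes: [1, 2, 3], mtime: 1000 };
var blob = makeFileBlob(file);
blob.read();        // fine: the file is unchanged
file.mtime = 2000;  // an external program rewrites the file
// blob.read() would now throw rather than return inconsistent data
```

Keeping Blobs opaque means only the methods that consume them need to surface this error, rather than every property access.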

--Chris




Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Maciej Stachowiak



On May 10, 2008, at 11:39 PM, Chris Prince wrote:



On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED]  
wrote:

I'm not really clear on why Blobs must be distinct from ByteArrays.
The only explanation is: The primary difference is that Blobs are
immutable*, and can therefore represent large objects. But I am not
sure why immutability is necessary to have the ability to represent
large objects. If you are thinking immutability is necessary to be
able to have large objects memory mapped from disk, then mmap with a
private copy-on-write mapping should solve that problem just fine.


Making Blobs immutable simplifies a number of problems:

(1) Asynchronous APIs.

Large Blobs can be passed to XmlHttpRequest for an asynchronous POST,
or to Database for an asynchronous INSERT.  If Blobs are mutable, the
caller can modify the contents at any time.  The XmlHttpRequest or
Database operation will be undefined.

Careful callers could wait for the operation to finish (at least in
these two examples; I'm not sure about all possible scenarios).  But
this is starting to put quite a burden on developers.

(2) HTML5 Workers.

There are cases where apps will get a Blob on the UI thread, and then
want to operate on it in a Worker.  Note that the Blob may be
file-backed or memory-backed.

Worker threads are isolated execution environments.  If Blobs are
mutable, it seems like tricky (or impossible) gymnastics would be
required to ensure one thread's file writes aren't seen by another
thread's reads, unless you create a copy.  And that is doubly true for
memory-backed blobs.

(I'm not even considering older mobile operating systems, which may
not have all the file and memory capabilities of modern OSes.)


Both of these can be addressed by the APIs (including the worker  
transfer mechanism) making a copy, which can use a copy-on-write  
mechanism to avoid actually making a copy in the common case.


It seems like immutability creates its own problems. If you have a  
large piece of binary data, say retrieved over the network from XHR,  
and the only way to change it is to make a copy, and you have multiple  
pieces of your code that want to change it, you are going to be  
allocating memory for many copies.


(I should add that I also find the name Blob distasteful in an API,  
but that is a minor point).


I'm still not convinced that immutability is good, or that the  
ECMAScript ByteArray proposal can't handle the required use cases.


Regards,
Maciej




Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Maciej Stachowiak



On May 11, 2008, at 4:08 PM, Aaron Boodman wrote:

On Sun, May 11, 2008 at 3:02 PM, Maciej Stachowiak [EMAIL PROTECTED]  
wrote:
Both of these can be addressed by the APIs (including the worker  
transfer
mechanism) making a copy, which can use a copy-on-write mechanism  
to avoid

actually making a copy in the common case.


Ok, so just so I'm clear, does the following example snippet
accurately reflect how you propose that things work?


I'm not sure I am following it all exactly, but I think yes. Variable  
assignment would not trigger any copy-on-write behavior, since it is  
still the same object. Passing to an API (including sendMessage to a  
worker) would make a copy-on-write virtual copy.





var req = new XMLHttpRequest();
req.open("GET", "example", true);
req.onreadystatechange = handleResult;
req.send(null);

function handleResult() {
 if (req.readyState != 4) return;

 var b1 = req.responseByteArray;
 var b2 = b1;
 assert(b1 === b2); // they refer to the same object

 // print the contents of the array
 for (var i = 0; i < b1.length; i++) {
   print(b1[i]);
 }

 b1[0] = 42;
 assert(b2[0] == 42);

 var worker = window.createWorker("worker.js");
 worker.sendMessage(b1); // branches b1
 b1[0] = 43; // modification does not affect what got sent to worker
}

// worker.js
worker.onmessage = function(b) {
 assert(b[0] == 42);
};







I'm still not convinced that immutability is good, or that the  
ECMAScript

ByteArray proposal can't handle the required use cases.


Here's one additional question on how this would work with ByteArray.
The read API for ByteArray is currently synchronous. Doesn't this mean
that with large files accessing bytearray[n] could block?


If the ByteArray were in fact backed by a file, then accessing  
bytearray[n] could lead to part of the file being paged in. However,  
the same is true if it is backed by RAM that is swapped out. Even  
accessing uninitialized zero-fill memory could trap to the kernel,  
though that's in general not as bad as hitting disk (whether for swap  
or file bytes).


I can see how you may want to have an object to represent a file that  
can be handed to APIs directly, but that has only an async read  
interface for JS. However, I am pretty sure you would not want to use  
such an object to represent binary data returned from an XHR, or the  
pixel contents of a canvas. After all, the data is already in  
memory. So perhaps files need a distinct object from other forms of  
binary data, if we wanted to enforce such a restriction.


Regards,
Maciej




Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Aaron Boodman

On Sun, May 11, 2008 at 4:22 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
 Here's one additional question on how this would work with ByteArray.
 The read API for ByteArray is currently synchronous. Doesn't this mean
 that with large files accessing bytearray[n] could block?

 If the ByteArray were in fact backed by a file, then accessing bytearray[n]
 could lead to part of the file being paged in. However, the same is true if
 it is backed by RAM that is swapped out. Even accessing uninitialized
 zero-fill memory could trap to the kernel, though that's in general not as
 bad as hitting disk (whether for swap or file bytes).

But expressing the API as an array makes it seem like access is always
cheap, encouraging people to just burn through the file in a tight
loop. Such loops would actually hit the disk many times, right?

 I can see how you may want to have an object to represent a file that can be
 handed to APIs directly, but that has only an async read interface for JS.
 However, I am pretty sure you would not want to use such an object to
 represent binary data returned from an XHR, or the pixel contents of a
 canvas. After all, the data is already in memory. So perhaps files need a
 distinct object from other forms of binary data, if we wanted to enforce
 such a restriction.

I see what you mean for canvas, but not so much for XHR. It seems like
a valid use case to want to be able to use XHR to download very large
files. In that case, the thing you get back seems like it should have
an async API for reading.

- a



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Maciej Stachowiak



On May 11, 2008, at 4:40 PM, Aaron Boodman wrote:

On Sun, May 11, 2008 at 4:22 PM, Maciej Stachowiak [EMAIL PROTECTED]  
wrote:
Here's one additional question on how this would work with  
ByteArray.
The read API for ByteArray is currently synchronous. Doesn't this  
mean

that with large files accessing bytearray[n] could block?


If the ByteArray were in fact backed by a file, then accessing  
bytearray[n]
could lead to part of the file being paged in. However, the same is  
true if

it is backed by RAM that is swapped out. Even accessing uninitialized
zero-fill memory could trap to the kernel, though that's in general  
not as

bad as hitting disk (whether for swap or file bytes).


But expressing the API as an array makes it seem like access is always
cheap, encouraging people to just burn through the file in a tight
loop. Such loops would actually hit the disk many times, right?


Well, that depends on how good the OS buffer cache is at prefetching.  
But in general, there would be some disk access.


I can see how you may want to have an object to represent a file  
that can be
handed to APIs directly, but that has only an async read interface  
for JS.

However, I am pretty sure you would not want to use such an object to
represent binary data returned from an XHR, or the pixel contents  
of a
canvas. After all, the data is already in memory. So perhaps  
files need a
distinct object from other forms of binary data, if we wanted to  
enforce

such a restriction.


I see what you mean for canvas, but not so much for XHR. It seems like
a valid use case to want to be able to use XHR to download very large
files. In that case, the thing you get back seems like it should have
an async API for reading.


Hmm? If you get the data over the network it goes into RAM. Why would  
you want an async API to in-memory data? Or are you suggesting XHR  
should be changed to spool its data to disk? I do not think that is  
practical to do for all requests, so this would have to be a special  
API mode for responses that are expected to be too big to fit in memory.


Regards,
Maciej




Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Aaron Boodman

On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
 Well, that depends on how good the OS buffer cache is at prefetching. But in
 general, there would be some disk access.

It seems better if the read API is just async for this case to prevent
the problem.

 I see what you mean for canvas, but not so much for XHR. It seems like
 a valid use case to want to be able to use XHR to download very large
 files. In that case, the thing you get back seems like it should have
 an async API for reading.

 Hmm? If you get the data over the network it goes into RAM. Why would you
 want an async API to in-memory data? Or are you suggesting XHR should be
 changed to spool its data to disk? I do not think that is practical to do
 for all requests, so this would have to be a special API mode for responses
 that are expected to be too big to fit in memory.

Whether XHR spools to disk is an implementation detail, right? Right
now XHR is not practical to use for downloading large files because
the only way to access the result is as a string. Also because of
this, XHR implementations don't bother spooling to disk. But if this
API were added, then XHR implementations could be modified to start
spooling to disk if the response got large. If the caller requests
responseText, then the implementation just does the best it can to
read the whole thing into a string and reply. But if the caller uses
responseBlob (or whatever we call it) then it becomes practical to,
for example, download movie files, modify them, then re-upload them.
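The spooling strategy described above can be sketched as a buffer that silently switches backing stores once it grows past a threshold -- every name here is invented for illustration, and "disk" is simulated with a second array standing in for a temp file:

```javascript
function makeResponseBuffer(spoolThresholdBytes) {
  var inMemory = [];   // small responses stay here
  var spooled = null;  // stand-in for a temp file on disk

  return {
    append: function (chunk) {
      if (spooled) {
        spooled = spooled.concat(chunk);
      } else if (inMemory.length + chunk.length > spoolThresholdBytes) {
        spooled = inMemory.concat(chunk); // switch backing stores
        inMemory = [];
      } else {
        inMemory = inMemory.concat(chunk);
      }
    },
    isSpooled: function () { return spooled !== null; },
    length: function () { return (spooled || inMemory).length; }
  };
}

var buf = makeResponseBuffer(4);
buf.append([1, 2, 3]);  // still fits in memory
buf.append([4, 5]);     // crosses the threshold, "spools to disk"
```

A caller asking for responseText would force the whole thing back into a string as best the implementation can; a caller asking for responseBlob never needs the data in memory at once.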

- a



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Maciej Stachowiak



On May 11, 2008, at 6:01 PM, Aaron Boodman wrote:

On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak [EMAIL PROTECTED]  
wrote:
Well, that depends on how good the OS buffer cache is at  
prefetching. But in

general, there would be some disk access.


It seems better if the read API is just async for this case to prevent
the problem.


It can't entirely prevent the problem. If you read a big enough chunk,  
it will cause swapping which hits the disk just as much as file reads.  
Possibly more, because real file access will trigger OS prefetch  
heuristics for linear access.


I see what you mean for canvas, but not so much for XHR. It seems  
like
a valid use case to want to be able to use XHR to download very  
large
files. In that case, the thing you get back seems like it should  
have

an async API for reading.


Hmm? If you get the data over the network it goes into RAM. Why  
would you
want an async API to in-memory data? Or are you suggesting XHR  
should be
changed to spool its data to disk? I do not think that is practical  
to do
for all requests, so this would have to be a special API mode for  
responses

that are expected to be too big to fit in memory.


Whether XHR spools to disk is an implementation detail, right? Right
now XHR is not practical to use for downloading large files because
the only way to access the result is as a string. Also because of
this, XHR implementations don't bother spooling to disk. But if this
API were added, then XHR implementations could be modified to start
spooling to disk if the response got large. If the caller requests
responseText, then the implementation just does the best it can to
read the whole thing into a string and reply. But if the caller uses
responseBlob (or whatever we call it) then it becomes practical to,
for example, download movie files, modify them, then re-upload them.


That sounds reasonable for very large files like movies. However,  
audio and image files are similar in size to the kinds of text or XML  
resources that are currently processed synchronously. In such cases  
they are likely to remain in memory.


In general it is sounding like it might be desirable to have at least  
two kinds of objects for representing binary data:


1) An in-memory, mutable representation with synchronous access. There  
should also be a copying API which is possibly copy-on-write for the  
backing store.


2) A possibly disk-backed representation that offers only asynchronous  
read (possibly in the form of representation #1).


Both representations could be used with APIs that can accept binary  
data. In most cases such APIs only take strings currently. The name of  
representation #2 may wish to tie it to being a file, since for  
anything already in memory you'd want representation #1. Perhaps they  
could be called ByteArray and File respectively. Open question: can a  
File be stored in a SQL database? If so, does the database store the  
data or a reference (such as a path or Mac OS X Alias)?
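A rough sketch of what representation #2 might look like next to #1 -- every name below (`makeFileHandle`, `readAsByteArray`, `desiredLength`) is hypothetical, just to make the distinction concrete:

```javascript
// Representation #1 is a plain mutable byte array. Representation #2
// wraps possibly disk-backed data and exposes only a callback-based
// read that hands back a #1-style in-memory copy.
function makeFileHandle(backingBytes) {
  return {
    length: backingBytes.length,
    // Asynchronous in a real API; called back synchronously here so
    // the shape is visible without committing to an event-loop model.
    readAsByteArray: function (offset, desiredLength, callback) {
      callback(backingBytes.slice(offset, offset + desiredLength));
    }
  };
}

var handle = makeFileHandle([10, 20, 30, 40]);
handle.readAsByteArray(1, 2, function (chunk) {
  // chunk is a mutable in-memory copy, independent of the backing file
  chunk[0] = 0;
});
```

The point of the split is that APIs accepting binary data could take either object, but script-level random access is only ever offered on the in-memory form.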


Regards,
Maciej





Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Aaron Boodman

On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak
 It seems better if the read API is just async for this case to prevent
 the problem.

 It can't entirely prevent the problem. If you read a big enough chunk, it
 will cause swapping which hits the disk just as much as file reads. Possibly
 more, because real file access will trigger OS prefetch heuristics for
 linear access.

Right, I think the UA has to have ultimate control over the chunk size
to prevent this. The length parameters on the read apis I suggested
would have to be what the caller desires, but the implementation
doesn't necessarily have to honor it. I've changed the parameter names
on our wiki page to 'desiredLength' to reflect this.

 Whether XHR spools to disk is an implementation detail, right? Right
 now XHR is not practical to use for downloading large files because
 the only way to access the result is as a string. Also because of
 this, XHR implementations don't bother spooling to disk. But if this
 API were added, then XHR implementations could be modified to start
 spooling to disk if the response got large. If the caller requests
 responseText, then the implementation just does the best it can to
 read the whole thing into a string and reply. But if the caller uses
 responseBlob (or whatever we call it) then it becomes practical to,
 for example, download movie files, modify them, then re-upload them.

 That sounds reasonable for very large files like movies. However, audio and
 image files are similar in size to the kinds of text or XML resources that
 are currently processed synchronously. In such cases they are likely to
 remain in memory.

 In general it is sounding like it might be desirable to have at least two
 kinds of objects for representing binary data:

 1) An in-memory, mutable representation with synchronous access. There
 should also be a copying API which is possibly copy-on-write for the backing
 store.

 2) A possibly disk-backed representation that offers only asynchronous read
 (possibly in the form of representation #1).

I agree with this, but I think using Blob/File whatever as the default
representation is convenient because you don't need to add multiple
getter APIs to things such as XHR (responseBytes and responseBlob).
And you probably remove some potential confusion over which getter is
correct to use for a given situation.

 Both representations could be used with APIs that can accept binary data. In
 most cases such APIs only take strings currently. The name of representation
 #2 may wish to tie it to being a file, since for anything already in memory
 you'd want representation #1. Perhaps they could be called ByteArray and
 File respectively.

Calling it File seems a little weird to me, particularly in the case
of XMLHttpRequest.

 Open question: can a File be stored in a SQL database? If
 so, does the database store the data or a reference (such as a path or Mac
 OS X Alias)?

There definitely needs to be a way to store Files locally. I don't
have a strong opinion as to whether this should be in the database, or
in DOMStorage, or in something new just for files.

- a



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-11 Thread Chris Prince

On Sun, May 11, 2008 at 9:22 PM, Aaron Boodman [EMAIL PROTECTED] wrote:
 On Sun, May 11, 2008 at 6:46 PM, Maciej Stachowiak

   Open question: can a File be stored in a SQL database? If
   so, does the database store the data or a reference (such as a path or Mac
   OS X Alias)?

  There definitely needs to be a way to store Files locally. I don't
  have a strong opinion as to whether this should be in the database, or
  in DOMStorage, or in something new just for files.

A reference has the problem that the underlying file could be modified
by an external program.  I think once you save data into the SQL
database, you should be able to count on it staying constant, and
valid.



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-10 Thread Maciej Stachowiak



On May 7, 2008, at 10:08 PM, Aaron Boodman wrote:



Hi everyone,

Opera has a proposal for a specification that would revive (and  
supersede)

the file upload API that has been lingering so long as a work item.


The Gears team has also been putting together a proposal for file
access which overlaps in some ways with Opera's, but is also
orthogonal in some ways:

http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal


I really like the idea of adding consistent APIs for binary data in  
the many places in the Web platform that need them. However, I'm not  
really clear on why Blobs must be distinct from ByteArrays. The only  
explanation is: The primary difference is that Blobs are immutable*,  
and can therefore represent large objects. But I am not sure why  
immutability is necessary to have the ability to represent large  
objects. If you are thinking immutability is necessary to be able to  
have large objects memory mapped from disk, then mmap with a private  
copy-on-write mapping should solve that problem just fine.


In fact, immutability seems clearly undesirable for many of these  
APIs. Seems like you will want to modify such things and create them  
from scratch.


I also notice that you used int64 in many of the APIs. JavaScript  
cannot represent 64-bit integers in its number type. All JavaScript  
numbers are IEEE floating point doubles. This will lose precision at  
the high end of the int64 range, which is likely unacceptable for many  
of these APIs. Thus, if 64-bit is really needed, then a primitive type  
will not do. You either need two int32s or an object type  
encapsulating this.
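The precision limit is easy to see directly in JavaScript (a quick illustration, not part of any proposal):

```javascript
// JavaScript numbers are IEEE 754 doubles, so integers are only exact
// up to 2^53. Beyond that, adjacent integers become indistinguishable.
var limit = Math.pow(2, 53);      // 9007199254740992

console.log(limit - 1 === limit); // false: still exact below 2^53
console.log(limit + 1 === limit); // true: 2^53 + 1 rounds back to 2^53
```

This is why an int64 length in an IDL signature cannot round-trip through a script number once values exceed 2^53.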


Regards,
Maciej





I would summarize the key differences this way:

* We introduce a new interface - a blob - that represents an immutable
chunk of (potentially large) binary data
* We propose adding the ability to get and send blobs to many
different APIs, including XHR, the <input type=file> element,
database, canvas, etc.
* We attempt far less interaction with the filesystem (just extending
the input element and allowing exporting a blob to a file).

To answer one of Maciej's questions from the other thread, we intend
this for use on the open web and do not intend for it to require any
particular security authorization.

We would also love feedback, and would like to work with any
interested vendors to iterate this to something others would
implement.

Thanks,

- a






Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-10 Thread Charles McCathieNevile


On Sat, 10 May 2008 06:15:01 +0200, Ian Hickson [EMAIL PROTECTED] wrote:


On Wed, 7 May 2008, Aaron Boodman wrote:



The Gears team has also been putting together a proposal for file access
which overlaps in some ways with Opera's, but is also orthogonal in some
ways:

http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal



This seems like it would be something we may want defined in a small spec
and then used by many others (like progress events) -- one spec to define
Blob, and then the other specs to use it (like XHR and HTML5/WF2).


That would seem like the way to go if we decide to take this on. The  
proposal is currently a small spec for one thing, after all.



Do we have the resources to have someone champion this spec?


Are you asking the WG, or Google?

cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera 9.5: http://snapshot.opera.com



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-10 Thread Ian Hickson

On Sun, 11 May 2008, Charles McCathieNevile wrote:
 
  Do we have the resources to have someone champion this spec?
 
 Are you asking the WG, or Google?

The Web community as a whole. I don't care which working group (if any) 
owns it, and I don't have any reason to prefer that Google work on this 
rather than anyone else -- my interest is in forwarding the platform as a 
whole. :-)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-10 Thread Aaron Boodman

On Sat, May 10, 2008 at 1:18 AM, Maciej Stachowiak [EMAIL PROTECTED] wrote:
 I really like the idea of adding consistent APIs for binary data in the many
 places in the Web platform that need them. However, I'm not really clear on
 why Blobs must be distinct from ByteArrays. The only explanation is: The
 primary difference is that Blobs are immutable*, and can therefore represent
 large objects. But I am not sure why immutability is necessary to have the
 ability to represent large objects. If you are thinking immutability is
 necessary to be able to have large objects memory mapped from disk, then
 mmap with a private copy-on-write mapping should solve that problem just
 fine.

I'm going to defer to Chris Prince on this question, as he is the real
man behind this proposal on our team and has thought the most about
it. I don't know much about mmap, but it sounds like a fair enough
idea to me. It seems like you'd eventually still want to push things
back to disk after enough mutation, right? Would writes be
synchronous? Would reads?

 In fact, immutability seems clearly undesirable for many of these APIs.
 Seems like you will want to modify such things and create them from scratch.

We agree that you'd want to modify and create blobs, and have a TODO
for that in our proposal. However, we were thinking of handling this
similarly to how strings are handled in JavaScript. When you modify a
blob, you get a new blob instance.  But our ideas on this are pretty
poorly formed as we don't have as pressing a need as we do for the
things we've proposed so far.
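A minimal sketch of the string-like model described above, in which every "modification" returns a new instance. The Blob constructor and its slice/concat methods here are purely illustrative, not part of the Gears proposal:

```javascript
// Hypothetical immutable blob: operations return fresh instances,
// the way JavaScript string methods do. Illustrative only.
function Blob(bytes) {
  this._bytes = bytes.slice(); // defensive copy; never mutated afterwards
  this.length = this._bytes.length;
}
Blob.prototype.slice = function (start, length) {
  return new Blob(this._bytes.slice(start, start + length));
};
Blob.prototype.concat = function (other) {
  return new Blob(this._bytes.concat(other._bytes));
};

// "Modifying" a blob yields a new blob; the originals are untouched.
var base = new Blob([1, 2, 3, 4]);
var part = base.slice(1, 2);     // new blob over [2, 3]
var whole = base.concat(part);   // new blob over [1, 2, 3, 4, 2, 3]
```
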

 I also notice that you used int64 in many of the APIs. JavaScript cannot
 represent 64-bit integers in its number type. All JavaScript numbers are
 IEEE floating point doubles. This will lose precision at the high end of the
 int64 range, which is likely unacceptable for many of these APIs. Thus, if
 64-bit is really needed, then a primitive type will not do. You either need
 two int32s or an object type encapsulating this.

I think our assumption is that 2^53 is large enough to represent the
length of all the blobs going in and out of web apps for the
foreseeable future. We would just throw when we receive a number
larger than that, saying it is out of range. Is there a better way to
notate this in specs?
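The range check described above could look something like this (a sketch under the stated assumption; the checkLength name and RangeError choice are illustrative, not anything from the proposal):

```javascript
// Validate a conceptually-int64 length against 2^53, the largest
// integer a JavaScript number represents exactly. Out-of-range or
// non-integral values throw rather than silently losing precision.
var MAX_EXACT_INTEGER = Math.pow(2, 53);

function checkLength(length) {
  if (length < 0 ||
      length > MAX_EXACT_INTEGER ||
      Math.floor(length) !== length) {
    throw new RangeError("length out of range: " + length);
  }
  return length;
}
```
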

On Sat, May 10, 2008 at 7:51 PM, Ian Hickson [EMAIL PROTECTED] wrote:
 On Sun, 11 May 2008, Charles McCathieNevile wrote:

  Do we have the resources to have someone champion this spec?

 Are you asking the WG, or Google?

 The Web community as a whole. I don't care which working group (if any)
 owns it, and I don't have any reason to prefer that Google work on this
 rather than anyone else -- my interest is in forwarding the platform as a
 whole. :-)

I think I might like to do this. What does it involve? Should we take
this part offline, Ian?

- a



Re: Blobs: An alternate (complementary?) binary data proposal (Was: File IO...)

2008-05-09 Thread Ian Hickson

On Wed, 7 May 2008, Aaron Boodman wrote:
 Charles wrote:
  Opera has a proposal for a specification that would revive (and 
  supersede) the file upload API that has been lingering so long as a 
  work item.

I would echo the other comments people have made regarding the security 
model being the important aspect of this kind of area.


 The Gears team has also been putting together a proposal for file access 
 which overlaps in some ways with Opera's, but is also orthogonal in some 
 ways:
 
 http://code.google.com/p/google-gears/wiki/BlobWebAPIPropsal

I do like the general approach here. I especially like the way that it 
combines the Mozilla extensions with a generic mechanism that applies 
across multiple APIs. I'm not sure I like the way that the bytes are made 
accessible, but that's a minor detail really. (In particular, I'd like to 
see plain-text APIs that treat a blob according to a particular encoding.)
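The encoding point matters because raw bytes alone do not determine the characters. A small illustration, using Node's Buffer purely as a stand-in for a blob's byte store (the thread itself names no such API):

```javascript
// The same two bytes decoded under two different encodings yield
// different text, which is why a text accessor on a blob would need
// an explicit encoding parameter.
var bytes = [0xc3, 0xa9]; // the UTF-8 encoding of "é"

var asUtf8   = Buffer.from(bytes).toString("utf8");   // "é"
var asLatin1 = Buffer.from(bytes).toString("latin1"); // "Ã©"
```
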

This seems like it would be something we may want defined in a small spec 
and then used by many others (like progress events) -- one spec to define 
Blob, and then the other specs to use it (like XHR and HTML5/WF2).

Do we have the resources to have someone champion this spec?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'