Re: [whatwg] AppCache-related e-mails

2011-08-04 Thread Michael Nordman
On Tue, Aug 2, 2011 at 5:23 PM, Michael Nordman micha...@google.com wrote:

 On Mon, 13 Jun 2011, Michael Nordman wrote:
 
  Let's say there's a page in the cache, to be used as a fallback resource,
  that refers to the manifest by a relative url...
 
  <html manifest='x'>
 
  Depending on the url that invokes the fallback resource, 'x' will be
  resolved to different absolute urls. When it doesn't match the actual
  manifest url, the fallback resource will get tagged as FOREIGN and will
  no longer be used to satisfy main resource loads.
 
  I'm not sure if this is a bug in Chrome or a bug in the appcache spec
  just yet. I'm pretty certain that Safari will have the same behavior as
  Chrome in this respect (the same bug). The value of the manifest
  attribute is interpreted as relative to the location of the loaded
  document in Chrome and all WebKit-based browsers, and that value is used
  to detect foreignness.
 
  The workaround/solution for this is to NOT put a manifest attribute in
  the html tag of the fallback resource (or to put either an absolute
  URL or a host-relative URL as the manifest attribute value).

 Or just make sure you always use relative URLs, even in the manifest.

 I don't really understand the problem here. Can you elaborate further?


 Suppose the fallback resource is set up like this...

 FALLBACK:
 / FallbackPage.html

 ... and that page contains a relative link to the manifest in its html tag 
 like so...
 <html manifest=file.manifest>

 Any server request that fails under / will get FallbackPage.html in response. 
 For example...

 /SomePage.html

 When the fallback is used in this case the manifest url will be interpreted 
 as /file.manifest

 /Some/Other/Page.html

 And in this case the manifest url will be interpreted as 
 /Some/Other/file.manifest


 On Fri, 1 Jul 2011, Michael Nordman wrote:
 
  Cross-origin resources listed in the CACHE section aren't retrieved with
  the 'Origin' header

 This is incorrect. They are fetched with the origin of the manifest. What
 makes you say no Origin header is included?


 I don't see mention of that in the draft? If that were the case then this
 wouldn't be an issue.

 I'm not familiar with CORS usage. Do cross-origin subresource loads of all
 kinds (.js, .css, .png) carry the Origin header?

 I can imagine a server implementation that examines the Origin header
 up front and, if it doesn't like what it sees, doesn't bother computing
 the response at all: instead of returning the normal response without
 the origin listed in the Access-Control-Allow-Origin response header, it
 returns an empty response without the origin listed in that header.

 If general subresource loads aren't sent with the Origin header, fetching
 all manifest-listed resources with that header set could cause problems.


According to some documentation over on the Mozilla wiki, the value of the
Origin header differs depending on the source of the request.
https://wiki.mozilla.org/Security/Origin#When_Origin_is_served_.28and_when_it_is_.22null.22.29
So I think including Origin:manifestUrlOrigin when fetching all resources
to populate an appcache could be the source of some subtle bugs.


Re: [whatwg] AppCache-related e-mails

2011-08-03 Thread Michael Nordman
On Tue, Aug 2, 2011 at 4:55 PM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 2 Aug 2011, Michael Nordman wrote:
  
   If you actively want to seek out old manifests, sure, but what's the
   use case for doing that? It would be like trying to actively evict
   things from HTTP caches.
 
  You should talk to some app developers. View source on Angry Birds for a
  use case; they are doing this to get rid of stale versions tied to old
  manifest urls.

 But why?

 I couldn't figure out the use case from the source you mention.


This is a message I recently received from a different developer using the
appcache that would also like to see more in the way of being able to manage
the set of appcaches in the system. Please see the use cases listed towards
the end.


Hi Michael, Greg.  I'm writing to advise you of a requirement I'd like to
see appcache fulfill in the medium term.  We've spoken about it before, but
only in the general context of 'what would you like to see in the future'.
 No releases are gated on this feature, so I guess we're talking M15 or
thereabouts.  Feel free to cross-post this to a list you deem relevant for
wider review and discussion.

The feature is a javascript API to enable the creation, enumeration, update,
and deletion of appcaches on the current origin.  Calls might look something
like this:

/** Creates a new cache or updates an existing one with the given manifest
URL.  Manifest URL must be in the same origin as the JS */
createOrUpdateCache(String manifestUri, completionCallback, errorCallback);

/** Enumerates the caches present on the current origin */
enumerateCaches(CacheEnumerationCallback callback, ErrorCallback errorCallback);

interface CacheEnumerationCallback {
  void handleEvent(Cache[] caches);
}

interface Cache {
  String getManifestUri();
  number getSizeInBytes();
  String getManifestAsText();
  String[] getMasterEntryUris();
  String[] getExplicitEntryUris();
  FallbackEntry[] getFallbackEntries();
  String[] getNetworkWhitelistUris();
  boolean isNetworkWhitelistOpen();
  DateTime getCreationTime();
  // The last time the manifest was fetched and checked for differences
  DateTime getLastManifestFetchTime();
  // The last time a manifest fetch caused an actual update
  DateTime getLastUpdateTime();
  // The last time the cache actually bound to a browsing context
  DateTime getLastAccessTime();
  // Maybe some APIs to signal whether the cache is currently being updated,
  // and whether there is currently a running browsing context bound to it.

  // Probably fails if there's a running browsing context bound to the cache
  void delete(... some callbacks ...);
  // I guess a no-op if an update is currently in progress, or maybe even if
  // it happened very recently
  void update(... some callbacks ...);
}

interface FallbackEntry {
  String getTriggerUri();
  String getTargetUri();
}
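To make the intended use more concrete, here is a rough TypeScript sketch of the "appcache maintenance" pass described in the use cases below, written against a hypothetical `CacheSummary` shape that mirrors two getters of the `Cache` interface above (`getManifestUri`, `getLastManifestFetchTime`). The function name, field names and the `minRecheckMs` threshold are illustrative assumptions, not part of any shipped API.

```typescript
// Hypothetical summary of an enumerated cache, mirroring two getters from the
// Cache sketch above. Not a real browser API.
interface CacheSummary {
  manifestUri: string;
  lastManifestFetchTime: number; // milliseconds since the epoch
}

interface MaintenancePlan {
  create: string[]; // required but not cached yet
  update: string[]; // required, cached, and not checked recently
  remove: string[]; // cached but no longer needed by any synced document
}

function planMaintenance(
  required: Set<string>,
  existing: CacheSummary[],
  now: number,
  minRecheckMs: number,
): MaintenancePlan {
  const plan: MaintenancePlan = { create: [], update: [], remove: [] };
  const present = new Set(existing.map((c) => c.manifestUri));

  required.forEach((uri) => {
    if (!present.has(uri)) plan.create.push(uri);
  });

  for (const c of existing) {
    if (!required.has(c.manifestUri)) {
      plan.remove.push(c.manifestUri);
    } else if (now - c.lastManifestFetchTime >= minRecheckMs) {
      plan.update.push(c.manifestUri);
    }
    // Otherwise the manifest was checked recently: skip it to save a request.
  }
  return plan;
}
```

The timestamp check is what would let the maintenance pass run less often per cache and reduce server load, as the use cases ask for.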

Additional characteristics:
* Must be usable from pages not themselves bound to an appcache, as long as
they are served from the same origin as the caches being operated on.
* Must work from workers, shared workers, and background pages, again
subject to a same origin check.

The above is a very rough sketch, and needs a bunch of work, but illustrates
the features we'd find useful.  An obvious flaw is that it doesn't fit in
with the system of progress events, etc., in the current API, but there are
probably many others.  View it mainly as a list of requirements.  Our use
cases are as follows:

* Docs maintains a set of appcaches which it uses for various purposes.
 Each editor, for example, has a cache.  There are also cases where
different documents require different versions of the same editor.
* The set of caches required on a particular browser depends on the
documents synced there.  A given set of documents will require a particular
(much smaller) set of caches to open.  The set of caches required on a given
browser is therefore dynamic, changing as documents enter and leave the set
of those synchronized.
* Each time anybody opens a docs property, and perhaps during the lifetimes
of some of them, we perform a procedure called 'appcache maintenance', which
ensures that the caches necessary for the current set of documents are
synced.  This is a fairly nasty process involving many iframes, but it
works.  We would like, however, to make this code much simpler, not have it
involve the iframes, and make the process of piping progress events back to
the host application less awful.  Right now it's such a pain we're not
bothering with it.
* We'd like to perform appcache maintenance on existing caches less often,
reducing server load.  The timestamps included above would allow us to do
that.
* When an appcache is no longer needed by the current set of documents, it
is currently just left there.  We would like to be able to clean it up.
* We would like to be able to perform our appcache maintenance procedure
from a shared worker, as we have one that can bring new documents into
storage.  Right now that is 

Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Ian Hickson

On the subject of diagnostics for appcache:

On Wed, 8 Jun 2011, Patrick Mueller wrote:
 On Wed, Jun 8, 2011 at 15:21, Ian Hickson i...@hixie.ch wrote:
  On Tue, 1 Feb 2011, Patrick Mueller wrote:
  
   I just tested Chrome beta this morning and saw nothing interesting 
   in appcache error events, however progress events have now grown 
   loaded and total properties (think those were the names, and I 
   think they're new-ish).  That's nice, as I can provide a progress 
   meter during cache load/reload.  I wouldn't mind having the URL of 
   the resource being loaded (that was just loaded?) as well as those 
   numbers.  And for errors it would be nice to know, in the case of an 
   error caused by a cache manifest entry 404'ing (or otherwise 
   unavailable), what URL it was. HTTP error code, if appropriate, etc.
 
  In theory, we don't want to expose this information because it can be 
  used to introspect intranets.
 
 I never considered that intranet-introspection angle.  I guess the 
 thought is that a rogue site could send a manifest with pointers to 
 files inside someone's intranet, and then get someone inside that 
 intranet to load that manifest, at which point JavaScript could have 
 access to which URLs returned 200's, etc.  Nasty.

Right.


 Is this just an issue if the manifest's or originating document's origin 
 is different from that of a file listed in the manifest itself?  Perhaps 
 errors on these entries would have less diagnostic data available for 
 them, perhaps no diagnostic data.  That would kind of fit with other 
 cross-origin access capabilities.

That might work.
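One way to express the restriction Patrick suggests: expose the URL and HTTP status only when the failing entry shares the manifest's origin, and nothing otherwise. This is a hypothetical sketch; `EntryErrorDetail` and `errorDetailFor` are invented names, and it compares against the manifest's origin only (not the originating document's).

```typescript
// Hypothetical shape for per-entry error details; not a real browser API.
interface EntryErrorDetail {
  url?: string;
  status?: number;
}

function errorDetailFor(entry: string, manifestUrl: string, status: number): EntryErrorDetail {
  // Manifest entries may be relative; they resolve against the manifest URL.
  const resolved = new URL(entry, manifestUrl);
  if (resolved.origin !== new URL(manifestUrl).origin) {
    // Cross-origin entry: reveal nothing, so a hostile manifest cannot probe
    // which intranet URLs exist.
    return {};
  }
  return { url: resolved.href, status };
}
```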


  What kind of information would be most useful? Should it be in the 
  same format from every browser or should it be detailed and freeform?
 
 Start with URL, because we know a URL was involved.  Then allow for an 
 optional vendor-specific freeform message.

Vendor-specific messages end up being parsed by scripts, and shortly after 
that we end up having to hard-code those messages as the spec.

So I'd rather not add a freeform message!

What is the URL for? Can you describe the way this information would be 
used in a user interface or however it would be used?

I'm just trying to make sure we address the actual problems that need 
addressing.


Regarding TLS and cross-origin requests:

On Thu, 16 Jun 2011, Michael Nordman wrote:
  On Tue, 8 Feb 2011, Michael Nordman wrote:
  
   Just had an offline discussion about this and I think the answer can 
   be much simpler than what's been proposed so far.  All we have to do 
   for cross-origin HTTPS resources is respect the cache-control 
   no-store header.
  
   Let me explain the rationale... first let's back up to the 
   motivation for the restrictions on HTTPS. They're there to defeat 
   attacks that involve physical access to the client system, so the 
   attacker cannot look at the cross-origin HTTPS data stored in the 
   appcache on disk. But the regular disk cache stores HTTPS data 
   provided the cache-control header doesn't say no-store, so excluding 
   this data from appcaching does nothing to defeat that attack.
  
   Maybe the spec changes to make are...
  
   1) Examine the cache-control header for all cross-origin resources 
   (not just HTTPS), and only allow them if they don't contain the 
   no-store directive.
  
   2) Remove the special-case restriction that is currently in place 
   only for HTTPS cross-origin resources.
 
  On Wed, 30 Mar 2011, Michael Nordman wrote:
  
   Fyi: This change has been made in chrome.
   * respect no-store headers for cross-origin resources (only for 
   HTTPS)
   * allow HTTPS cross-origin resources to be listed in manifest hosted 
   on HTTPS
 
  This seems reasonable. Done.
 
 I had proposed respecting the no-store directive only for cross-origin 
 resources. The current draft is examining the no-store directive for 
 all resources without regard for their origin. The intent behind the 
 proposed change was to allow authors to continue to override the 
 no-store header for resources in their origin, and to disallow that 
 override only for cross-origin resources. The proposed change is less 
 likely to break existing apps, and I think there are valid use cases for 
 the existing behavior where no-store can be overridden by explicit 
 inclusion in an appcache.

I guess we can restrict no-store to cross-origin HTTPS resources, but it 
seems far easier to explain that no-store in general is honoured. 
Otherwise you end up with these weird situations where some resources can 
be cached and some can't, and the only reason one can or can't be stored 
is where the manifest is, but only if it has no-store, etc... It gets 
rather confusing.

Also, what use cases are there for specifying no-store that don't apply 
across all resources?
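The two positions can be compared side by side in a short sketch. `mayAppCache` is a hypothetical helper; the policy `"all-resources"` models the current draft as described above (no-store honoured everywhere), and `"cross-origin-only"` models Michael's proposal (same-origin authors can still override no-store). The HTTPS-specific rule is left out for brevity.

```typescript
type NoStorePolicy = "all-resources" | "cross-origin-only";

// True if a Cache-Control header value contains the no-store directive.
function hasNoStore(cacheControl: string | null): boolean {
  if (!cacheControl) return false;
  return cacheControl
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .some((d) => d === "no-store");
}

function mayAppCache(
  resourceUrl: string,
  manifestUrl: string,
  cacheControl: string | null,
  policy: NoStorePolicy,
): boolean {
  if (!hasNoStore(cacheControl)) return true;
  if (policy === "all-resources") return false;
  // Proposal: only cross-origin resources have their no-store respected.
  const crossOrigin = new URL(resourceUrl, manifestUrl).origin !== new URL(manifestUrl).origin;
  return !crossOrigin;
}
```

The sketch makes Ian's point visible: under `"cross-origin-only"` the same header produces different outcomes depending on where the manifest lives.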



On the topic of appcache being used to cache everything but the main page:

On Wed, 29 Jun 2011, Felix Halim wrote:
 On Thu, Jun 9, 2011 at 3:21 AM, Ian Hickson i...@hixie.ch wrote:
  If 

Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Michael Nordman
 A common request that maybe we can agree upon is the ability to list the
 manifests that are cached and to delete them via script. Something
 like...
   String[] window.applicationCache.getManifests();  // returns the appcache manifests for the origin
   void window.applicationCache.deleteManifest(manifestUrl);

 This is trivial to do already; just return 404s for all the manifests you
 no longer want to keep around.

It involves creating hidden iframes loaded with pages that refer to the
manifests to be deleted, straightforward but gunky.
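On the server side, the "return 404s" approach amounts to something like the following hypothetical handler, assuming the server keeps a table of manifests that are still live. A 404 (or 410) for a manifest URL makes the browser mark that cache group obsolete; the catch discussed here is that this only happens when some page still referencing the old manifest gets loaded, hence the hidden iframes.

```typescript
interface ManifestReply {
  status: number;
  contentType?: string;
  body: string;
}

// Hypothetical routing for manifest URLs: live maps path -> manifest text.
function manifestResponse(path: string, live: Map<string, string>): ManifestReply {
  const body = live.get(path);
  if (body === undefined) {
    // Retired manifest: a 404 makes clients mark the cache group obsolete.
    return { status: 404, body: "" };
  }
  // Manifests are only accepted when served with this MIME type.
  return { status: 200, contentType: "text/cache-manifest", body };
}
```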

 0. [DONE] A means of not invoking the fallback resource for some error
 responses that would generally result in the fallback resource being
 returned. An additional response header would suit their needs...
 something like...
 x-chromium-appcache-fallback-override: disallow-fallback
 If a response header is present with that value, the fallback response
 would not be returned.
 http://code.google.com/p/chromium/issues/detail?id=82066

 What's the use case? When would you ever want to show the user an error
 yet really desire to indicate that it's an error and not a 200 OK
 response?

Google Docs. Instead of seeing a fallback page that erroneously says "You
must be offline and this document is not available.", they wanted to show
the actual error page generated by the server in the case of a deleted
document or when the user doesn't have rights to access that doc.
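The client-side decision being discussed can be sketched as below. The header name and value are the Chromium-specific ones from the bug above, not a standard; `shouldServeFallback` is a hypothetical helper, and the status handling is simplified (cross-origin redirects, which can also trigger the fallback, are ignored).

```typescript
const OVERRIDE_HEADER = "x-chromium-appcache-fallback-override";

// status === null models a network-level failure (offline, DNS error, timeout).
function shouldServeFallback(status: number | null, headers: Record<string, string>): boolean {
  if (status === null) return true;
  if (status < 400) return false; // simplified: redirects are not modelled
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (
      name.toLowerCase() === OVERRIDE_HEADER &&
      value !== undefined &&
      value.trim().toLowerCase() === "disallow-fallback"
    ) {
      return false; // server wants its own error page shown
    }
  }
  return true; // 4xx/5xx without the override: show the fallback page
}
```

In the Docs case, the server's "deleted document" or "access denied" page would keep its error status and carry the override header, so the browser shows that page instead of the "you must be offline" fallback.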


Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Ian Hickson
On Tue, 2 Aug 2011, Michael Nordman wrote:
 
  A common request that maybe we can agree upon is the ability to list the
  manifests that are cached and to delete them via script. Something
  like...
    String[] window.applicationCache.getManifests();  // returns the appcache manifests for the origin
    void window.applicationCache.deleteManifest(manifestUrl);
 
  This is trivial to do already; just return 404s for all the manifests 
  you no longer want to keep around.
 
 It involves creating hidden iframes loaded with pages that refer to the 
 manifests to be deleted, straightforward but gunky.

If you actively want to seek out old manifests, sure, but what's the use 
case for doing that? It would be like trying to actively evict things from 
HTTP caches.


  0. [DONE] A means of not invoking the fallback resource for some error
  responses that would generally result in the fallback resource being
  returned. An additional response header would suit their needs...
  something like...
  x-chromium-appcache-fallback-override: disallow-fallback
  If a response header is present with that value, the fallback response
  would not be returned. 
  http://code.google.com/p/chromium/issues/detail?id=82066
 
  What's the use case? When would you ever want to show the user an 
  error yet really desire to indicate that it's an error and not a 200 
  OK response?
 
 Google Docs. Instead of seeing a fallback page that erroneously says 
 "You must be offline and this document is not available.", they wanted 
 to show the actual error page generated by the server in the case of a 
 deleted document or when the user doesn't have rights to access that 
 doc.

I don't see what's wrong with using 200 OK for that case.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Michael Nordman
On Tue, Aug 2, 2011 at 4:40 PM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 2 Aug 2011, Michael Nordman wrote:
  
   A common request that maybe we can agree upon is the ability to list the
   manifests that are cached and to delete them via script. Something
   like...
     String[] window.applicationCache.getManifests();  // returns the appcache manifests for the origin
     void window.applicationCache.deleteManifest(manifestUrl);
  
   This is trivial to do already; just return 404s for all the manifests
   you no longer want to keep around.
 
  It involves creating hidden iframes loaded with pages that refer to the
  manifests to be deleted, straightforward but gunky.

 If you actively want to seek out old manifests, sure, but what's the use
 case for doing that? It would be like trying to actively evict things from
 HTTP caches.


   0. [DONE] A means of not invoking the fallback resource for some error
   responses that would generally result in the fallback resource being
   returned. An additional response header would suit their needs...
   something like...
   x-chromium-appcache-fallback-override: disallow-fallback
   If a response header is present with that value, the fallback response
   would not be returned.
   http://code.google.com/p/chromium/issues/detail?id=82066
  
   What's the use case? When would you ever want to show the user an
   error yet really desire to indicate that it's an error and not a 200
   OK response?
 
  Google Docs. Instead of seeing a fallback page that erroneously says
  "You must be offline and this document is not available.", they wanted
  to show the actual error page generated by the server in the case of a
  deleted document or when the user doesn't have rights to access that
  doc.

 I don't see what's wrong with using 200 OK for that case.


You should talk to the app developers. I think there are other consumers of
these urls besides the browser. To change the status code to 200 would break
those other consumers.






Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Michael Nordman
On Tue, Aug 2, 2011 at 4:40 PM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 2 Aug 2011, Michael Nordman wrote:
  
   A common request that maybe we can agree upon is the ability to list the
   manifests that are cached and to delete them via script. Something
   like...
     String[] window.applicationCache.getManifests();  // returns the appcache manifests for the origin
     void window.applicationCache.deleteManifest(manifestUrl);
  
   This is trivial to do already; just return 404s for all the manifests
   you no longer want to keep around.
 
  It involves creating hidden iframes loaded with pages that refer to the
  manifests to be deleted, straightforward but gunky.

 If you actively want to seek out old manifests, sure, but what's the use
 case for doing that? It would be like trying to actively evict things from
 HTTP caches.


You should talk to some app developers. View source on Angry Birds for a use
case; they are doing this to get rid of stale versions tied to old manifest
urls.




   0. [DONE] A means of not invoking the fallback resource for some error
   responses that would generally result in the fallback resource being
   returned. An additional response header would suit their needs...
   something like...
   x-chromium-appcache-fallback-override: disallow-fallback
   If a response header is present with that value, the fallback response
   would not be returned.
   http://code.google.com/p/chromium/issues/detail?id=82066
  
   What's the use case? When would you ever want to show the user an
   error yet really desire to indicate that it's an error and not a 200
   OK response?
 
  Google Docs. Instead of seeing a fallback page that erroneously says
  "You must be offline and this document is not available.", they wanted
  to show the actual error page generated by the server in the case of a
  deleted document or when the user doesn't have rights to access that
  doc.

 I don't see what's wrong with using 200 OK for that case.




Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Ian Hickson
On Tue, 2 Aug 2011, Michael Nordman wrote:
 
  If you actively want to seek out old manifests, sure, but what's the 
  use case for doing that? It would be like trying to actively evict 
  things from HTTP caches.
 
 You should talk to some app developers. View source on Angry Birds for a 
 use case; they are doing this to get rid of stale versions tied to old 
 manifest urls.

But why?

I couldn't figure out the use case from the source you mention.



Re: [whatwg] AppCache-related e-mails

2011-08-02 Thread Michael Nordman

 On Mon, 13 Jun 2011, Michael Nordman wrote:
 
  Let's say there's a page in the cache, to be used as a fallback resource,
  that refers to the manifest by a relative url...
 
  <html manifest='x'>
 
  Depending on the url that invokes the fallback resource, 'x' will be
  resolved to different absolute urls. When it doesn't match the actual
  manifest url, the fallback resource will get tagged as FOREIGN and will
  no longer be used to satisfy main resource loads.
 
  I'm not sure if this is a bug in Chrome or a bug in the appcache spec
  just yet. I'm pretty certain that Safari will have the same behavior as
  Chrome in this respect (the same bug). The value of the manifest
  attribute is interpreted as relative to the location of the loaded
  document in Chrome and all WebKit-based browsers, and that value is used
  to detect foreignness.
 
  The workaround/solution for this is to NOT put a manifest attribute in
  the html tag of the fallback resource (or to put either an absolute
  URL or a host-relative URL as the manifest attribute value).

 Or just make sure you always use relative URLs, even in the manifest.

 I don't really understand the problem here. Can you elaborate further?


Suppose the fallback resource is set up like this...

FALLBACK:
/ FallbackPage.html

... and that page contains a relative link to the manifest in its
html tag like so...
<html manifest=file.manifest>

Any server request that fails under / will get FallbackPage.html in
response. For example...

/SomePage.html

When the fallback is used in this case the manifest url will be
interpreted as /file.manifest

/Some/Other/Page.html

And in this case the manifest url will be interpreted as
/Some/Other/file.manifest
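The resolution above can be reproduced with the standard `URL` constructor. This sketch uses a hypothetical origin `http://example.com` and an illustrative `isForeign` check that simply compares the resolved attribute with the manifest URL, which is how the foreignness detection described above behaves; it also shows why a host-relative value such as `/file.manifest` avoids the problem.

```typescript
// Relative references resolve against the document URL, which for a
// fallback page is the URL that failed, not FallbackPage.html itself.
function resolveManifestAttr(attr: string, documentUrl: string): string {
  return new URL(attr, documentUrl).href;
}

// Illustrative check: the entry is treated as FOREIGN when the resolved
// attribute differs from the manifest the cache was built from.
function isForeign(attr: string, documentUrl: string, actualManifestUrl: string): boolean {
  return resolveManifestAttr(attr, documentUrl) !== actualManifestUrl;
}

const manifest = "http://example.com/file.manifest";

// Relative attribute: only correct for top-level URLs.
console.log(isForeign("file.manifest", "http://example.com/SomePage.html", manifest)); // false
console.log(isForeign("file.manifest", "http://example.com/Some/Other/Page.html", manifest)); // true

// Host-relative attribute: resolves identically everywhere.
console.log(isForeign("/file.manifest", "http://example.com/Some/Other/Page.html", manifest)); // false
```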


On Fri, 1 Jul 2011, Michael Nordman wrote:
 
  Cross-origin resources listed in the CACHE section aren't retrieved with
  the 'Origin' header

 This is incorrect. They are fetched with the origin of the manifest. What
 makes you say no Origin header is included?


I don't see mention of that in the draft? If that were the case then this
wouldn't be an issue.

I'm not familiar with CORS usage. Do cross-origin subresource loads of all
kinds (.js, .css, .png) carry the Origin header?

I can imagine a server implementation that examines the Origin header
up front and, if it doesn't like what it sees, doesn't bother computing
the response at all: instead of returning the normal response without
the origin listed in the Access-Control-Allow-Origin response header, it
returns an empty response without the origin listed in that header.

If general subresource loads aren't sent with the Origin header, fetching
all manifest-listed resources with that header set could cause problems.
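To illustrate the concern, here is a hypothetical server handler (not any real framework's API) that inspects `Origin` up front. It assumes, as the question above suggests, that ordinary `<script>`/`<link>`/`<img>` loads arrive without an `Origin` header, so the same asset is served normally to a page but comes back empty when an appcache update sends the manifest's origin.

```typescript
// Hypothetical asset handler; names and shape are illustrative only.
interface AssetReply {
  body: string;
  allowOrigin?: string; // value for Access-Control-Allow-Origin, if sent
}

function handleAsset(origin: string | undefined, trusted: Set<string>, body: string): AssetReply {
  if (origin === undefined) {
    // Ordinary subresource load with no Origin header: serve as usual.
    return { body };
  }
  if (trusted.has(origin)) {
    return { body, allowOrigin: origin };
  }
  // Unrecognised Origin: skip building the body and omit the ACAO header.
  return { body: "" };
}
```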


Re: [whatwg] AppCache-related e-mails

2011-07-12 Thread Karl Dubost

On 29 June 2011 at 05:27, Felix Halim wrote:
 Suppose the content of the main page changes very often (like a news site).
 In this case, you don't want to cache the main page since the users
 want to see the latest main page, not the cached ones when they open
 the main page later.

Did you also check ESI?
http://www.w3.org/TR/esi-lang

For example in 
http://symfony.com/doc/2.0/book/http_cache.html#edge-side-includes

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations & Tools, Opera Software



Re: [whatwg] AppCache-related e-mails

2011-07-07 Thread Bjartur Thorlacius

On Thu 7 July 2011 05:30, Felix Halim wrote:

On Thu, Jul 7, 2011 at 3:57 AM, Karl Dubost ka...@opera.com wrote:
http://uhunt.felix-halim.net/id/339

I'll look into your site when I've slept, but FYI, you're mandated to 
provide a title for your document. You should probably provide a title 
of uHunt, and append to the title's innerHTML as further information 
becomes available. [/nitpick]


Re: [whatwg] AppCache-related e-mails

2011-07-06 Thread Karl Dubost
Felix,

On 29 June 2011 at 05:27, Felix Halim wrote:
 Suppose the content of the main page changes very often (like a news site).
 In this case, you don't want to cache the main page since the users
 want to see the latest main page, not the cached ones when they open
 the main page later.

Is there a web site which exhibits exactly the issue you are mentioning? 
Or could you set up a mini Web site exhibiting the issue? I have read the full 
thread, and I still do not understand what you are trying to solve. HTTP cache 
is about setting user interactions. There are no good values, just the values 
you decide would make sense. 

HTTP Cache can already handle a lot of cases (offline/online) without even 
using AppCache, specifically when it is only content. 





Re: [whatwg] AppCache-related e-mails

2011-07-06 Thread Felix Halim
On Thu, Jul 7, 2011 at 3:57 AM, Karl Dubost ka...@opera.com wrote:
 Felix,

 On 29 June 2011 at 05:27, Felix Halim wrote:
 Suppose the content of the main page changes very often (like a news site).
 In this case, you don't want to cache the main page since the users
 want to see the latest main page, not the cached ones when they open
 the main page later.

 Is there a web site which exhibits exactly the issue you are mentioning?
 Or could you set up a mini Web site exhibiting the issue? I have read the 
 full thread, and I still do not understand what you are trying to solve. HTTP 
 cache is about setting user interactions. There are no good values, just the 
 values you decide would make sense.

 HTTP Cache can already handle a lot of cases (offline/online) without even 
 using AppCache, specifically when it is only content.

This is a real example. I built a site like:

http://uhunt.felix-halim.net/id/339

That one is mine, and there are other IDs like:

http://uhunt.felix-halim.net/id/32900
http://uhunt.felix-halim.net/id/1133

And thousands of other IDs.
Usually people look at a few dozen IDs, not all thousands of them.

Each ID has large, unique, frequently changing data attached to it
(about 400KB).
Obviously, if I do a clean separation and store the static framework
in AppCache and the frequently changing data in localStorage, I can
only cache the data for 10 IDs or so.

What I want is a 5MB pageStorage quota per page ID, so that I can
store the frequently changing data there rather than in the shared
localStorage, which uses the 5MB domain quota. In this case, users
can essentially view a lot more IDs without my having to worry about
exceeding the localStorage quota, as long as I know that each page
takes far less than 5MB.

Of course I can implement my own cache eviction, like deleting from
localStorage the IDs that are viewed less often. But this job is best left
to the browsers. Browsers can remove any page that is not viewed
anymore, along with the pageStorage associated with it.
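The do-it-yourself eviction Felix describes could look roughly like this in-memory LRU store with a size budget. `LruStringStore` is a hypothetical stand-in for a wrapper around localStorage; sizes are counted in string length, which only approximates how browsers meter storage.

```typescript
// Hypothetical LRU store: a Map keeps insertion order, so the first key is
// always the least recently used one.
class LruStringStore {
  private entries = new Map<string, string>();
  private used = 0;

  constructor(private readonly budget: number) {}

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: string): void {
    const old = this.entries.get(key);
    if (old !== undefined) {
      this.used -= old.length;
      this.entries.delete(key);
    }
    this.entries.set(key, value);
    this.used += value.length;
    // Evict least recently used entries until back under budget, but never
    // the entry that was just written.
    while (this.used > this.budget && this.entries.size > 1) {
      const oldestKey = this.entries.keys().next().value as string;
      this.used -= (this.entries.get(oldestKey) as string).length;
      this.entries.delete(oldestKey);
    }
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }
}
```

With a 5 MB budget and about 400 KB per ID, 12 entries fit (5120 / 400 = 12.8), in line with the "10 IDs or so" estimate above.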

Is that clear enough on why we need pageStorage?

Now the problem is: how do you use AppCache + pageStorage?
They conflict with each other in terms of URLs.

I can use AppCache to cache the static framework I have at a URL like:

http://uhunt.felix-halim.net/id

Then a pageStorage can be created for each different hashbang:

http://uhunt.felix-halim.net/id#339

That will give me 5MB for id = 339

And:

http://uhunt.felix-halim.net/id#32900

That will give me ANOTHER 5MB for id = 32900

and so on.

Then the browser can decide which URLs are less frequently accessed and
destroy the pageStorage associated with them if the browser has no space
left.

Even if my script is malicious and I create an unlimited number of
hashbangs to get unlimited quota, the browser can just keep, let's say,
the 100 latest or most frequently used hashbangs and remove the rest. So it
should be perfectly fine to have pageStorage attached to a hashbang value.

This will help web application developers cleanly separate the
static from the dynamic, with nothing to worry about in managing their
cache replacement policies or the 5MB limit of localStorage or any
other storage! This will also help the browser distinguish what's
static from what's dynamic! It's a WIN-WIN strategy for
all.


I think everybody knows what will happen if I use AppCache directly on
this URL:

http://uhunt.felix-halim.net/id/339

I will have to refresh twice to get the latest statistics for my page!
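The double refresh happens because the cached main page is shown first and the new version is only downloaded in the background. A common mitigation is to reload automatically when the `updateready` event fires; note that this does not avoid the second load, it only automates it. The sketch declares a minimal `AppCacheLike` interface mirroring the real `status`, `UPDATEREADY`, `swapCache()` and `addEventListener` members of `ApplicationCache`, so it does not depend on DOM typings; `reloadWhenUpdated` is a hypothetical name.

```typescript
// The few ApplicationCache members this needs, declared locally.
interface AppCacheLike {
  readonly status: number;
  readonly UPDATEREADY: number;
  swapCache(): void;
  addEventListener(type: "updateready", listener: () => void): void;
}

function reloadWhenUpdated(cache: AppCacheLike, reload: () => void): void {
  cache.addEventListener("updateready", () => {
    if (cache.status === cache.UPDATEREADY) {
      cache.swapCache(); // associate this document with the newest cache
      reload(); // then load the fresh main page from it
    }
  });
}

// In a page (legacy browsers only):
//   reloadWhenUpdated((window as any).applicationCache, () => location.reload());
```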

Now, if AppCache could somehow keep the main page ONLINE (that is, so
that I don't need to refresh twice), then all the discussion of
pageStorage and quota above becomes moot!



So, my proposal is either to make the main page of the AppCache
ONLINE, or to support pageStorage for hashbangs.

Do you have suggestions on this?

Felix Halim


Re: [whatwg] AppCache-related e-mails

2011-07-06 Thread Felix Halim
On Thu, Jul 7, 2011 at 1:30 PM, Felix Halim felix.ha...@gmail.com wrote:
 So, my proposal is either to make the main page of the AppCache
 ONLINE, or to support pageStorage for hashbangs.

Now when I think about the pageStorage again, actually we don't need hashbangs!

We can just say:

pageStorage['339'] = { /* here is my 5 MB JSON data for 339 */ };
pageStorage['32900'] = { /* here is my 5 MB JSON data for 32900 */ };


That should work perfectly well, and the browser can silently destroy
the content of any of the less-used IDs, ANYTIME.

So the usage pattern is to never assume the content of pageStorage
exists, and to treat it purely as a cache that can be gone at any time.

So yes, we can attach pageStorage to any page, keyed by the page URL
without the hashbang (as if the hashbang were stripped off). The quota
per key/value pair is 5 MB, and entries can be removed by the browser at any time.
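The "never assume it exists" usage pattern is essentially a read-through cache. Since `pageStorage` is only a proposal, this sketch codes against a `StorageLike` interface with the two Web Storage methods it needs (`getItem`, `setItem`), which today's `localStorage` already provides. `readThrough` is a hypothetical helper, and the loader is synchronous only to keep the sketch short; in practice it would be an asynchronous network fetch.

```typescript
// Subset of the Web Storage interface; localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function readThrough(store: StorageLike, key: string, load: (key: string) => string): string {
  const cached = store.getItem(key);
  if (cached !== null) return cached; // still there: use it
  const fresh = load(key); // evicted or never stored: rebuild
  try {
    store.setItem(key, fresh);
  } catch {
    // Quota exceeded or write refused: the data is only a cache, so carry on.
  }
  return fresh;
}
```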

How about that?

This can fulfil my need to get a clean separation.

Felix Halim


Re: [whatwg] AppCache-related e-mails

2011-07-03 Thread timeless
If you have 100mb of junk, it won't fit in my browser's http cache
either. And that's a good thing, there are other sites I visit that
are more important. However, a browser is within its rights to detect
that its user uses a site so heavily as to justify increasing that
site's cache allocation. That's a QoI detail.

Note that if 70% of your 100mb is duplicated framework, it's possible
that a better implementation of your site could fit into a 25mb
cache...


Re: [whatwg] AppCache-related e-mails

2011-07-03 Thread Felix Halim
On Sun, Jul 3, 2011 at 3:17 PM, timeless timel...@gmail.com wrote:
 If you have 100mb of junk, it won't fit in my browser's http cache
 either. And that's a good thing: there are other sites I visit that
 are more important. However, a browser is within its rights to detect
 that its user uses a site so heavily as to justify increasing that
 site's cache allocation. That's a QoI detail.

Yes, the quota system can be improved heuristically.

 Note that if 70% of your 100mb is duplicated framework, it's possible
 that a better implementation of your site could fit into a 25mb
 cache...

The only way to remove the duplication is to use a single URL for the
web app (then apply AppCache to that URL), and use a shebang to
uniquely identify the page:

http://bla/page#!id=10

Then all the non-duplicated data is stored in localStorage/indexedDB.

I would love this; however, as long as the quota system is still 5MB
(or an equivalent page storage quota), I don't feel inclined to make
those changes to my site yet. The browser vendors have to move first
and design better quota systems.

Or is there any other way to cleanly do the separation without a shebang?

FYI, I want my page to be able to be linked (referenced / bookmarked)
from other sites.

Btw, does anyone know why Facebook abandoned the use of the shebang?

Felix Halim


Re: [whatwg] AppCache-related e-mails

2011-07-03 Thread Nils Dagsson Moskopp
Felix Halim felix.ha...@gmail.com wrote on Sun, 3 Jul 2011 15:41:54
+0800:

 Btw, does anyone know why Facebook abandoned the use of the shebang?

If they did so, then rightly so. Hashbangs are a thoroughly bad idea:
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net


Re: [whatwg] AppCache-related e-mails

2011-07-03 Thread Felix Halim
On Sun, Jul 3, 2011 at 8:21 PM, Nils Dagsson Moskopp
n...@dieweltistgarnichtso.net wrote:
 Felix Halim felix.ha...@gmail.com wrote on Sun, 3 Jul 2011 15:41:54
 Btw, does anyone know why Facebook abandoned the use of the shebang?

 If they did so, then rightly so. Hashbangs are a thoroughly bad idea:
 http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs

After reading that article, it seems that restructuring the website
to separate the static from the dynamic isn't worth much after all,
unless there is a method other than the hash-bang. The follow-up
question is: is there another way to achieve a clean separation
without using a hash-bang?

Another question is: how do we bookmark/link AppCached pages
without using a hash-bang? I cannot find any discussion about it.

Felix Halim


Re: [whatwg] AppCache-related e-mails

2011-07-03 Thread Nils Dagsson Moskopp
Felix Halim felix.ha...@gmail.com wrote on Sun, 3 Jul 2011 23:16:16
+0800:

 […]

 After reading that article, it seems that restructuring the website
 to separate the static from the dynamic isn't worth much after all,
 unless there is a method other than the hash-bang. The follow-up
 question is: is there another way to achieve a clean separation
 without using a hash-bang?

You may be looking for this:
http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html?

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net


Re: [whatwg] AppCache-related e-mails

2011-07-02 Thread Felix Halim
On Sat, Jul 2, 2011 at 8:14 AM, Bjartur Thorlacius svartma...@gmail.com wrote:
 On Fri, 1 Jul 2011 03:22, Felix Halim wrote:

 I'm looking for a solution that doesn't require modifying anything
 except adding a manifest.

 I recommend fixing your website. As others have stated, this has practical
 benefits, in the online as well as the offline case.

I don't mind fixing my website if I really have to! But if AppCache
had an option to always show the main page online, I wouldn't have to
do anything.


 however, if we don't have pageStorage, even if we have a clean dynamic
 separation, it will quickly run out of space if we use localStorage
 since the localStorage quota is per domain.

 Nobody's forcing you to use localStorage. How do you figure using
 pageStorage or localStorage will be less work than using iframes or other
 linking methods already proposed?

It's not the amount of work that matters; it's the quota I'm
talking about.


 Let's see an example:

 I have a dynamic page with this url:

 http://bla/page?id=10

 The content inside is changing very frequently, let's say every hour.
 Of course, I want the browser to cache the latest version.

 Then specify the applicable HTTP headers with informative values. HTTP
 caching hasn't stopped working, nor is it barred from improving. There is
 space for implementations to improve while complying with current
 specifications. All you have to do is split dynamic resources from static,
 read the RFC and send the appropriate headers.
 Of course this method has the drawback of requiring a request/response pair
 for every resource transferred over HTTP.

Remember that I also want those URLs to be available even if the user is offline.
HTTP Cache is not that powerful; AppCache is.


 In that case, my cleanly separated static and dynamic will have no effect!
 Because all the statics get duplicated for each App Cache.
 It will be the same as if I don't have the framework!

 I'm not following your line of thinking. Why do you insist on using an App
 Cache for each page rather than a shared cache for all your resources?

I do want to use a shared cache for shared resources and a page cache
for non-shared resources (unique to that page). However, the
non-shared resources will become too large to fit in a 5MB quota.
Remember that I have different non-shared content for id=10, id=11, ...,
id=10; I don't think that will fit in localStorage.


 Are you certain that users wish to archive every single dynamic resource
 they fetch from your site? Disposition of any significant amount of storage
 should be in the hands of the user, if indirectly through the user agent.
 Take handhelds.

Users only view the resources they want.

Once they have viewed one, I want it to be there for offline use or for
performance reasons.
I expect users to view (and cache) only a few hundred of them.
They cannot cache what they didn't view / open.
It is OK for the browser not to cache a resource if it doesn't have any storage left.

I would be satisfied with a page storage quota of 5MB given per
page (not per domain).
This would solve all my problems (by restructuring my site, of course).



 If only I could store the dynamic content in a pageStorage (assuming
 each different URL, including the shebang bookmark, has a different
 pageStorage), then I wouldn't run out of storage as long as I keep each
 page within 5MB. So

 And you're sure this is a good thing, because?

Because currently, browsers can handle around 5MB of page content very fast.
I think it is OK for a page (not a domain) to have a 5MB data quota.

If you are building games, you may need more than that (then the game
has to go through the web store to get unlimited permission). However,
for regular pages, 5MB is currently more than enough. But 5MB per
domain is too small!


 http://bla/page#!id=10

 You *can't* allocate a quota per URI fragment, as a script in the page could
 create new ones as wanted.

Yes, I know; that was only an example to point out that:

If I use a shared cache:

http://bla/page

I will run out of quota quickly.

If I use parameters like this:

http://bla/page?id=10

I will have to refresh TWICE to get the latest content (annoying).

If I can use:

http://bla/page#!id=10

I get the best of both worlds: I have a shared static cache, and
I won't run out of quota for the non-shared dynamic cache, since the
quota is 5MB per hash value. I know that this has a security hole:
a script can just generate random URLs to get more quota.

My suggestion is to assign the quota to the hash value the page is first
loaded with, so that later modifications of the hash by script are still
charged to the original hash value's quota.
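A small TypeScript sketch of that suggestion. It assumes, beyond what the thread specifies, that the quota key is the full URL the page was first loaded with; `PageSession`, `QuotaLedger`, and their methods are hypothetical names. The point is only that changing the fragment from script afterwards does not open a new 5 MB bucket.

```typescript
// Hypothetical model: quota is keyed by the URL (including its #! fragment)
// at first load, so later script-driven hash changes cannot mint new quota.
const QUOTA_PER_HASH = 5 * 1024 * 1024; // 5 MB, as proposed

class PageSession {
  readonly quotaKey: string; // fixed at load time, never updated
  private url: string;

  constructor(initialUrl: string) {
    this.url = initialUrl;
    this.quotaKey = initialUrl;
  }

  // Stands in for a script assigning location.hash; the quota key stays put.
  setHash(hash: string): void {
    const i = this.url.indexOf("#");
    this.url = (i >= 0 ? this.url.slice(0, i) : this.url) + hash;
  }

  get currentUrl(): string {
    return this.url;
  }
}

class QuotaLedger {
  private usage = new Map<string, number>();

  // Returns false, and records nothing, if the write would exceed the quota.
  charge(session: PageSession, bytes: number): boolean {
    const used = this.usage.get(session.quotaKey) ?? 0;
    if (used + bytes > QUOTA_PER_HASH) return false;
    this.usage.set(session.quotaKey, used + bytes);
    return true;
  }
}
```

Under this model, a page that forces fresh loads of new fragment URLs would still receive new buckets, so the sketch narrows the objection about scripts minting fragments rather than fully answering it.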


 So, we have seen how the AppCache fails to satisfy certain use cases and
 how pageStorage is needed to make the alternative solution work.

 Show how either the HTTP specification or common practice forbids HTTP
 caches from satisfying your use cases.

I think it's clear that HTTP Cache is inferior to AppCache.
Whatever HTTP Cache can do can be overridden by AppCache.
AppCache

Re: [whatwg] AppCache-related e-mails

2011-07-01 Thread Michael Nordman
A common request that maybe we can agree upon is the ability to list the
manifests that are cached and to delete them via script. Something like...
  String[] window.applicationCache.getManifests();  // returns the appcache manifests for the origin
  void window.applicationCache.deleteManifest(manifestUrl);
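To make the proposed semantics concrete, here is a hypothetical in-memory model in TypeScript. `getManifests()` and `deleteManifest()` are only proposed here and are not part of the shipped `applicationCache` interface; the `AppCacheRegistry` class and its explicit `origin` parameter are illustrative assumptions standing in for the implicit origin of the calling page.

```typescript
// Hypothetical model of the proposed calls; not a real browser API.
class AppCacheRegistry {
  // origin -> manifest URLs cached for that origin (insertion-ordered)
  private byOrigin = new Map<string, Set<string>>();

  // Called when an application cache is created for a manifest.
  add(origin: string, manifestUrl: string): void {
    let set = this.byOrigin.get(origin);
    if (set === undefined) {
      set = new Set<string>();
      this.byOrigin.set(origin, set);
    }
    set.add(manifestUrl);
  }

  // Proposed getManifests(): only the caller's own origin is visible.
  getManifests(origin: string): string[] {
    const set = this.byOrigin.get(origin);
    return set === undefined ? [] : Array.from(set);
  }

  // Proposed deleteManifest(): deleting an unknown manifest is a no-op.
  deleteManifest(origin: string, manifestUrl: string): void {
    const set = this.byOrigin.get(origin);
    if (set !== undefined) set.delete(manifestUrl);
  }
}
```

Keying everything by origin mirrors the "for the origin" comment in the proposal: a page can list and delete only its own origin's caches.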

I think it's clear from this discussion (and others) that the overall
appcache feature set leaves something to be desired, but it's less clear how
to best satisfy the desirements. Until there is some clarity, it's hard to
see how the community is going to make progress. Personally, I think what's
needed to move things forward is for browser vendors to do some independent
innovating to see what works and what doesn't work.

 @Hixie... any idea when the appcache feature set will be up for a growth
 spurt? I think there's an appetite for another round of features among the
 offline app developers that I communicate with. There's been some recent
 interest here in pursuing a means of programmatically producing a
 response instead of just returning static content.

 Who implements it currently? Is there a test suite? Those are the main
 things that would gate a dramatic addition of new features.

Well, nobody yet; but I have a roadmap in mind that builds up to that. Much
of the discussion in this thread has been on the second item. Mobile
developers are particularly interested in item 2, to avoid HTTP cache churn
and the cost of HTTP cache validation. In this roadmap, you can see that it
would also allow pages vended from servers to make use of executable
intercept handlers.


-1. [DONE] Support for cross-origin HTTPS resources.
http://code.google.com/p/chromium/issues/detail?id=69594

0. [DONE] A means of not invoking the fallback resource for some error
responses that would generally result in the fallback resource being
returned. An additional response header would suit their needs...
something like...
x-chromium-appcache-fallback-override: disallow-fallback
If a response header is present with that value, the fallback response would
not be returned.
http://code.google.com/p/chromium/issues/detail?id=82066

1. [UNDER CONFUSING DISCUSSION] Allow a syntax to associate a page with an
application cache without adding that page to the cache. This is a common
feature request also mentioned on the whatwg list, but it's not getting any
engagement from other browser vendors or the spec writer (which is kind of
frustrating). The premise is to allow pages vended from a server to take
advantage of the resources in an application cache when loading
subresources. A perfectly reasonable request: html useManifest='x'.

2. Introduce a new manifest file section to INTERCEPT requests into a prefix
matched url namespace and satisfy them with a cached resource. The resulting
page would be free to interpret the location url and act accordingly based
on the path and query elements beyond the prefix matched url string. This
section would be similar to the FALLBACK section in that prefix matching is
involved, but different in that instead of being used only in the case of a
network/server error, the cached INTERCEPT resource would be used
immediately w/o first going to the server.
  INTERCEPT:
  urlprefix redirect newlocationurl
  urlprefix return cachedresourceurl

Here's where the INTERCEPT namespace could fit into the changes to the
network model.
  if (url is EXPLICITLY_CACHED)        // exact match
    return cached_response;
  if (url is in NETWORK namespace)     // prefix match
    return network_response_as_usual;
  if (url is in INTERCEPT namespace)   // prefix match; this is the new section
    return handle_intercepted_request_accordingly;
  if (url is in FALLBACK namespace)    // prefix match
    return network_response_but_fallback_where_needed;
  if (ONLINE_WILDCARD)
    return network_response;
  otherwise
    return synthesized_error_response;

3. Allow INTERCEPT cached resources to be executable. Instead of simply
returning the cached resource or redirect in response to the request, load
it into a background worker context (if not already loaded) and invoke a
function in that context to asynchronously compute the response headers and
body based on the request headers (including cookies) and body. The
background worker would have access to various local storage facilities
(fileSystem, indexed/sqlDBs) as well as the ability to make network
requests via XHR.
  INTERCEPT:
  urlprefix execute cachedexecutableresourceurl

4. Create a syntax to allow FALLBACK resources to be similarly executable in
a background worker context.

5. Some kind of auto-update policy where the appcache is refreshed w/o the
app running.
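Items 0 and 2 of the roadmap can be pictured together as one request-routing function. The following TypeScript is a hypothetical sketch: `ManifestModel`, `routeRequest`, `shouldUseFallback`, and the route names are assumptions. The checks follow the order of the pseudocode in item 2, and the fallback decision honours the item 0 override header for 4xx/5xx responses; network errors carry no headers, so they always fall back, and the spec's cross-origin redirect case is left out.

```typescript
// Hypothetical model of a parsed manifest; field names are assumptions.
interface ManifestModel {
  explicit: Set<string>;   // exact URLs: CACHE entries and master entries
  network: string[];       // NETWORK namespace prefixes
  intercept: string[];     // proposed INTERCEPT namespace prefixes (item 2)
  fallback: string[];      // FALLBACK namespace prefixes
  onlineWildcard: boolean; // "*" listed in the NETWORK section
}

type Route = "cached" | "network" | "intercept" | "network-with-fallback" | "error";

// Same order of checks as the item 2 pseudocode.
function routeRequest(url: string, m: ManifestModel): Route {
  const inNamespace = (prefixes: string[]) => prefixes.some((p) => url.startsWith(p));
  if (m.explicit.has(url)) return "cached";                    // exact match
  if (inNamespace(m.network)) return "network";                // prefix match
  if (inNamespace(m.intercept)) return "intercept";            // the new section
  if (inNamespace(m.fallback)) return "network-with-fallback"; // prefix match
  if (m.onlineWildcard) return "network";
  return "error";                                              // synthesized error response
}

interface FetchOutcome {
  networkError: boolean;
  status?: number;                  // absent on a network error
  headers?: Record<string, string>; // header names assumed lower-cased
}

// For "network-with-fallback" routes: should the fallback resource be used? (item 0)
function shouldUseFallback(outcome: FetchOutcome): boolean {
  if (outcome.networkError) return true; // no response, so no header can opt out
  const status = outcome.status ?? 0;
  if (status < 400 || status > 599) return false; // only error statuses fall back here
  return outcome.headers?.["x-chromium-appcache-fallback-override"] !== "disallow-fallback";
}
```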
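A rough TypeScript sketch of item 3 of the roadmap (executable INTERCEPT resources), assuming hypothetical request/response types and a host object that loads the executable resource once and then reuses it. Nothing here is a real browser API: the `Map` stands in for IndexedDB or another local store, and the example handler shows how a request such as `/page?id=10` might be answered from local data without contacting the server.

```typescript
// Hypothetical types for an executable INTERCEPT handler; not a browser API.
interface InterceptRequest {
  url: string;
  method: string;
  headers: Record<string, string>; // would include Cookie
  body: string;
}

interface InterceptResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type InterceptHandler = (req: InterceptRequest) => Promise<InterceptResponse>;

class InterceptWorkerHost {
  private handler: InterceptHandler | null = null;
  private readonly loadExecutable: () => InterceptHandler;
  loadCount = 0;

  constructor(loadExecutable: () => InterceptHandler) {
    this.loadExecutable = loadExecutable;
  }

  // Load the cached executable resource on first use ("if not already loaded").
  dispatch(req: InterceptRequest): Promise<InterceptResponse> {
    if (this.handler === null) {
      this.handler = this.loadExecutable();
      this.loadCount++;
    }
    return this.handler(req);
  }
}

// Example handler: answers /page?id=N from locally stored content.
const localPages = new Map<string, string>([["10", "<p>content for id 10</p>"]]);

const pageHandler: InterceptHandler = async (req) => {
  const match = /[?&]id=([^&#]+)/.exec(req.url);
  const body = match === null ? undefined : localPages.get(match[1]);
  if (body === undefined) {
    return { status: 404, headers: { "content-type": "text/plain" }, body: "not stored locally" };
  }
  return { status: 200, headers: { "content-type": "text/html" }, body };
};
```

A fuller design would also have to decide what happens when the handler itself fails, for example retrying over the network; the roadmap does not say.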

There are a couple of features that are not on this list that I want to call
out:

* The ability to add(url) to and remove(url) from the appcache is not on the
list. FileSystem urls cover a lot of this already, and the ability to cache
ad hoc resources and later load them via http urls could be composed out of
the filesystem and

Re: [whatwg] AppCache-related e-mails

2011-07-01 Thread Bjartur Thorlacius

On Fri, 1 Jul 2011 03:22, Felix Halim wrote:

I'm looking for a solution that doesn't require modifying anything
except adding a manifest.

I recommend fixing your website. As others have stated, this has 
practical benefits, in the online as well as the offline case.



As I said before, separating the dynamic from the static will work,

Great!


however, if we don't have pageStorage, even if we have a clean dynamic
separation, it will quickly run out of space if we use localStorage
since the localStorage quota is per domain.

Nobody's forcing you to use localStorage. How do you figure using 
pageStorage or localStorage will be less work than using iframes or 
other linking methods already proposed?



Let's see an example:

I have a dynamic page with this url:

http://bla/page?id=10

The content inside is changing very frequently, let's say every hour.
Of course, I want the browser to cache the latest version.
Then specify the applicable HTTP headers with informative values. HTTP 
caching hasn't stopped working, nor is it barred from improving. There 
is space for implementations to improve while complying with current 
specifications. All you have to do is split dynamic resources from 
static, read the RFC and send the appropriate headers.
Of course this method has the drawback of requiring a request/response 
pair for every resource transferred over HTTP.
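As an illustration of what "the appropriate headers" might look like for this page, here is a TypeScript sketch. The header values are a common pattern rather than anything prescribed in this thread, and the function names are hypothetical: static files (assumed to get a new URL whenever their content changes) receive a long `max-age`, while the hourly-changing page uses `Cache-Control: no-cache` with an `ETag`, so each visit costs one conditional request that can be answered with `304 Not Modified`.

```typescript
// Illustrative header policy; values are a common pattern, not taken from the thread.
function cacheHeadersFor(path: string, etag: string): Record<string, string> {
  if (/\.(js|css|png|gif|jpg)$/.test(path)) {
    // Static assets assumed to change URL when their content changes:
    // cache for a year with no revalidation.
    return { "Cache-Control": "public, max-age=31536000" };
  }
  // Dynamic page: caches may store it but must revalidate before each use.
  return { "Cache-Control": "no-cache", ETag: etag };
}

// True if the conditional request can be answered with 304 Not Modified.
// Simplified: exact tag comparison only (no weak-validator handling).
function isNotModified(ifNoneMatch: string | undefined, currentEtag: string): boolean {
  if (ifNoneMatch === undefined) return false;
  return ifNoneMatch.split(",").some((tag) => {
    const t = tag.trim();
    return t === "*" || t === currentEtag;
  });
}
```

This keeps the page always current, at the price Bjartur mentions: one request/response pair per visit.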



So, it seemed that AppCache is a perfect fit...

AppCache is no magic bullet. Don't use it if you figure it isn't a 
perfect fit.



I then add the manifest to enable the App Cache, and what do I get?

Every time I open that URL, every hour, I ALWAYS see the STALE version
(the version from an hour earlier). Then a few seconds (or minutes) later
(depending on when the AppCache gets updated), I refresh, and then I get
the latest content. Annoying, right?

FYI, HTTP has already resolved this issue by forbidding implementations
from returning a stale version by default under normal circumstances, or
without a warning.



In this case, I am better off NOT using App Cache, since it brings up
the old content every time.


Right. Bad App Cache.


Now, let's see the alternative: I build a framework to separate the
dynamic from the static.
I have to make it so that only ONE MAIN PAGE gets cached by the app cache.
So, my URL can NO LONGER BE:

http://bla/page?id=10

But it has to change to:

http://bla/page#!id=10

Why do I have to do this? Because if I DON'T, each page will be
stored in a different App Cache, and the stale-by-one problem still occurs!
That is,

http://bla/page?id=10

and

http://bla/page?id=11

will be in DIFFERENT AppCaches!

In that case, my cleanly separated static and dynamic will have no effect!
Because all the statics get duplicated for each App Cache.
It will be the same as if I don't have the framework!

I'm not following your line of thinking. Why do you insist on using an 
App Cache for each page rather than a shared cache for all your resources?



So, to make the AppCache only cache one static framework, I have to
make my page such that it is served under ONE url:

http://bla/page

Then treat the #!id=10 part as a non-URL (an AJAX bookmark). This way,
the AppCache will cache only ONE copy of my static framework, with MANY
pieces of dynamic content inside it.

Guess what? All the incoming links from other blogs are now broken!
Of course I can add a redirect, but a redirect works AGAINST making the web faster!

I think Facebook did the #! thing a while ago and then abandoned it. Why?

OK, now I'm happy with my framework and the redirect, and guess what?
Soon, I have other pages with #!id=11, #!id=12, ..., #!id=1.
All of them are important, I want to cache them, and I use
localStorage (or indexedDB) to cache the dynamic content of those
pages.
Note that just because the content is dynamic, it doesn't mean that:

http://bla/page?id=10

has shared data with

http://bla/page?id=11

It can be totally different, unrelated dynamic content. id=10 dynamic
content is entirely different from id=11 dynamic content. However,
since I use localStorage to cache the dynamic content, ALL OF THEM are
limited to the quota of my domain. My 5MB localStorage domain quota
will quickly run out of space.

Are you certain that users wish to archive every single dynamic resource 
they fetch from your site? Disposition of any significant amount of 
storage should be in the hands of the user, if indirectly through the 
user agent. Take handhelds.



If only I could store the dynamic content in a pageStorage (assuming
each different URL, including the shebang bookmark, has a different
pageStorage), then I wouldn't run out of storage as long as I keep each
page within 5MB. So

And you're sure this is a good thing, because?


http://bla/page#!id=10

You *can't* allocate a quota per URI fragment, as a script in the page 
could create new ones as wanted.



Then I would be very happy with the new framework,
since it would store a very compact static app and very compact dynamic content.
It's a win-win for everyone, nothing

Re: [whatwg] AppCache-related e-mails

2011-06-30 Thread Bjartur Thorlacius
Ask HTTP implementors to store a potentially stale fallback copy for
offline use when an authoritative copy is unavailable. Even HTTP
caches are allowed to return stale responses as long as they warn
their clients (so they can warn their clients or fetch an
authoritative copy via another route).
Browsers should keep copies of the most used entries for offline use.
It's probably a matter of minor tweaking, considering that mainstream
browsers support offline modes already.

From http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.1.5:
In some cases, the operator of a cache MAY choose to configure it to
return stale responses even when not requested by clients. This
decision ought not be made lightly, but may be necessary for reasons
of availability or performance, especially when the cache is poorly
connected to the origin server. Whenever a cache returns a stale
response, it MUST mark it as such (using a Warning header) enabling
the client software to alert the user that there might be a potential
problem.
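A TypeScript sketch of the behaviour described in that RFC passage, as a browser cache might apply it for offline use. The entry shape and the `serveResponse` name are hypothetical, and freshness is reduced to `max-age` for brevity; the warning value uses warn-code 110 ("Response is stale") from RFC 2616 section 14.46.

```typescript
// Hypothetical cache entry; freshness follows max-age only, for brevity.
interface CachedEntry {
  body: string;
  storedAt: number;      // milliseconds since epoch when stored
  maxAgeSeconds: number; // from Cache-Control: max-age
}

interface ServedResponse {
  body: string;
  headers: Record<string, string>;
}

function serveResponse(
  entry: CachedEntry | undefined,
  nowMs: number,
  originReachable: boolean,
  fetchFresh: () => string,
): ServedResponse | null {
  if (entry !== undefined) {
    const ageSeconds = (nowMs - entry.storedAt) / 1000;
    if (ageSeconds <= entry.maxAgeSeconds) {
      return { body: entry.body, headers: {} }; // still fresh: no warning needed
    }
  }
  if (originReachable) {
    return { body: fetchFresh(), headers: {} }; // normal case: get an authoritative copy
  }
  if (entry === undefined) return null; // offline with nothing cached
  // Offline: serve the stale copy, but mark it (110 = "Response is stale").
  return { body: entry.body, headers: { Warning: '110 - "Response is stale"' } };
}
```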

P.S. Your hypothetical major overhaul should probably involve
splitting the dynamic content into separate resources linked to from a
static main page/index using iframes.


Re: [whatwg] AppCache-related e-mails

2011-06-30 Thread timeless
It's possible to build a main page so that it can update its content
using a subresource. You can use iframes, javascript (including json),
xmlhttprequests, or other things to do this.

Nothing requires you to have a monolithic main page which is incapable
of dynamically updating itself. ... If I visit your page on May 1st
and sit there for two months, does your page really just want to
continue to show me the same content when I glance at it on July 1st?
It can show other content if it wants to, and in order to save
bandwidth costs, it should avoid resending the framework which
shouldn't be changing. Once your page works well for this case, it
should work well for app-cache.

On 6/29/11, Felix Halim felix.ha...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 3:21 AM, Ian Hickson i...@hixie.ch wrote:
 If you're not loading the main page from the cache, what does this gain
 you that regular HTTP caching doesn't?

 Suppose the content of the main page changes very often (like a news site).
 In this case, you don't want to cache the main page since the users
 want to see the latest main page, not the cached ones when they open
 the main page later.
 However, should the network connection be down, the user should be
 presented with the cached main page.

 This problem can be solved by having the main page NOT include the
 news content, but only a static template.
 The news content is fetched dynamically through XHR and stored in
 localStorage.
 However, this complicates the news site (a major redesign of the
 website is necessary).

 It would be far easier if there were an option in the MANIFEST file to
 NOT CACHE the main page,
 so that the behavior is exactly like HTTP caching, but far stronger,
 since the rest of the resources (css, js, images, etc.) are never
 re-fetched from the network.
 The current HTTP Caching still checks whether the resources are
 modified, but in app cache, we can explicitly say that they are not
 modified unless we change the manifest hash.

 So, in this case, HTML5 App Cache can help make regular online
 websites far faster, as well as provide offline access should the
 network (or the server) be down.
 This would make the online news site feel online when it's online and
 offline when it's offline. I don't think HTTP Cache can serve the
 content if the network / server is down.

 If the main page is always cached, then the next time the user visits
 the main page, they will (almost) always see the STALE content of the
 main page.
 Then a split second later, the main page refreshes with the most
 up-to-date version, which is very annoying to the users.


 On Mon, 14 Feb 2011, Felix Halim wrote:

 I have a use case where it is preferable that the main page is not
 cached:

 Suppose you have a main page that changes based on its ID:

 http://example.com/page.php?id=10

 The appCache will store each main page with a different id in a separate
 cache, which is undesirable! And we DON'T want to cache the main pages,
 since the content differs significantly (think of it as a forum
 website).

 The idea of the appcache feature is to enable offline usage. If you don't
 want it cached, how is it going to work offline?

 It will work offline when the network or the server is down.
 In that case, the latest (cached) main page is shown.

 I wasn't very clear when I said the main page should not be cached.
 I was saying, we should still keep the main page cached,
 but always show the online (non cached) main page if the network and
 the server are alive.


 The main goal here is NOT to make the page offline, but to cache the
 resources that the page uses (i.e, .js, .css, images, etc...) that are
 very likely to be IMMUTABLE (particularly the jQuery.js and jQueryUI
 css+images that almost every site uses!).

 Appcache only adds one feature: The ability to work offline.

 Everything else that appcache does is already possible with regular HTTP
 caching.

 So if you don't want to work offline, just use regular HTTP caching.


 HTTP caching requires server modifications to alter the headers, which
 is not an option for users who have no control over the server side.
 Also, many servers are misconfigured and don't send the correct
 headers, and some proxies may alter them on their way to the
 client.

 It would be great to be able to specify what to CACHE and what not to in
 the MANIFEST referenced by the HTML file, no matter what HTTP caching says!

 HTML5 App Cache here works as a complement for web developers who
 cannot do HTTP caching.

 Moreover, some HTTP caching strategies do require round trips to the
 server, which can be hundreds of milliseconds slower!
 If we specify everything in the manifest file, no such round trip is ever
 necessary.

 In fact, we can do even better than that by not even fetching the MANIFEST
 itself, by including an (optional) manifest HASH inside the HTML,
 like:

 html useManifest=my.manifest manifestHash=asdfasdfasd

 If not specified, then my.manifest will always be checked

Re: [whatwg] AppCache-related e-mails

2011-06-30 Thread Felix Halim
On Fri, Jul 1, 2011 at 12:40 AM, timeless timel...@gmail.com wrote:
 It's possible to build a main page so that it can update its content
 using a subresource. You can use iframes, javascript (including json),
 xmlhttprequests, or other things to do this.

Those are other options besides using localStorage.
Again, those things require restructuring your website.
I'm looking for a solution that doesn't require modifying anything
except adding a manifest.


 Nothing requires you to have a monolithic main page which is incapable
 of dynamically updating itself. ... If I visit your page on May 1st
 and sit there for two months, does your page really just want to
 continue to show me the same content when I glance at it on July 1st?
 It can show other content if it wants to, and in order to save
 bandwidth costs, it should avoid resending the framework which
 shouldn't be changing. Once your page works well for this case, it
 should work well for app-cache.

As I said before, separating the dynamic from the static will work;
however, if we don't have pageStorage, even if we have a clean dynamic
separation, it will quickly run out of space if we use localStorage
since the localStorage quota is per domain.

Let's see an example:

I have a dynamic page with this url:

http://bla/page?id=10

The content inside is changing very frequently, let's say every hour.
Of course, I want the browser to cache the latest version.
So, it seemed that AppCache is a perfect fit...

I then add the manifest to enable the App Cache, and what do I get?

Every time I open that URL, every hour, I ALWAYS see the STALE version
(the version from an hour earlier). Then a few seconds (or minutes) later
(depending on when the AppCache gets updated), I refresh, and then I get
the latest content. Annoying, right?

In this case, I am better off NOT using App Cache, since it brings up
the old content every time.

This is why most people say: please DON'T cache the main page.



Now, let's see the alternative: I build a framework to separate the
dynamic from the static.
I have to make it so that only ONE MAIN PAGE gets cached by the app cache.
So, my URL can NO LONGER BE:

http://bla/page?id=10

But it has to change to:

http://bla/page#!id=10

Why do I have to do this? Because if I DON'T, each page will be
stored in a different App Cache, and the stale-by-one problem still occurs!
That is,

http://bla/page?id=10

and

http://bla/page?id=11

will be in DIFFERENT AppCaches!

In that case, my cleanly separated static and dynamic will have no effect!
Because all the statics get duplicated for each App Cache.
It will be the same as if I don't have the framework!

So, to make the AppCache only cache one static framework, I have to
make my page such that it is served under ONE url:

http://bla/page

Then treat the #!id=10 part as a non-URL (an AJAX bookmark). This way,
the AppCache will cache only ONE copy of my static framework, with MANY
pieces of dynamic content inside it.

Guess what? All the incoming links from other blogs are now broken!
Of course I can add a redirect, but a redirect works AGAINST making the web faster!
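For illustration, a TypeScript sketch of the mapping such a redirect would perform, plus the page-side parsing it then requires. Both helpers are hypothetical and handle only a lone `id` parameter, as in this example. The browser never sends the part after `#` to the server, which is why the cached page script, not the server, has to read the id.

```typescript
// Hypothetical helpers for moving from ?id=N URLs to #!id=N URLs.

// Old incoming link -> redirect target, e.g. http://bla/page?id=10 -> http://bla/page#!id=10
// Only a lone id parameter is handled; anything else returns null.
function toHashbangUrl(legacyUrl: string): string | null {
  const m = /^([^?#]*)\?id=([^&#]+)$/.exec(legacyUrl);
  return m === null ? null : `${m[1]}#!id=${m[2]}`;
}

// Run by the cached page itself, since the fragment never reaches the server.
function idFromHashbang(url: string): string | null {
  const m = /#!id=([^&]+)$/.exec(url);
  return m === null ? null : m[1];
}
```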

I think Facebook did the #! thing a while ago and then abandoned it. Why?

OK, now I'm happy with my framework and the redirect, and guess what?
Soon, I have other pages with #!id=11, #!id=12, ..., #!id=1.
All of them are important, I want to cache them, and I use
localStorage (or indexedDB) to cache the dynamic content of those
pages.
Note that just because the content is dynamic, it doesn't mean that:

http://bla/page?id=10

has shared data with

http://bla/page?id=11

It can be totally different, unrelated dynamic content. id=10 dynamic
content is entirely different from id=11 dynamic content. However,
since I use localStorage to cache the dynamic content, ALL OF THEM are
limited to the quota of my domain. My 5MB localStorage domain quota
will quickly run out of space.

If only I could store the dynamic content in a pageStorage (assuming
each different URL, including the shebang bookmark, has a different
pageStorage), then I wouldn't run out of storage as long as I keep each
page within 5MB. So

http://bla/page#!id=10

has 5 MB pageStorage quota, and

http://bla/page#!id=11

also has 5 MB pageStorage quota, etc...

Then I would be very happy with the new framework,
since it would store a very compact static app and very compact dynamic content.
It's a win-win for everyone; nothing is wasted.

But if I don't have a pageStorage quota, my framework that so neatly
separates the dynamic from the static will be useless, since the
localStorage DOMAIN QUOTA will kill me.


So, we have seen how the AppCache fails to satisfy certain use cases and
how pageStorage is needed to make the alternative solution work.

Here, I propose a solution: AppCache should COMPLEMENT HTTP Cache so
that the main page is not cached (as you know, I don't mean this
literally).

With that solution, I don't have to do ANYTHING on my original site to
make it work (except adding a manifest to my original page). I can
still 

Re: [whatwg] AppCache-related e-mails

2011-06-29 Thread Felix Halim
On Thu, Jun 9, 2011 at 3:21 AM, Ian Hickson i...@hixie.ch wrote:
 If you're not loading the main page from the cache, what does this gain
 you that regular HTTP caching doesn't?

Suppose the content of the main page changes very often (like a news site).
In this case, you don't want to cache the main page since the users
want to see the latest main page, not the cached ones when they open
the main page later.
However, should the network connection be down, the user should be
presented with the cached main page.

This problem can be solved by having the main page NOT include the
news content, but only a static template.
The news content is fetched dynamically through XHR and stored in localStorage.
However, this complicates the news site (a major redesign of the
website is necessary).

It would be far easier if there were an option in the MANIFEST file to
NOT CACHE the main page,
so that the behavior is exactly like HTTP caching, but far stronger,
since the rest of the resources (css, js, images, etc.) are never
re-fetched from the network.
The current HTTP Caching still checks whether the resources are
modified, but in app cache, we can explicitly say that they are not
modified unless we change the manifest hash.
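To make the round-trip argument concrete, a TypeScript sketch that counts requests on a repeat visit under two simplifying assumptions. With HTTP caching, every subresource is revalidated with one conditional request (as happens with `no-cache` or expired entries; long `max-age` values avoid this). With AppCache, only the manifest is fetched and compared byte for byte, and all listed resources are re-fetched when it has changed. The function names are hypothetical, and the main page fetch is counted in both cases since it is meant to stay online.

```typescript
// Repeat-visit request counts under the simplifying assumptions described above.

// HTTP caching with revalidation: the page plus one conditional GET per subresource.
function httpRevalidationRequests(subresources: number): number {
  return 1 + subresources;
}

// AppCache-style check: the page (kept online, as proposed) plus the manifest;
// listed resources are re-fetched only if the manifest bytes changed.
function appCacheRequests(storedManifest: string, fetchedManifest: string, subresources: number): number {
  const manifestChanged = storedManifest !== fetchedManifest;
  return 1 + 1 + (manifestChanged ? subresources : 0);
}
```

With 20 subresources, that is 21 requests for revalidating HTTP caching against 2 for an unchanged manifest, and 22 in the visit right after the manifest changes.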

So, in this case, HTML5 App Cache can help make regular online
websites far faster, as well as provide offline access should the
network (or the server) be down.
This would make the online news site feel online when it's online and
offline when it's offline. I don't think HTTP Cache can serve the
content if the network / server is down.

If the main page is always cached, then the next time the user visits
the main page, they will (almost) always see the STALE content of the
main page.
Then a split second later, the main page refreshes with the most
up-to-date version, which is very annoying to the users.


 On Mon, 14 Feb 2011, Felix Halim wrote:

 I have a use case where it is preferable that the main page is not
 cached:

 Suppose you have a main page that changes based on its ID:

 http://example.com/page.php?id=10

 The appCache will store each main page with a different id in a separate
 cache, which is undesirable! And we DON'T want to cache the main pages,
 since the content differs significantly (think of it as a forum
 website).

 The idea of the appcache feature is to enable offline usage. If you don't
 want it cached, how is it going to work offline?

It will work offline when the network or the server is down.
In such case, the latest (cached) main page is shown.

I wasn't very clear when I said the main page should not be cached.
I was saying, we should still keep the main page cached,
but always show the online (non cached) main page if the network and
the server are alive.


 The main goal here is NOT to make the page offline, but to cache the
 resources that the page uses (i.e, .js, .css, images, etc...) that are
 very likely to be IMMUTABLE (particularly the jQuery.js and jQueryUI
 css+images that almost every site uses!).

 Appcache only adds one feature: The ability to work offline.

 Everything else that appcache does is already possible with regular HTTP
 caching.

 So if you don't want to work offline, just use regular HTTP caching.


HTTP caching requires server modifications to alter the headers, and
is not an option for users who have no control over the server side.
Also, many servers are misconfigured and don't send the
correct headers, and some proxies may alter them on their way to the
client.

It would be great to be able to specify what to CACHE and what not to in
the MANIFEST referenced by the HTML file, no matter what HTTP caching says!

HTML5 App Cache here works as the complement for web-developers that
cannot do HTTP Caching.

Moreover, some HTTP caching strategies do require a round trip to the
server, which can cost hundreds of milliseconds!
If we specify everything in the manifest file, no such round trip is ever
necessary.

In fact, we can do even better than that by not fetching the MANIFEST
itself at all, by including an (optional) hash of the manifest inside the
HTML, like:

<html useManifest=my.manifest manifestHash=asdfasdfasd>

If manifestHash is not specified, then my.manifest will always be checked for modifications.
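A rough sketch of how a browser might use the proposed manifestHash attribute (never specified) to skip the manifest request. The message does not say how the hash is computed; SHA-256 over the UTF-8 manifest text is an assumption made here purely for illustration, and `manifest_digest` / `needs_manifest_fetch` are hypothetical helpers.

```python
import hashlib
from typing import Optional


def manifest_digest(manifest_text: str) -> str:
    # Assumed digest format: hex SHA-256 of the UTF-8 manifest text
    return hashlib.sha256(manifest_text.encode("utf-8")).hexdigest()


def needs_manifest_fetch(declared_hash: Optional[str], cached_manifest: Optional[str]) -> bool:
    """Decide whether the (hypothetical) manifestHash attribute lets us skip the request."""
    if declared_hash is None:
        return True   # no manifestHash: always check my.manifest for modifications
    if cached_manifest is None:
        return True   # nothing cached yet, so the manifest must be downloaded
    return manifest_digest(cached_manifest) != declared_hash


cached = "CACHE MANIFEST\n/jquery.js\n/style.css\n"
print(needs_manifest_fetch(None, cached))                     # True
print(needs_manifest_fetch(manifest_digest(cached), cached))  # False
print(needs_manifest_fetch("asdfasdfasd", cached))            # True
```

When the declared hash matches the cached manifest, no request reaches the server at all; the cost is that the HTML page itself must change whenever the manifest does.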


 Or I would like to update this file, or any other file I would like to
 update, on demand.

 Not sure what this means.

I think it means that we should be able to selectively update any file
in the manifest,
rather than blindly updating everything if the manifest's hash changes.

The ability to selectively update the cached files is very appealing.
If your resources are 5 MB, and you know you only want to update one
small file of 1 KB...

I believe the way the current App Cache updates everything if the
manifest file changes is just too inefficient.
You can say it can be no worse than HTTP Caching, but it can be made far better!
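One way to picture selective updates: if a manifest carried a per-file hash next to each URL (a hypothetical format; real appcache manifests list bare URLs), the browser could download only the entries whose hash changed instead of every file. A minimal Python sketch under that assumption, with illustrative helper names:

```python
from typing import Dict, Set, Tuple


def parse_hashed_manifest(text: str) -> Dict[str, str]:
    """Parse a hypothetical 'URL HASH' per-line manifest into {url: hash}."""
    entries: Dict[str, str] = {}
    for line in text.splitlines()[1:]:        # skip the CACHE MANIFEST signature line
        line = line.strip()
        if not line or line.startswith("#"):  # ignore blank lines and comments
            continue
        url, digest = line.split()
        entries[url] = digest
    return entries


def plan_update(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (urls to download, urls to evict) between two manifest versions."""
    to_fetch = {url for url, h in new.items() if old.get(url) != h}
    to_evict = set(old) - set(new)
    return to_fetch, to_evict


old = parse_hashed_manifest("CACHE MANIFEST\n/big-data.bin 9f2c\n/config.js 11aa\n")
new = parse_hashed_manifest("CACHE MANIFEST\n/big-data.bin 9f2c\n/config.js 22bb\n")
print(plan_update(old, new))  # ({'/config.js'}, set())
```

Only the small changed file is re-downloaded; the large unchanged entry is kept as-is.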


 The application cache is very powerful. But it is very disappointing,
 that it is only useful for static pages. With a little 

Re: [whatwg] AppCache-related e-mails

2011-06-16 Thread Michael Nordman
 On Tue, 8 Feb 2011, Michael Nordman wrote:
 
  Just had an offline discussion about this and I think the answer can be
  much simpler than what's been proposed so far.  All we have to do for
  cross-origin HTTPS resources is respect the cache-control no-store
  header.
 
  Let me explain the rationale... first let's back up to the motivation
  for the restrictions on HTTPS. They're there to defeat attacks that
  involve physical access to the client system, so the attacker cannot
  look at the cross-origin HTTPS data stored in the appcache on disk. But
  the regular disk cache stores HTTPS data provided the cache-control
  header doesn't say no-store, so excluding this data from appcaching does
  nothing to defeat that attack.
 
  Maybe the spec changes to make are...
 
  1) Examine the cache-control header for all cross-origin resources (not
  just HTTPS), and only allow them if they don't contain the no-store
  directive.
 
  2) Remove the special-case restriction that is currently in place only
  for HTTPS cross-origin resources.
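A simplified sketch of the two spec changes listed above, contrasted with the old HTTPS-only restriction in a simplified form (an HTTPS manifest could not cache cross-origin entries at all). Origin comparison and Cache-Control parsing are deliberately naive, and the function names are illustrative, not taken from any browser.

```python
from urllib.parse import urlsplit


def origin(url: str) -> tuple:
    """(scheme, host, port) with default ports filled in; a simplified origin."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def has_no_store(cache_control: str) -> bool:
    """Naive Cache-Control check: is there a no-store directive?"""
    names = [d.strip().split("=", 1)[0].lower() for d in cache_control.split(",")]
    return "no-store" in names


def allowed_old(manifest_url: str, resource_url: str, cache_control: str) -> bool:
    # Simplified old rule: an HTTPS manifest cannot list cross-origin entries.
    # cache_control is ignored under this rule.
    cross_origin = origin(resource_url) != origin(manifest_url)
    return not (cross_origin and urlsplit(manifest_url).scheme == "https")


def allowed_proposed(manifest_url: str, resource_url: str, cache_control: str) -> bool:
    # 1) cross-origin entries (any scheme) are refused only if they say no-store
    # 2) the HTTPS-only special case is gone
    cross_origin = origin(resource_url) != origin(manifest_url)
    return not (cross_origin and has_no_store(cache_control))


manifest = "https://mail.example.com/app.manifest"
static = "https://static.example.net/lib.js"
print(allowed_old(manifest, static, "max-age=3600"))            # False
print(allowed_proposed(manifest, static, "max-age=3600"))       # True
print(allowed_proposed(manifest, static, "private, no-store"))  # False
```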

 On Wed, 30 Mar 2011, Michael Nordman wrote:
 
  Fyi: This change has been made in chrome.
  * respect no-store headers for cross-origin resources (only for HTTPS)
  * allow HTTPS cross-origin resources to be listed in a manifest hosted on
  HTTPS

 This seems reasonable. Done.



But... I just looked at the current draft of the spec and I think it
reflects a greater change than the one I had proposed.

I had proposed respecting the no-store directive only for cross-origin
resources. The current draft is examining the no-store directive for all
resources without regard for their origin. The intent behind the proposed
change was to allow authors to continue to override the no-store header
for resources in their origin, and to disallow that override only for
cross-origin resources. The proposed change is less likely to break existing
apps, and I think there are valid use cases for the existing behavior where
no-store can be overridden by explicit inclusion in an appcache.
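The difference between the two readings can be sketched as a single check with a `rule` switch (hypothetical names, naive header parsing). Under the current draft, no-store excludes any resource; under the proposal, a same-origin entry explicitly listed in the manifest may still override no-store.

```python
from urllib.parse import urlsplit


def same_origin(a: str, b: str) -> bool:
    """Simplified origin check on (scheme, host, port), filling in default ports."""
    def key(url: str) -> tuple:
        p = urlsplit(url)
        return (p.scheme, p.hostname, p.port or {"http": 80, "https": 443}.get(p.scheme))
    return key(a) == key(b)


def has_no_store(cache_control: str) -> bool:
    return "no-store" in [d.strip().split("=", 1)[0].lower() for d in cache_control.split(",")]


def may_store(manifest_url: str, resource_url: str, cache_control: str, rule: str) -> bool:
    """rule='draft': no-store honored for every entry.
    rule='proposed': no-store honored only for cross-origin entries."""
    if rule not in ("draft", "proposed"):
        raise ValueError(rule)
    if not has_no_store(cache_control):
        return True
    if rule == "draft":
        return False
    # proposed: a same-origin entry listed in the manifest overrides no-store
    return same_origin(manifest_url, resource_url)


m = "https://app.example.com/app.manifest"
page = "https://app.example.com/inbox.html"
print(may_store(m, page, "no-store", "draft"))     # False
print(may_store(m, page, "no-store", "proposed"))  # True
```

An existing app that relies on listing its own no-store pages in the manifest keeps working under the proposed rule but not under the draft rule, which is the compatibility concern raised above.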


[whatwg] AppCache-related e-mails

2011-06-08 Thread Ian Hickson
On Mon, 31 Jan 2011, Michael Nordman wrote:
 On Mon, Jan 31, 2011 at 4:20 PM, Ian Hickson i...@hixie.ch wrote:
  On Thu, 13 Jan 2011, Michael Nordman wrote:
 
  AppCache feature request: An https manifest should be able to list 
  resources from other https origins.
 
  I've got some app developers asking for this feature. Currently, it's 
  explicitly disallowed by the spec for valid security reasons, but 
  there are also valid reasons to have this capability, like a webapp 
  that uses resources hosted on gstatic.
 
  It seems like a robots.txt-like scheme is needed, where a site like
  gstatic can declare "it's OK to appcache me from elsewhere".
 
  I've opened a chromium bug for this here... 
  http://code.google.com/p/chromium/issues/detail?id=69594
 
  Why do the valid security reasons not apply in this case?
 
 The vendors of originA and originB have expressed that it's OK for one to 
 appcache resources of the other. In practical terms this is to support a 
 single application being hosted on multiple 'origins'. Google 
 gstatic.com for one example... 
 http://superuser.com/questions/64716/what-is-gstatic-com
 
 If I understand the reason for the restrictions on HTTPS as the 
 following...
 
 The requirement is intended to prevent hostile.example.com from forcing 
 content from checkout.google.com to be stored onto the user's machine, 
 so that a later offline attack involving grabbing the user's laptop 
 cannot retrieve the information.
 
 That doesn't apply in this case because gstatic.com is not hostile to 
 gmail.com.

 [...suggestion to use CORS...]

On Mon, 31 Jan 2011, Jonas Sicking wrote:
 On Mon, Jan 31, 2011 at 2:57 PM, Michael Nordman micha...@google.com 
 wrote:
  I don't fully understand your emphasis on the implied semantics of a 
  CORS request. You say it *only* means a site can read the response. I 
  don't see that in the draft spec. Cross-origin XHR may have been the 
  big motivation behind CORS, but the mechanisms described in the spec 
  appear agnostic with regard to use cases and the abstract section 
  seems to invite additional use cases.
 
 The spec does say what the Access-Control-Allow-Origin header means.
 You're trying to modify that meaning.
 
 Consider things from a web author's point of view. The author develops a 
 website, bunnies.com, which contains an HTML page which performs 
 same-site, and thus trusted, XHR requests. The HTML page additionally 
 exposes an API based on postMessage to allow parent frames to 
 communicate with it.

As specced, this isn't possible. Nothing from an appcache is ever run with 
the origin privileges of an origin other than the cache manifest's origin.


 Since the site exposes various useful HTTP APIs, it further adds
 
 Access-Control-Allow-Origin: <origin>
 Access-Control-Allow-Credentials: true
 
 to a set of the URLs on the site, including the URL of the static HTML
 page. Per CORS this is safe: since the HTML page is static, there is no
 information leakage that doesn't happen through a normal
 server-to-server request anyway.
 
 However, with the modification you are proposing, an attacker site could 
 forever pin this page in the user's app-cache. This means that if there is a 
 security bug in the page, the attacker site could exploit that security 
 problem forever, since any javascript in the page will continue to run in 
 the security context of bunnies.com. So all of a sudden the CORS headers 
 that the site added have had a severe security impact.
 
 That's why I'm harping on the semantics.
 
 Another issue is that if a site *is* willing to allow resources to be 
 pinned in the app-cache of another site, it might still not be willing 
 to share the contents of those resources with everyone. If we reuse the 
 existing CORS headers to express "is allowed to be app-cache pinned", 
 then we can't satisfy that use case.
 
 For example, a website could create an HTML page which embeds a 
 user-specific key and exposes a postMessage-based API for third-party 
 sites to encrypt/decrypt content using that user's key. To allow this to 
 happen for offline apps, it wants to allow the HTML page to be pinned in 
 a third-party app-cache. But it doesn't want to expose the actual key to 
 the third-party sites. If CORS were used to allow cache-pinning, this 
 wouldn't be possible.

Well, this problem doesn't exist for HTML pages, since they wouldn't ever 
run from the appcache, so the above wouldn't work anyway. But your concern 
is valid for, e.g., an image: if we use CORS to allow pinning HTTPS 
resources, there'd be no way to allow an HTTPS resource to be pinned 
without granting read access to that resource as well.


On Tue, 8 Feb 2011, Michael Nordman wrote:
 
 Just had an offline discussion about this and I think the answer can be 
 much simpler than what's been proposed so far.  All we have to do for 
 cross-origin HTTPS resources is respect the cache-control no-store 
 header.
 
 Let me explain the rationale...