Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-19 Thread duanyao

On 2017-04-19 17:28, Anne van Kesteren wrote:

On Wed, Apr 19, 2017 at 11:08 AM, duanyao <duan...@ustc.edu> wrote:

This is really not intended. I just don't quite understand some of those points. For example, is "the web being fundamentally linked to HTTP" just the current status of the industry, or the inherent philosophy of the web? If the latter, some explanation or document would be very much appreciated.

I suspect it's actually a little higher-level than HTTP, with that
indeed being the current state, but the web is about the exchange of
data between computers and definitely sits at a higher level of
abstraction than the particulars of the Linux or Windows file system.
It's hard to define concretely I think, but being platform-independent
and having data addressable from anywhere are important principles.


It's quite helpful, thanks.
If "addressable from anywhere" is a hard requirement, then file: URLs are doomed as far as the web is concerned, and further discussion would be unnecessary, though platform independence could be achieved technically.





Doesn't the file: protocol also abstract away much of the file system? What parts make it a bad abstraction? You mentioned casing and Unicode normalization.

File URLs (it's not a protocol really) are still fundamentally tied to the file system, including how it's hierarchical and such. And then indeed there are all the legacy implications of file URLs.



I'm not particularly eager for write access myself. Maybe we can discuss the read and write cases separately.

I already pointed to https://wicg.github.io/entries-api/ as a way to get access to a directory of files and <input type=file multiple> as a way to get access to a sequence of files. Both for read access. I haven't seen any interest to go beyond that.
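For concreteness, a sketch of what that read access looks like through the entries API, assuming drag-and-drop as the entry point; `dropZone` is a hypothetical element, and a full implementation would call readEntries() repeatedly until it returns an empty batch:

    // dragover must also be cancelled for the drop to be allowed.
    dropZone.addEventListener('dragover', function (e) { e.preventDefault(); });
    dropZone.addEventListener('drop', function (e) {
      e.preventDefault();
      for (var i = 0; i < e.dataTransfer.items.length; i++) {
        var entry = e.dataTransfer.items[i].webkitGetAsEntry();
        if (entry && entry.isDirectory) {
          // Enumerate the dropped directory; read-only access.
          entry.createReader().readEntries(function (entries) {
            entries.forEach(function (child) {
              if (child.isFile) {
                child.file(function (f) { console.log(child.fullPath, f.size); });
              }
            });
          });
        }
      }
    });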


Well, I meant accessing local files from local files without user actions (e.g. via XHR/fetch), mainly used to load a web app's own assets.
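A sketch of the blocked pattern in question -- a page fetching its own relative assets, which is a plain same-origin request over http: but is typically rejected for file: documents (`assets/config.json` is a hypothetical path):

    // Under http: this just works; from file:///path/to/app.html most
    // browsers treat it as a cross-origin request and block it.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'assets/config.json');
    xhr.onload = function () { console.log(JSON.parse(xhr.responseText)); };
    xhr.onerror = function () { console.error('blocked under file:'); };
    xhr.send();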





Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-19 Thread duanyao

On 2017-04-19 16:09, Anne van Kesteren wrote:

On Wed, Apr 19, 2017 at 5:45 AM, duanyao <duan...@ustc.edu> wrote:

There has been a lot of discussion on that in this thread. Do you think writing a more formal document would be helpful?

Perhaps. Fundamentally, I don't think you've made a compelling enough case for folks to become interested and wanting to work in this space and help you solve your problem. You've also been fairly dismissive of the alternative points of view, such as the web being fundamentally linked to HTTP and that distributing (offline) applications over HTTP is the goal. That might make folks less compelled to engage with you.


I'm sorry to have made you feel that I have been dismissive of the alternative points of view. This is really not intended. I just don't quite understand some of those points. For example, is "the web being fundamentally linked to HTTP" just the current status of the industry, or the inherent philosophy of the web? If the latter, some explanation or document would be very much appreciated.



I suspect no browser, and I'm pretty certain about Mozilla since I
work there, is interested in furthering file URLs.


It is very helpful to hear clear signals from browser vendors, positive or not. Thanks.



Most new operating systems abstract away the file system, and the web as browsers see it has always done that. There are ways to pull files in, but there's not much use for letting applications write them out again (other than downloads, which are quite a bit different).




Doesn't the file: protocol also abstract away much of the file system? What parts make it a bad abstraction? You mentioned casing and Unicode normalization.

I'm not particularly eager for write access myself. Maybe we can discuss the read and write cases separately.





Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-19 02:23, Ian Hickson wrote:
The main thing that seems to be missing from this thread is any 
commitment from any browser vendors to actually support any changes in 
this space.


Yes, and I had been pessimistic about that even before I joined this thread.

Actually I joined the discussion mainly to see whether there are convincing reasons for web standards and browsers to ignore local files.

Browser vendors are more than welcome to comment on this.

Something already mentioned:

* Local files are against the philosophy of the Web.
  Then the question is what exactly the philosophy of the Web is, and why -- this still seems unclear.


* Accessing local files from local files with JavaScript is insecure.
  Some solutions (including mine) have been discussed, and I think this is solvable. Please comment if anyone thinks otherwise.


* Accessing local files is not portable.
  I think with some best practices in mind a local web app can be quite portable. I'd like to see counterexamples if anyone has some.

* A local http server could be an alternative.
  Problems of a local http server have been discussed in detail.

* Electron/NW.js etc. could be alternatives.
  It is overkill to ship a small web app with a large runtime, especially when the advanced desktop features are not needed. The enormous manpower devoted to Electron/NW.js and similar projects is a signal that local web apps are relevant.


Something not mentioned here, just my guess:

* Local web apps are against the business model of the current Internet.
  Please consider users first.

* Cloud is the future, local files will become irrelevant.
  Seems premature, and there are people who feel uncomfortable moving all their personal data and workflows to the cloud.


I would recommend the following steps for anyone hoping to push 
changes to Web specifications on this topic:


- Approach Web browser vendors privately, to see if they are 
interested in changing their behaviour in this space.


I have no such private channel.



- If you find interest, collect up the use cases that you want to 
address, and post them to this list for discussion.


- Collect the input on use cases and try to design a solution that fits all the important use cases, then send an e-mail to this list proposing a basic design.


There has been a lot of discussion on that in this thread. Do you think writing a more formal document would be helpful?




Cheers,
--
Ian Hickson







Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-18 19:27, Ashley Sheridan wrote:


On 18 April 2017 12:18:57 BST, duanyao <duan...@ustc.edu> wrote:

On 2017-04-18 18:52, Ashley Sheridan wrote:

Maybe no. "files" is a generic word, so if you make every

"xxx_files/"

folders magical, it's quite possible that there are folders happen

to

ends with "_files" but are not intented to be local web apps. If you
require a `xxx.html` to make "xxx_files/" magical, it is a little
awkward and confusing for muli-page app.

This is why I propose a new (and unlikely already used) pattern
`xxx_webrun/` for more powerful muli-page app, and limit

`xxx_files/`

to
single page app.

In single page app case, it would be more common that `test.html`

gets

`test_files\page{2|3}.html` via XHR and renders the latter in place,
instead of navigating to it.
So the latter don't need to access `test_files\config.json`

themselves.

*any* magic behavior is a sure-fire sign that something is wrong(TM)

Maybe. But there are occasions where magic is unavoidable. E.g. how do you infer the MIME type of a file? Filename extension? Magic numbers? All are magic.

If the barrier is not high enough, name it `xxx__webrun__/`.

But when you're talking about security, which we are, relying on anything magic is potentially disastrous.

You mention MIME types and file extensions, both of which are not safe to rely on for anything related to security, hence there being entire libraries and frameworks that attempt to determine and test a file's real type (Windows still fails abysmally in this area though).

Just relying on magic filenames *will* fail. Consider the scenario where a file 
is accidentally copied over the original entry html. Now it's associated with 
the wrong directory of assets and other 'linked' files. This new html entry 
point file could easily be an exploited file, looking to grab whatever data is 
being held locally on your machine.


If a local web app is really critical, it may be digitally signed to prevent tampering. For example, signatures and certificates can be placed in `foo_files/META-INF/` or `foo_webrun/META-INF/` (like a signed jar). A browser can then detect a change to any file within the web app when loading it, and refuse to run it.

Signing with a self-signed cert should be enough to detect accidental damage, and browsers could do this every time they save a web page.





Thanks,
Ash







Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-18 19:27, Ashley Sheridan wrote:


On 18 April 2017 12:18:57 BST, duanyao <duan...@ustc.edu> wrote:

On 2017-04-18 18:52, Ashley Sheridan wrote:

Maybe no. "files" is a generic word, so if you make every

"xxx_files/"

folders magical, it's quite possible that there are folders happen

to

ends with "_files" but are not intented to be local web apps. If you
require a `xxx.html` to make "xxx_files/" magical, it is a little
awkward and confusing for muli-page app.

This is why I propose a new (and unlikely already used) pattern
`xxx_webrun/` for more powerful muli-page app, and limit

`xxx_files/`

to
single page app.

In single page app case, it would be more common that `test.html`

gets

`test_files\page{2|3}.html` via XHR and renders the latter in place,
instead of navigating to it.
So the latter don't need to access `test_files\config.json`

themselves.

*any* magic behavior is a sure-fire sign that something is wrong(TM)

Maybe. But there are occasions where magic is unavoidable. E.g. how to
infer the MIME type of a file? filename extension? magic numbers? all
are magic.

If the barrier is not high enough, name it `xxx__webrun__/`.

But when you're talking about security, which we are, relying on anything magic is potentially disastrous.

You mention MIME types and file extensions, both of which are not safe to rely on for anything related to security, hence there being entire libraries and frameworks that attempt to determine and test a file's real type (Windows still fails abysmally in this area though).
Those libraries and frameworks *will* fail, because it is entirely possible for a file to conform to multiple formats simultaneously.

Also, the methodology used by those libraries and frameworks is itself magic.



Just relying on magic filenames *will* fail. Consider the scenario where a file 
is accidentally copied over the original entry html. Now it's associated with 
the wrong directory of assets and other 'linked' files. This new html entry 
point file could easily be an exploited file, looking to grab whatever data is 
being held locally on your machine.
Sure it is possible, but usually the damage is limited because the entry file can only access one limited folder, `XXX_files`. And by accidentally overwriting an html file, you have already caused data loss in the first place.


Thanks,
Ash







Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-18 18:52, Ashley Sheridan wrote:



Maybe no. "files" is a generic word, so if you make every "xxx_files/"
folders magical, it's quite possible that there are folders happen to
ends with "_files" but are not intented to be local web apps. If you
require a `xxx.html` to make "xxx_files/" magical, it is a little
awkward and confusing for muli-page app.

This is why I propose a new (and unlikely already used) pattern
`xxx_webrun/` for more powerful muli-page app, and limit `xxx_files/`
to
single page app.

In single page app case, it would be more common that `test.html` gets
`test_files\page{2|3}.html` via XHR and renders the latter in place,
instead of navigating to it.
So the latter don't need to access `test_files\config.json` themselves.

*any* magic behavior is a sure-fire sign that something is wrong(TM)
Maybe. But there are occasions where magic is unavoidable. E.g. how do you infer the MIME type of a file? Filename extension? Magic numbers? All are magic.


If the barrier is not high enough, name it `xxx__webrun__/`.




Thanks,
Ash






Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-18 16:08, Anne van Kesteren wrote:

On Tue, Apr 18, 2017 at 9:57 AM, Roger Hågensen  wrote:

Searching Google for "offline webapp discussion group" turns up
https://www.w3.org/wiki/Offline_web_applications_workshop
and that's sadly from 2011.

There is https://www.w3.org/TR/offline-webapps/

Right, those are about making applications distributed over HTTPS work
when the user is not connected. That idea doesn't necessitate file
URLs and we're still working towards that ideal with Fetch, HTML, and
Service Workers. All browsers seem on board with that general idea
too, which is great.

Offline web apps are great, but I'd say that an offline web app is "an online web app that can work offline temporarily", not really a local web app. If the entity operating an offline web app goes out of service permanently, the web app will soon stop working. This is one of the reasons why local web apps are still relevant.






Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-18 Thread duanyao

On 2017-04-18 16:09, Roger Hågensen wrote:

On 2017-04-17 15:22, duanyao wrote:

This can handle multipage fine as well.
Anything in the folder test.html_files is considered sandboxed under
test.html

The problem is, what if users open `test_files\page2.html` or `test_files\page3.html` directly? Can they access `test_files\config.json`?

This is to be solved by the "multi-page application" convention. By the way, the name of the directory is usually `foo_files`, not `foo.html_files`.


Good point. But why would a user do that when the entry point is the test.html?

The user may bookmark it and access it later on; the tab may be restored from a previous browser session; the user may open it from the history list, and so on.



In this case the browser could just fall back to the default behavior for local html files.

Agree.


Alternatively the browser could have some logic that knows that this 
is a page under the test folder which is the sandbox for test.html


Also, regarding your example of "test_files\page3.html" and "test_files\config.json": of course page3.html could access it, just like it could access config.js if not for CORS on XHR and local files.

Maybe no. "files" is a generic word, so if you make every "xxx_files/" folder magical, it's quite possible that there are folders that happen to end with "_files" but are not intended to be local web apps. If you require a `xxx.html` to make "xxx_files/" magical, it is a little awkward and confusing for a multi-page app.


This is why I propose a new (and unlikely already used) pattern `xxx_webrun/` for more powerful multi-page apps, and limit `xxx_files/` to single-page apps.


In the single-page app case, it would be more common for `test.html` to get `test_files\page{2|3}.html` via XHR and render the latter in place, instead of navigating to it. So the latter don't need to access `test_files\config.json` themselves.
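In code, that single-page pattern is just the following sketch (the `#content` container is a hypothetical name):

    // test.html pulls test_files/page2.html over XHR and renders it in
    // place instead of navigating, so page2.html itself never needs to
    // read test_files/config.json.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'test_files/page2.html');
    xhr.onload = function () {
      document.getElementById('content').innerHTML = xhr.responseText;
    };
    xhr.send();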



Actually a lot of the issue here is XHR (and fetch) not being possible 
for local web pages.


The only reason I suggested using the same naming convention for the sandbox folder is that (at least on Windows) Explorer deletes both the html file and the folder together, something users are familiar with. Though I'm sure Microsoft could add support for the same with another folder naming convention, I can't see that being backported to Windows 8.1/8/7.

The `xxx_webrun/` convention doesn't need OS support, just browser support; and you just delete that folder to delete the app completely.





I just checked what naming Chrome uses: the page title, with _files added to it. I can't recall what the other browsers do.

Chrome can be configured to ask for a location when saving a page; then you can name it as you will. The "xxx_files" convention was introduced by IE or Netscape long ago, and other browsers just follow it.
...

I have not tested how editing/adding to this folder affects things; deleting the html file also deletes the folder (at least on Windows 10, and I seem to recall on Windows 7 as well).

There is no magic link between `foo.html` and `foo_files/`; this is just a trick of Windows Explorer. You can change things by hand in that directory as you will.


I just confirmed that. Just creating an empty .html file and a same-named folder with _files at the end does "link" them in Explorer. Is this unique to Windows, or do other platforms do the same/something similar?


Probably just Windows Explorer. At least the Nautilus file manager on Linux doesn't do the trick.





Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-18 00:03, Anne van Kesteren wrote:

On Mon, Apr 17, 2017 at 5:53 PM, duanyao <duan...@ustc.edu> wrote:

When we want to write a web application portable across multiple server
OSes, these issues could happen too.

Yes, but then you run into implementation bugs. Which are a very
different category from proprietary OS design decisions.

I'm not sure what "implementation bugs" means here -- bugs in the web application or in the server OSes?


It seems you imply that "OS design decisions" are arbitrary or unstable over time, which is not quite true. As to filesystems' semantics, all major OSes have been very stable over the last decades and are unlikely to diverge dramatically in the next decade. Apple's HFS+ normalizes Unicode, but the newer APFS doesn't, which converges with the other OSes.




I think "portable" is never absolute.

Sure, but at least that's the goal for those participating in the
non-proprietary web ecosystem.

I think you overstate the proprietariness of filesystems' semantics. Developers and users have made use of local html files (in a cross-platform manner) for decades and generally feel positive about them. Please don't ignore this.





There are always incompatibilities between browsers, and even a once-standardized feature can be deprecated/removed in the future, e.g. `window.showModalDialog()`, `` and ``.

This happens rarely and when it happens it's a very considered
decision involving lots of people. It's usually related to complexity,
lack of use, and security.


Sure. Proprietary OSes don't change their core APIs in incompatible ways for no good reason, either.

I don't expect a local web app tested on major OSes today to stop working tomorrow due to a filesystem API change.




Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-17 21:39, Anne van Kesteren wrote:

On Mon, Apr 17, 2017 at 3:32 PM, duanyao <duan...@ustc.edu> wrote:

So you mean the file: protocol is not portable? For absolute file: URLs, true; for relative URLs, almost not true.

When writing web pages, no one uses absolute file: URLs in practice, so this is a non-issue.

Neither is portable or part of the web, since you don't allocate resources on someone else's machine that way. (And even in the sense that you mean it, they're not portable due to the different styles of matching, case-insensitivity, Unicode normalization, custom variants of Unicode normalization, bytes vs code points, etc.)


When we want to write a web application portable across multiple server OSes, these issues could happen too. The rules of thumb are: (1) assume case sensitivity, but don't create file names that differ only in casing; (2) avoid characters subject to Unicode normalization in file names.



I think "portable" is never absolute. There are always incompatibilities 
between browsers, and even


once standardized feature can be deprecated/removed in future, e.g. 
`window.showModalDialog()`,


`` and ``.





Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-17 21:04, Anne van Kesteren wrote:

On Mon, Apr 17, 2017 at 2:54 PM, duanyao <duan...@ustc.edu> wrote:

On 2017-04-15 02:09, Domenic Denicola wrote:

file: URLs are part of the web, e.g. parsing such URLs when used in <a> tags, just like gopher: URLs or mailto: URLs. The behavior once navigating to file: URLs (or gopher: URLs, or mailto: URLs) is off the web, and outside the scope of the WHATWG's work.

This still doesn't explain why the file: protocol CAN'T be part of the web (and inside the scope of the WHATWG).

Because it's a mechanism for addressing resources on a specific OS.
It's not a mechanism for addressing resources on the web.


So you mean the file: protocol is not portable? For absolute file: URLs, true; for relative URLs, almost not true.

When writing web pages, no one uses absolute file: URLs in practice, so this is a non-issue.





Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-17 20:43, Roger Hågensen wrote:

On 2017-04-17 13:53, duanyao wrote:

For a single-page application, browsers restrict `foo.html`'s permission to `foo_files/` in the same parent directory. Note that it is already a common practice for browsers to save a page's resources to a `xxx_files/` directory; browsers just need to grant the permission of `xxx_files/`.


I like that idea. But there is no need to treat single- and multi-page apps differently, is there?



d:\documents\test.html
d:\documents\test.html_files\page2.html
d:\documents\test.html_files\page3.html

This can handle multi-page fine as well. Anything in the folder test.html_files is considered sandboxed under test.html.

The problem is, what if users open `test_files\page2.html` or `test_files\page3.html` directly? Can they access `test_files\config.json`?

This is to be solved by the "multi-page application" convention. By the way, the name of the directory is usually `foo_files`, not `foo.html_files`.




This would allow a user (for a soundboard) to drop audio files into
d:\documents\test.html_files\sounds\jingle\
d:\documents\test.html_files\sounds\loops\
and so on.

And if writing ability is added to JavaScript, then write permission could be given to those folders (so audio files could be created and stored without "downloading" them each time).


I just checked what naming Chrome uses: the page title, with _files added to it. I can't recall what the other browsers do.
Chrome can be configured to ask for a location when saving a page; then you can name it as you will. The "xxx_files" convention was introduced by IE or Netscape long ago, and other browsers just follow it.




So granting read/write/listing permissions for the html file to that folder and its subfolders would certainly make single-page offline apps possible.

Yeah, I think it is unlikely to be harmful to allow write/listing permissions as well.


I have not tested how editing/adding to this folder affects things; deleting the html file also deletes the folder (at least on Windows 10, and I seem to recall on Windows 7 as well).

There is no magic link between `foo.html` and `foo_files/`; this is just a trick of Windows Explorer. You can change things by hand in that directory as you will.


I'm not sure if an offline app needs the folder linked to the html file or not. A web developer might create the folder manually, in which case there will be no link. And if zipped and moved to a different system/downloaded by users, then any such html-and-folder linking will be lost as well.


Maybe instead of d:\documents\test.html_files\, d:\documents\test.html_data\ could be used? This would also distinguish it from current user-saved webpages.








Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-15 02:09, Domenic Denicola wrote:

From: David Kendal [mailto:m...@dpk.io]


This is getting silly. 
says the WHAT WG's purpose is to 'evolve the Web'; since file: URIs are part
of the web, this problem falls within the WHAT WG's remit.

file: URLs are part of the web, e.g. parsing such URLs when used in <a> tags, just like gopher: URLs or mailto: URLs. The behavior once navigating to file: URLs (or gopher: URLs, or mailto: URLs) is off the web, and outside the scope of the WHATWG's work.
This still doesn't explain why the file: protocol CAN'T be part of the web (and inside the scope of the WHATWG). No one is asking for the web over gopher or ftp, because http is a better alternative; no one is asking for the web over mailto:, because it is not a protocol for transporting data. But many people are asking for the web over the file: protocol, because (1) the file: protocol shares a lot of characteristics with http, which makes them believe that the web can work reasonably well over it -- with some effort; and (2) http can't cover some use cases of the file: protocol, and they believe these use cases are important.

The argument that http: is for "open" or "world wide" content and file: is for "walled gardens" is rather weak. So much software on Linux ships manuals in html format, and they are open and world-wide. People can also distribute html files via ed2k or bittorrent, and they are open and world-wide. In contrast, iCloud, Google Drive, and OneDrive are private by default, although http and web technologies are used.





If you continue with this argument, I will simply ignore you. I am more
interested in debating how to solve the problem than quibbling over who
should solve it.

Please do so. I'm just stating the WHATWG's position on this for the clarity of 
other participants of this list; I would certainly prefer that you do not 
engage further in attempting to redefine the WHATWG's scope.






Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-17 Thread duanyao

On 2017-04-16 01:54, David Kendal wrote:

On 15 Apr 2017, at 14:07, Roger Hågensen  wrote:


Patrick makes a good point.

For example, asking a user if it's OK for the HTML document to access stuff in "C:\Users\Username\AppData\Local\Temp\" -- what do you think most users will do? Just click OK; after all, "they" have nothing important in that folder, their stuff is in "Documents" instead.

This is why I added the restriction that pages can only request access
to directories that are parents of the directory they're in.

Maybe this is not enough.

The directories which users would save web pages to usually also contain large amounts of personal data, e.g. "C:/Users//Documents|Downloads" on Windows and "/home//Documents|Downloads" on Linux. The temp directory is also sensitive.

Asking permission for a sensitive directory is not ideal: users either lose functionality of the saved page, or risk losing privacy.


I admit I don't actually know much about how Windows lays out files these days -- if the 'Downloads' folder is within some other folder that also contains a load of private stuff. If so, or if that's so on some other popular OS, maybe I'm wrong.

Browsers could also add restrictions that you can't request access to the root directory or a top-level subdirectory of an OS volume, or whatever else is needed for appropriate security on a particular OS.
It is impractical to blacklist all sensitive directories, because many users use customized data directories, e.g. "D:/work" or "D:/MyData".



Some participants on the Chrome bug thread suggested that Chrome could
look for some hidden file that would give files in the directory under
it XHR/Fetch access to that directory. That seems similar to what you
suggest, but I dislike the idea of a hidden file doing this unbeknownst
to users -- and even if it were visible, its function may not be obvious.

The major problem with this solution is that users may be tricked into downloading such a configuration file to a sensitive directory, opening a hole permanently.


Here is my solution: restrict local file access to certain directory naming patterns.

The use cases of local html files can be divided into two types: single-page applications and multi-page applications.

For a single-page application, browsers restrict `foo.html`'s permission to `foo_files/` in the same parent directory. Note that it is already a common practice for browsers to save a page's resources to a `xxx_files/` directory; browsers just need to grant the permission of `xxx_files/`.

For a multi-page application, browsers require that its "application root directory" ends with `_webrun` (or another sensible name). All files within an `xxx_webrun/` are treated as same-origin, but they can't access files outside of the `xxx_webrun/`.

There is no need to ask users for permission to `xxx_files/` or `xxx_webrun/` directories. For html files without such directories, access to local files may not be allowed.

It is much less likely that users would unintentionally put, or be tricked into putting, files into an existing `xxx_files/` or `xxx_webrun/` directory, so the security risk is minimized. Browsers can even enforce it: warn users when they try to save a file into an existing `xxx_files/` or `xxx_webrun/` directory.
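To make the rule concrete, a sketch of the access check this proposal implies, as a predicate a browser might apply to a local XHR/fetch. Paths are simplified to forward-slash form, and the function and its name are mine, not part of any spec:

    // Allow a request from documentPath only inside the document's own
    // sandbox directory.
    function allowLocalFetch(documentPath, targetPath) {
      // Multi-page app: anything under xxx_webrun/ may read the whole tree.
      var webrun = documentPath.match(/^(.*_webrun)\//);
      if (webrun) return targetPath.indexOf(webrun[1] + '/') === 0;
      // Single-page app: /dir/foo.html may only read /dir/foo_files/.
      var single = documentPath.match(/^(.*\/)([^\/]+)\.html?$/);
      if (single) return targetPath.indexOf(single[1] + single[2] + '_files/') === 0;
      return false; // no sandbox directory: no local file access
    }

E.g. allowLocalFetch('/docs/test.html', '/docs/test_files/config.json') is allowed, while anything outside test_files/ is not.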



Regards,

Duan, Yao.




Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-12 Thread duanyao

On 2017-04-11 20:04, Patrick Dark wrote:

On 4/10/2017 5:38 PM, Jan Tosovsky wrote:

On 2017-04-10 David Kendal wrote:

On 2017-04-09 Jan Tosovsky wrote:

On 2017-04-09 David Kendal wrote:


... there are many possible uses for local static files accessing
other local static files: the one I have in mind is shipping static
files on CD-ROM or USB stick...

In this case the file structure is fixed so it can be exported as
JSON file and then linked via the HTML header in every HTML file where
it is needed. This structure is then directly available for the 
further

processing.

However, I am not sure this covers your use case.

I'm not sure either, because I don't understand what you're proposing.
What feature of the HTML header enables this functionality? (For an
arbitrary number of files which may be wanted by an arbitrary number
of other files, and which could be very large.)
Imagine e.g. WebHelp, composed of a collection of static files with a Table of Contents (ToC) in the left pane. It is not very efficient to generate a large ToC into every single HTML file. If you extract the ToC into a dedicated HTML page, it cannot be imported by standard means directly into another HTML page (analogously to XML Inclusions [1]). You have to use either an IFrame or, better, provide the ToC as a JSON file. JSON is a kind of JavaScript, which can be linked via the script element.
Re: [whatwg] Accessing local files with JavaScript portably and securely

2017-04-11 Thread duanyao
We should be aware of the security risks when recommending a "simple web server".


* Most (if not all) simple web servers don't block access from non-local
  hosts by default, which can leak users' files. Although your firewall can
  block them for you, users do need to unblock non-local hosts sometimes
  (e.g. to test with a smart phone), so some may have whitelisted the
  server anyway.

* Even if non-local hosts are blocked, other users' access (on the same
  OS) can't easily be blocked by a web server. In contrast, file:// access
  is subject to file permission checks.

* Most (if not all) simple web servers are hobby projects, so they probably
  lack a serious security audit. E.g. how are URLs like "/foo/../../../bar"
  handled to prevent escaping from the root directory? (See the sketch
  below.)
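For illustration, the traversal check in question for a toy Node.js static server might look like this; a sketch only, since a real server must also handle symlinks, casing tricks, and decoding corner cases:

    var path = require('path');

    // Map a request URL onto the document root, rejecting anything that
    // escapes it, e.g. "/foo/../../../bar". rootDir is assumed to be an
    // absolute, normalized path without a trailing separator.
    function safeResolve(rootDir, urlPath) {
      var resolved = path.normalize(
        path.join(rootDir, decodeURIComponent(urlPath)));
      if (resolved !== rootDir &&
          resolved.indexOf(rootDir + path.sep) !== 0) {
        return null; // outside the root: refuse to serve
      }
      return resolved;
    }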


Those risks may be non-issues for experienced developers, but they do affect newbie developers and normal users. So in my opinion, it is much better to improve and standardize file: URL handling in browsers.

Regards,

Duan, Yao

On 2017-04-10 04:33, Gregg Tavares wrote:

I know this doesn't address your CD-ROM/USB stick situation but FYI...

for the dev situation there are many *SUPER* simple web servers

https://greggman.github.io/servez/

https://github.com/cortesi/devd/

https://github.com/indexzero/http-server/

https://docs.python.org/2/library/simplehttpserver.html  (not recommended,
haven't tried the python 3 one)

https://chrome.google.com/webstore/detail/web-server-for-chrome/ofhbbkphhbklhfoeikjpcbhemlocgigb?hl=en
  (soon to be deprecated)

more here
http://stackoverflow.com/questions/12905426/what-is-a-faster-alternative-to-pythons-http-server-or-simplehttpserver

On Mon, Apr 10, 2017 at 4:36 AM, Jan Tosovsky
wrote:


On 2017-04-09 David Kendal wrote:

... there are many possible uses for local static files accessing
other local static files: the one I have in mind is shipping static
files on CD-ROM or USB stick...

In this case the file structure is fixed so it can be exported as JSON
file and then linked via the HTML header in every HTML file where it is
needed. This structure is then directly available for the further
processing.

However, I am not sure this covers your use case.

Jan






Re: [whatwg] EventSource and data URLs

2015-04-27 Thread duanyao

On 2015-04-27 22:58, Jonas Sicking wrote:

On Mon, Apr 27, 2015 at 2:20 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

On Mon, Apr 27, 2015 at 7:00 AM, Anne van Kesteren ann...@annevk.nl wrote:

Currently Chrome supports data URLs inside EventSource whereas in
Firefox EventSource is restricted to http/https URLs:

   https://bugzilla.mozilla.org/show_bug.cgi?id=1156137

What's the convergence we want here?

It's rather frustrating when data: urls don't work in various places;
they're an invaluable debugging tool, at minimum.  They should
generally be treated as the same security level as the page, no?

There's definitely exceptions to this. For example, Chrome doesn't run an iframe src=data:... with the same origin as its parent. For IMHO good reasons, since it's a potential XSS vector if a website accepts URLs from third parties and renders them inside a child iframe.

The same problem exists with accepting data: URLs in new Worker(...).


I think this is unfortunate.

In iframes, the srcdoc attribute seems as secure (insecure) as a data: URL in src, so should it be removed from the spec?

The restriction of data: URLs in iframe.src can also be worked around by creating an iframe with src=about:blank and then manipulating its DOM as you wish.
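The workaround looks like this (a sketch):

    // An about:blank iframe is same-origin with its parent, so the parent
    // can write arbitrary markup into it directly -- no data: URL needed.
    var iframe = document.createElement('iframe');
    document.body.appendChild(iframe); // src defaults to about:blank
    var doc = iframe.contentDocument;
    doc.open();
    doc.write('<p>injected content, running in the parent origin</p>');
    doc.close();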


On Web Workers: according to the current spec (http://dev.w3.org/html5/workers/#dedicated-workers-and-the-worker-interface), data: URLs and same-origin blob: URLs are allowed as worker URLs.

Firefox accepts a data: URL as a worker URL, and I remember that older versions of Chrome also did.

So should the Worker spec be changed to disallow data:/blob: URLs? This change would make it hard or impossible to ship a web app/library that uses workers in one file.
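The one-file pattern at stake looks like this (a sketch using a blob: URL):

    // Build a worker from an inline script string instead of a separate file.
    var src = 'onmessage = function (e) { postMessage(e.data * 2); };';
    var blob = new Blob([src], { type: 'text/javascript' });
    var worker = new Worker(URL.createObjectURL(blob));
    worker.onmessage = function (e) { console.log(e.data); }; // logs 84
    worker.postMessage(42);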


Regards,
  Duan Yao



Re: [whatwg] EventSource and data URLs

2015-04-27 Thread duanyao

On 2015-04-28 02:42, Jonas Sicking wrote:

On Mon, Apr 27, 2015 at 7:37 PM, duanyao duan...@ustc.edu wrote:

In iframes, the srcdoc attribute seems as secure (insecure) as a data: URL in src, so should it be removed from the spec?

The difference there, and in the other examples that you mention, is
that you know that you are loading content in your own domain. The
problem with data: URLs is that the same API sometimes does a network
load, and sometimes parses content and runs in your security origin.

I understand now, thanks.
However, normal URLs from third parties to be rendered inside iframes are not necessarily from different origins (e.g. on blog or forum sites), so the attack is still possible unless those sites explicitly sandbox all iframes, or disallow iframes entirely.




I'm happy to have a way to opt-in to enable loading data: in iframes
and Workers. But I strongly prefer an explicit opt-in.

Note that the Chrome team apparently currently feels that data: in an iframe is so unsafe that they always load it in a sandbox, and never allow data: in Workers. There's no way to even opt in to having it behave any other way.


How about blob: URLs in Workers? Current Chrome seems to allow them. I think it is less likely for Workers to run third-party URLs than for iframes -- after all, worker URLs must be same-origin in the first place.




/ Jonas





Re: [whatwg] Modify the Page Visibility spec to let UA's take into account whether iframes are visible on the screen

2015-03-31 Thread duanyao
autopause looks promising, but I want to ask for more: also add an autounload attribute to allow UAs to unload specific iframes when they are invisible.

I ask for this because I'm a contributor to pdf2htmlEX (https://github.com/coolwanglu/pdf2htmlEX). Currently pdf2htmlEX can convert each PDF page to one SVG image and embed it in the main HTML via embeds (almost equivalent to iframes here). However, if the PDF file contains many pages, the memory consumption of the embeds becomes unacceptable, although most of them are out of the viewport. Because UAs are not allowed to automatically unload invisible nested browsing contexts, we have to do this in JS, by removing/adding embeds from/to the tree (see the sketch below). This is complicated and doesn't work if JS is disabled. If autounload were supported, things would be much simpler.
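The JS dance currently required looks roughly like this (a sketch; the `.page` holders and their `data-src` attribute are my own naming, not pdf2htmlEX's actual code):

    // Swap each page's embed in and out as it enters/leaves the viewport,
    // since the UA won't unload invisible nested browsing contexts itself.
    function updatePages() {
      var holders = document.querySelectorAll('.page');
      for (var i = 0; i < holders.length; i++) {
        var holder = holders[i];
        var rect = holder.getBoundingClientRect();
        var visible = rect.bottom > 0 && rect.top < window.innerHeight;
        var embed = holder.firstElementChild;
        if (visible && !embed) {
          var e = document.createElement('embed');
          e.src = holder.getAttribute('data-src'); // the page's SVG
          holder.appendChild(e);
        } else if (!visible && embed) {
          holder.removeChild(embed); // let the UA reclaim the memory
        }
      }
    }
    window.addEventListener('scroll', updatePages);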


Also, I think in some use cases of autopause, autounload is even more suitable, because it not only saves CPU and network, but also saves memory, which is equally important on mobile devices.

On 2015-03-31 14:18, Roger Hågensen wrote:

Looking at https://developer.mozilla.org/en/docs/Web/HTML/Element/iframe

Wouldn't the addition of a new attribute to the iframe be the best way?



**
autopause
If present, the client can pause any processing related to the iframe if the iframe is not currently visible. When unpaused, a Page Visibility event will be sent to the iframe as if the whole page had changed status from invisible to visible.
For visibility events see
https://developer.mozilla.org/en-US/docs/Web/Guide/User_experience/Using_the_Page_Visibility_API
**

This basically makes it opt-in; it changes nothing about the behavior of current iframes. Just an example would be an iframe that can be hidden/unhidden by the user clicking a button; if the iframe has the autopause attribute, then its state is effectively paused. Once the iframe is unpaused, a Page Visibility event is sent, and whatever code is running in the frame can then react to this and resume. As it never got an event indicating the page was made non-visible, a programmer should be able to programmatically infer that the iframe was unpaused (it only got the one event instead of two).

What types of iframes would benefit from this? Some chats, news feeds, log views -- anything that constantly updates or works in the background, but does not need to be updated if not viewable (this saves CPU, bandwidth, and server resources).


And maybe down the road one could see if a similar autopause can be added to the parent page itself (or not). Maybe an autopause would make sense added as an attribute to the body (but actually applying to the whole document, including any scripts declared in the head).

Adding an autopause attribute to an iframe is probably the easiest way to add/deal with this. If nobody ends up using it, it can easily be dropped again too, so there is no immediate downside to this that I can currently think of at least.



On 2015-03-31 02:17, Seth Fowler wrote:
I do want to clarify one other thing: we’re definitely not yet at the 
point of implementing this in Gecko, especially not in a release 
version. We think this functionality is important, and modifying the 
Page Visibility spec is one way to make it accessible to the web. 
It’s probably even the nicest way to make it accessible to the web, 
if it’s feasible. But it’s not certain that it’s web compatible or 
that everyone agrees this is the best way to go; we’re at the 
starting point of the process here.


I’d be interested to hear any comments that others may have!

Thanks,
- Seth


On Mar 30, 2015, at 3:47 PM, Seth Fowler s...@mozilla.com wrote:

I think we should modify the Page Visibility spec to let UAs take the actual visibility of iframes into account when deciding if an iframe is hidden.


This design doesn’t do much for iframes which may be doing 
significant work, though. The most obvious example is HTML5 ads. 
These ads may be performing significant work - computation, network 
IO, rendering, etc. Some or all of that work is often unnecessary 
when the ad is outside the viewport. Having an API that would allow 
those ads to throttle back their work when they’re not visible could 
have significant positive effects on performance and battery life.


We could get these benefits through a very simple modification of 
the Page Visibility spec. We should make the visibility of iframes 
independent of the top-level browsing context, and instead let UAs
take the actual visibility of the iframes into account. If an iframe 
has been scrolled out of the viewport, has become occluded, or has 
otherwise been rendered non-visible, we should regard the iframe as 
hidden and dispatch a visibilitychange event to let the iframe 
throttle itself.

...
- Seth







Re: [whatwg] Memory management problem of video elements

2014-08-20 Thread duanyao

On 2014-08-20 15:52, Philip Jägenstedt wrote:

On Tue, Aug 19, 2014 at 3:54 PM, duanyao duan...@ustc.edu wrote:

On 2014-08-19 20:23, Philip Jägenstedt wrote:


On Tue, Aug 19, 2014 at 11:56 AM, duanyao duan...@ustc.edu wrote:

If the media element object keeps track of its current playing URL and current position (this requires little memory), and the media file is seekable, then the media is always resumable. The UA can drop any other memory associated with the media element, and users will not notice any difference except a small delay when they resume playing.

That small delay is a problem, at least when it comes to audio
elements used for sound effects. For video elements, there's the
additional problem that getting back to the same state will require
decoding video from the previous keyframe, which could take several
seconds of CPU time.

Of course, anything is better than crashing, but tearing down a media
pipeline and recreating it in the exact same state is quite difficult,
which is probably why nobody has tried it, AFAIK.

The UA can pre-create the media pipeline according to some hints, e.g. the video element becoming visible, so that the delay may be minimized.

There is a load() method on media elements; can it be extended to instruct the UA to recreate the media pipeline? Thus a script can reduce the delay if it knows the media is about to be played.

load() resets all state and starts resource selection anew, so without
a way of detecting when a media element has destroyed its media
pipeline to save memory, calling load() can in the worst case increase
the time until play.
I meant we could add an optional parameter to load() to support a soft reload, e.g. load(boolean soft), which doesn't reset state or re-select the resource.
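Usage would be something like the following; note the boolean parameter is purely the proposal above and exists in no spec or browser (`video` is a placeholder element):

    // Hypothetical: load(true) would rebuild the media pipeline without
    // resetting state or re-running resource selection.
    video.addEventListener('mouseover', function () {
      video.load(true); // hint that playback is about to be requested
    });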

Maybe it is better to reuse the pause() method to request that the UA recreate the media pipeline. If a media element is in the memory-saving state, it must be in the paused state as well, so invoking pause() should not have undesired side effects.


Anyway, it seems the spec needs to introduce a new state for media elements: a memory-saving state. In low-memory conditions, the UA can select some low-priority media elements and turn them into the memory-saving state.

Suggested priorities for videos are:
(1) recently (re)started, playing, and visible videos
(2) previously (re)started, playing, and visible videos
(3) paused and visible videos; playing and invisible videos
(4) paused and invisible videos

Priorities for audio are to be considered.

The memory-saving state implies the paused state.

If memory becomes sufficient, or a media element's priority is about to change, the UA can restore some of them to the normal paused state (previously playing media doesn't automatically resume playback).

If the pause() method is invoked on a media element in the memory-saving state, the UA must restore it to the normal paused state.



Audio usually eats much less memory, so UAs may have a different strategy for it.

Many native media players can save the playing position on exit, and resume playing from that position on the next run. Most users are satisfied with such a feature. Is recovering to the exact same state important to some web applications?

I don't know what is required for site compat, but ideally destroying
and recreating a pipeline should get you back to the exact same
currentTime and continue playback at the correct video frame and audio
sample. It could be done.


I'm not familiar with game programming. Are sound effects small audio files that are usually played as a whole? Then it should be safe to recreate the pipeline.

There's also a trick called audio sprites where you put all sound
effects into a single file with some silence in between and then seek
to the appropriate offset.
I think if the UA can get and set the currentTime property accurately, it should be able to recreate the pipeline with the same accuracy. What are the main factors limiting the accuracy?

However, a UA using priorities to manage media memory is unlikely to reclaim an in-use audio sprite element's memory.
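For reference, the audio-sprite trick Philip mentions amounts to something like this (a sketch):

    // One file holds many effects; each is played by seeking to its known
    // offset and pausing after its duration.
    function playSprite(audio, startSeconds, durationSeconds) {
      audio.currentTime = startSeconds;
      audio.play();
      setTimeout(function () { audio.pause(); }, durationSeconds * 1000);
    }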


Philip





Re: [whatwg] Memory management problem of video elements

2014-08-20 Thread duanyao

On 2014-08-20 19:26, Philip Jägenstedt wrote:

On Wed, Aug 20, 2014 at 12:04 PM, duanyao duan...@ustc.edu wrote:

On 2014-08-20 15:52, Philip Jägenstedt wrote:


On Tue, Aug 19, 2014 at 3:54 PM, duanyao duan...@ustc.edu wrote:

I'm not familiar with game programing. Are sound effects small audio
files
that are usually
played as a whole? Then it should be safe to recreate the pipeline.

There's also a trick called audio sprites where you put all sound
effects into a single file with some silence in between and then seek
to the appropriate offset.

I think if the UA can get and set the currentTime property accurately, it should be able to recreate the pipeline with the same accuracy. What are the main factors limiting the accuracy?

I don't know, but would guess that not all media frameworks can seek
to an exact audio sample but only to the beginning of a video frame or
an audio frame, in which case currentTime would be slightly off. One
could just lie about currentTime until playback continues, though.
Such a limitation also affects seeking, not only the memory-saving feature, and the spec allows quality-of-implementation differences here, so I think this is acceptable. Additionally, media in the memory-saving state must be paused, so I think users won't care about a small error in the resume position.




Philip





[whatwg] Memory management problem of video elements

2014-08-19 Thread duanyao
Hi,

Recently I have investigated the memory usage of the HTML video element in several desktop browsers (Firefox and Chrome on Windows and Linux, and IE 11), and have found some disappointing results:

1. A video element in a playable state consumes a significant amount of memory. For each playing or paused or preload=auto video element, the memory usage is up to 30~80MB; for those with preload=metadata, memory usage is 6~13MB; for those with preload=none, memory usage is not notable. The above numbers are measured with 720p to 1080p H.264 videos; videos in lower resolutions use less memory.

2. For a page having multiple video elements, memory usage scales up linearly. So a page with tens of videos can exhaust the memory space of a 32-bit browser. In my tests, such a page may crash the browser or freeze a low-memory system.

3. Even if a video element is or becomes invisible -- by being out of the viewport, having a display:none style, or being removed from the active DOM tree (but not released) -- almost the same amount of memory is still occupied.

4. The methods to reduce the memory occupied by video elements require script, and the element must be modified -- for example, removed and released.

Although this looks like an implementors' problem, not a spec problem, I think the current spec is encouraging implementors to push the responsibility for memory management of media elements to authors, which is very bad. See section 4.8.14.18
(http://www.whatwg.org/specs/web-apps/current-work/multipage/embedded-content.html#best-practices-for-authors-using-media-elements):

4.8.14.18 Best practices for authors using media elements
it is a good practice to release resources held by media elements when
they are done playing, either by being very careful about removing all
references to the element and allowing it to be garbage collected, or,
even better, by removing the element's src attribute and any source
element descendants, and invoking the element's load() method.

Why is this BAD, in my opinion?

1. It requires script. What if the UA doesn't support or disables script (email readers, epub readers, etc.), or the script simply fails to download? What if users insert many video elements into a page hosted by a site that is not aware of this problem (so no video management script is available)? Users' browsers may crash, or systems may freeze, for no obvious reason.

2. It is hard to make the script correct. Authors can't simply depend on "done playing", because users may simply pause a video in the middle, start playing another one, and then resume the first one. So authors have to determine which video is out of the viewport, remove its src, and record its currentTime; when it comes back into the viewport, set src and seek to the previous currentTime (a sketch of this dance follows). This is quite complicated. For WYSIWYG html editors based on browsers, this is even more complicated because of the interaction with the undo manager.
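That dance, in sketch form (the function names are mine):

    // Unload an off-screen video, remembering where it was.
    function unloadVideo(v) {
      v.dataset.savedSrc = v.currentSrc;
      v.dataset.savedTime = v.currentTime;
      v.removeAttribute('src');
      v.load(); // releases the media pipeline
    }

    // Restore it when it scrolls back into view.
    function restoreVideo(v) {
      v.addEventListener('loadedmetadata', function onMeta() {
        v.removeEventListener('loadedmetadata', onMeta);
        v.currentTime = Number(v.dataset.savedTime); // seek back
      });
      v.src = v.dataset.savedSrc;
    }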

3. Browsers are in a much better position to get memory management right. Browsers should be able to save most of the memory of an invisible video by keeping only its state (or also a current frame), and to limit the total amount of memory used by media elements.

So I think the spec should remove section 4.8.14.18, and instead stress the UA's responsibility for memory management of media elements.

Regards,
Duan Yao.




Re: [whatwg] Memory management problem of video elements

2014-08-19 Thread duanyao

On 2014-08-19 16:00, Philip Jägenstedt wrote:

On Tue, Aug 19, 2014 at 9:12 AM, duanyao duan...@ustc.edu wrote:

Hi,

Recently I have investigated memory usage of HTML video element in
several desktop browsers (firefox and chrome on windows and linux, and
IE 11), and have found some disappointing results:

1. A video element in a playable state consumes a significant amount of memory. For each playing or paused or preload=auto video element, the memory usage is up to 30~80MB; for those with preload=metadata, memory usage is 6~13MB; for those with preload=none, memory usage is not notable. The above numbers are measured with 720p to 1080p H.264 videos; videos in lower resolutions use less memory.

2. For a page having multiple video elements, memory usage is scaled up
linearly. So a page with tens of videos can exhaust the memory space of
a 32bit browser. In my tests, such a page may crash the browser or
freeze a low memory system.

3. Even if a video element is or becomes invisible -- by being out of the viewport, having a display:none style, or being removed from the active DOM tree (but not released) -- almost the same amount of memory is still occupied.

4. The methods to reduce the memory occupied by video elements require script, and the element must be modified -- for example, removed and released.

Although this looks like an implementors' problem, not a spec problem, I think the current spec is encouraging implementors to push the responsibility for memory management of media elements to authors, which is very bad. See section 4.8.14.18
(http://www.whatwg.org/specs/web-apps/current-work/multipage/embedded-content.html#best-practices-for-authors-using-media-elements):


4.8.14.18 Best practices for authors using media elements
it is a good practice to release resources held by media elements when

they are done playing, either by being very careful about removing all
references to the element and allowing it to be garbage collected, or,
even better, by removing the element's src attribute and any source
element descendants, and invoking the element's load() method.

Why this is BAD in my opinion?

1. It requires script. What if the UA doesn't support or disables script (email readers, epub readers, etc.), or the script simply fails to download? What if users insert many video elements into a page hosted by a site that is not aware of this problem (so no video management script is available)? Users' browsers may crash, or systems may freeze, for no obvious reason.

2. It is hard to make the script correct. Authors can't simply depend on "done playing", because users may simply pause a video in the middle, start playing another one, and then resume the first one. So authors have to determine which video is out of the viewport, remove its src, and record its currentTime; when it comes back into the viewport, set src and seek to the previous currentTime. This is quite complicated. For WYSIWYG html editors based on browsers, this is even more complicated because of the interaction with the undo manager.

3. Browsers are in a much better position to get memory management right. Browsers should be able to save most of the memory of an invisible video by keeping only its state (or also a current frame), and to limit the total amount of memory used by media elements.

So I think the spec should remove section 4.8.14.18, and instead stress the UA's responsibility for memory management of media elements.

What concrete advice should the spec give to UAs on memory management?
If a script creates a thousand media elements and seeks those to a
thousand different offsets, what is a browser to do? It looks like a
game preparing a lot of sound effects with the expectation that they
will be ready to go, so which ones should be thrown out?


The UA can limit the number of simultaneously playing media elements according to available memory or user preference, and fire error events on media elements if the limit is hit. We may need another error code; currently some UAs fire MEDIA_ERR_DECODE, which is misleading.

If the thousand media elements are just sought, not playing, the UA can seek them one by one and drop cached frames afterwards, keeping only the current frames; if memory is even more limited, the current frames can also be dropped.

For html-based slideshows or textbooks, it is quite possible to have tens of videos in one html file.


For audio elements, I think this is less problematic because they usually use far less memory than videos.

A media element in an active document never gets into a state where it
could never start playing again, so I don't know what to do other than
trying to use less memory per media element.

What do you mean by "a state where it could never start playing again"? If the media element object keeps track of its current playing URL and current position (this requires little memory), and the media file is seekable, then the media is always resumable. The UA can drop any other memory associated with the media element, and users will not notice any difference except a small delay when they resume playing.

Re: [whatwg] Memory management problem of video elements

2014-08-19 Thread duanyao

On 2014-08-19 20:23, Philip Jägenstedt wrote:

On Tue, Aug 19, 2014 at 11:56 AM, duanyao duan...@ustc.edu wrote:

On 2014-08-19 16:00, Philip Jägenstedt wrote:


On Tue, Aug 19, 2014 at 9:12 AM, duanyao duan...@ustc.edu wrote:

Hi,

Recently I have investigated memory usage of HTML video element in
several desktop browsers (firefox and chrome on windows and linux, and
IE 11), and have found some disappointing results:

1. A video element in a playable state consumes a significant amount of memory. For each playing or paused or preload=auto video element, the memory usage is up to 30~80MB; for those with preload=metadata, memory usage is 6~13MB; for those with preload=none, memory usage is not notable. The above numbers are measured with 720p to 1080p H.264 videos; videos in lower resolutions use less memory.

2. For a page having multiple video elements, memory usage is scaled up
linearly. So a page with tens of videos can exhaust the memory space of
a 32bit browser. In my tests, such a page may crash the browser or
freeze a low memory system.

3. Even if a video element is or becomes invisible -- by being out of the viewport, having a display:none style, or being removed from the active DOM tree (but not released) -- almost the same amount of memory is still occupied.

4. The methods to reduce the memory occupied by video elements require script, and the element must be modified -- for example, removed and released.

Although this looks like an implementors' problem rather than a spec
problem, I think the current spec is encouraging implementors to push
the responsibility for memory management of media elements to authors,
which is very bad. See section 4.8.14.18
(http://www.whatwg.org/specs/web-apps/current-work/multipage/embedded-content.html#best-practices-for-authors-using-media-elements):


4.8.14.18 Best practices for authors using media elements
it is a good practice to release resources held by media elements when
they are done playing, either by being very careful about removing all
references to the element and allowing it to be garbage collected, or,
even better, by removing the element's src attribute and any source
element descendants, and invoking the element's load() method.
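
In code, the quoted advice amounts to something like this (a sketch;
the helper name is illustrative, the pattern is the spec's):

    // releaseMedia is an illustrative name, not a real API.
    function releaseMedia(el) {
      el.removeAttribute('src');
      var sources = el.querySelectorAll('source');
      for (var i = 0; i < sources.length; i++) {
        el.removeChild(sources[i]);
      }
      el.load(); // with no src or <source>, this frees the pipeline
    }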

Why is this BAD, in my opinion?

1. It requires script. What if the UA doesn't support or disables script
(email readers, epub readers, etc.), or the script simply fails to
download? What if users insert many video elements into a page hosted by
a site that is not aware of this problem (so no video-management script
is available)? Users' browsers may crash, or systems may freeze, with no
obvious reason.

2. It is hard to make the script correct. Authors can't simply depend on
"done playing", because users may pause a video in the middle, start
playing another one, and then resume the first one. So authors have to
determine which video is out of the viewport, remove its src, and record
its currentTime; when it comes back into the viewport, set src and seek
to the previous currentTime (a sketch of such a script follows below).
This is quite complicated. For WYSIWYG HTML editors based on browsers,
this is even more complicated because of the interaction with the undo
manager.

3. Browsers are in a much better position to make memory management
correct. Browsers should be able to save most of the memory of an
invisible video by keeping only its state (or a current frame), and can
limit the total amount of memory used by media elements.

So I think the spec should remove section 4.8.14.18, and instead stress
the responsibility of UAs for the memory management of media elements.
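
To make point 2 concrete, here is a rough sketch (mine, not from the
thread) of the kind of script authors are currently forced to write; it
uses a naive scroll-and-viewport test and still ignores playback state
and the undo-manager interaction:

    // Release videos that scroll out of view; restore them on return.
    function isInViewport(el) {
      var r = el.getBoundingClientRect();
      return r.bottom > 0 && r.top < window.innerHeight;
    }

    window.addEventListener('scroll', function () {
      var videos = document.querySelectorAll('video');
      for (var i = 0; i < videos.length; i++) {
        var v = videos[i];
        if (!isInViewport(v) && v.src) {
          // Save state, then release resources as 4.8.14.18 advises.
          v.dataset.savedSrc = v.src;
          v.dataset.savedTime = v.currentTime;
          v.removeAttribute('src');
          v.load();
        } else if (isInViewport(v) && !v.src && v.dataset.savedSrc) {
          // Restore the source; seek back once metadata is available.
          v.addEventListener('loadedmetadata', function onMeta() {
            this.removeEventListener('loadedmetadata', onMeta);
            this.currentTime = Number(this.dataset.savedTime);
          });
          v.src = v.dataset.savedSrc;
          delete v.dataset.savedSrc;
        }
      }
    });

Even this sketch loses playback state (a playing video comes back
paused), which is exactly the point: the burden should not fall on
authors.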

What concrete advice should the spec give to UAs on memory management?
If a script creates a thousand media elements and seeks those to a
thousand different offsets, what is a browser to do? It looks like a
game preparing a lot of sound effects with the expectation that they
will be ready to go, so which ones should be thrown out?


A UA can limit the number of simultaneously playing media elements
according to available memory or user preference, and fire error events
on media elements if the limit is hit. We may need another error code;
currently some UAs fire MEDIA_ERR_DECODE, which is misleading.

Opera 12.16 using Presto had such a limit to avoid address space
exhaustion on 32-bit machines, limiting the number of concurrent media
pipelines to 200. However, when the limit was reached it just acted as
if the network was stalling while waiting for an existing pipeline to
be destroyed.

It wasn't a great model, but if multiple browsers (want to) impose
limits like this, maybe a way for script to tell the difference would
be useful.


I think it is even better for the UA to play the media element that the
user/script tried to play most recently, and drop pipelines for those
that are paused and/or invisible.
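
A sketch of that eviction policy in JavaScript-flavored pseudocode (all
names here are hypothetical UA internals, not a real API):

    // Keep at most `max` live pipelines; when over budget, evict
    // paused or invisible elements first, then the least recently
    // played one.
    function PipelinePool(max) {
      this.max = max;
      this.live = []; // elements ordered by their last play() call
    }

    PipelinePool.prototype.onPlay = function (el) {
      var i = this.live.indexOf(el);
      if (i !== -1) this.live.splice(i, 1);
      this.live.push(el); // most recently played at the back
      while (this.live.length > this.max) {
        var victim = this.live.find(function (m) {
          return m.paused || !m.visible; // `visible` is hypothetical
        }) || this.live[0];
        this.live.splice(this.live.indexOf(victim), 1);
        // Hypothetical internal call: keep only URL + currentTime,
        // so the element stays resumable.
        victim.releasePipeline();
      }
    };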


P.S. I forgot to say that UAs that fire a MEDIA_ERR_DECODE event for a
not-enough-memory error also show the error message "decode error" on
the UI of video elements, which confuses users too.

If the thousand media

Re: [whatwg] How to determine content-type of file: protocol

2014-07-31 Thread duanyao

On 2014-07-31 02:02, Anne van Kesteren wrote:

On Tue, Jul 29, 2014 at 4:26 PM, 段垚 <duan...@ustc.edu> wrote:

On 2014/7/29 18:48, Anne van Kesteren wrote:

There's an enormous amount of tricky things to define around file
URLs, this being one of them.

Are there some resources on those tricky things?

No, not really. But it's a short list:

1) Parsing
2) Mapping a parsed file URL to an OS-specific filesystem
(case-sensitivity, case folding, ...)
3) Turning the resource into something that looks like an HTTP response

1 is for the URL Standard and would ideally be agnostic of OS. 2 and 3
would be for the Fetch Standard, if we were to define the details. I'm
hoping to get 1 done at least.
I feel that case handling is somewhat out of scope, because it is
OS-dependent, and even http: URLs may break when migrating between OSes
with different case sensitivity.
What are the tricky parts of 3? I'm aware of the Content-Type and status
code issues.

I agree that the file: protocol is less important than http:. However,
packaged web applications (PhoneGap apps, Chrome apps, Firefox OS apps,
Windows 8 HTML apps, etc.) are increasing in popularity, and they use
the file: protocol or similar things to access their local assets. So I
think it's worthwhile to work on the file: protocol to reduce porting
issues for packaged web applications.

Well, "or similar" is important. Because those things are not really
similar at all but instead something that's actually portable across
systems and something we can reasonably standardize.
I don't think the URL schemes used by packaged web apps are much more
portable than file: for now. Actually, they usually have very similar
behaviors to file: on the corresponding browsers. For example, Firefox
OS apps use the app: scheme, and XHR treats any file as XML; Chrome apps
use the chrome-extension: scheme, and XHR deduces the MIME type from the
file extension while the Content-Type header is missing.

Also, some of these schemes are designed to be private and may not be
standardized. In contrast, the file: scheme has been standardized to
some extent. If we could fully standardize file: first, schemes like
app: and chrome-extension: would probably mimic its behaviors.





Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread duanyao

On 07/28/2014 06:34, Gordon P. Hemsley wrote:
Sorry for the delay in responding. Your message fell through the 
cracks in my e-mail filters.


On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in the MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):


5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type provided by the file system.

As far as I know, no mainstream file systems record MIME types for
files. Does the spec actually want to say "provided by the operating
system" or "provided by the file name extension"?


Yeah, you've hit a known (though apparently unrecorded) bug in the 
spec, originally pointed out to me by Boris Zbarsky via IRC many 
months ago. The intent here is basically just whatever the computer 
says it is—whether that be via the file system, the operating system, 
or whatever, and whether it uses magic bytes, file extensions, or 
whatever.


In other words, feel free to read that as the correct behavior is 
undefined/unknown at this point.

Thanks for the explanation.

Recently, the file: protocol has become more and more important due to
the popularity of packaged web applications, including PhoneGap apps,
Chrome apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them
use the file: protocol directly, but the underlying mechanisms are
similar). So if we can't specify an interoperable way to determine a
local file's MIME type, porting packaged web applications can be
problematic in some situations (actually, my team has already hit this).


I know that currently there is no standard way to determine a local
file's MIME type; this may be one of the reasons that the mimesniff spec
has not defined a behavior here.


I'd like to propose a simple way to resolve this problem:
For MIME types that have already been standardized by IANA and are used
in web standards, determine a local file's supplied-type according to
its file extension. This list could include htm, html, xhtml, xml, svg,
css, js, jpeg, jpg, png, mp4, webm, woff, etc. Otherwise, UAs can
determine the supplied-type by any means.
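
A sketch of the proposed rule (my illustration; the exact list and the
types assigned would have to be specified):

    // Fixed extension-to-type table for standardized types; anything
    // not listed is left to the UA.
    var SUPPLIED_TYPES = {
      htm: 'text/html',           html: 'text/html',
      xhtml: 'application/xhtml+xml',
      xml: 'application/xml',     svg: 'image/svg+xml',
      css: 'text/css',            js: 'application/javascript',
      jpeg: 'image/jpeg',         jpg: 'image/jpeg',
      png: 'image/png',           mp4: 'video/mp4',
      webm: 'video/webm',         woff: 'application/font-woff'
    };

    function suppliedTypeFor(fileName) {
      var m = /\.([^.\\\/]+)$/.exec(fileName.toLowerCase());
      var ext = m ? m[1] : '';
      // null means: the UA may determine the type by any means.
      return SUPPLIED_TYPES.hasOwnProperty(ext)
          ? SUPPLIED_TYPES[ext] : null;
    }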


I think this rule should resolve most of the interoperability problems, 
and largely maintain compatibility with current UAs' implementations.


My second question is: does the above rule apply equally to both
fetching static resources (top level, iframe, img, etc.) and
XMLHttpRequest?

It seems all browsers try to figure out the actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and XHTML
respectively; so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; if setting xhr.responseType = 'document', the
response is parsed as XML; and if setting xhr.responseType = 'blob',
blob.type is always 'application/xml'. This diverges significantly from
the static fetching behavior.


Chromium (34) sets the Content-Type header to null for local files of
any type; but if setting xhr.responseType = 'document', the response is
parsed according to its actual type, i.e. .htm as HTML and .xhtml as
XHTML; and if setting xhr.responseType = 'blob', blob.type is the file's
actual type, i.e. 'text/html' for .htm and 'application/xhtml+xml' for
.xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.


I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to a local
file's actual type for XHR, and interpret it accordingly. But Firefox
developers think this would break some existing code that already relies
on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to 
make any judgements or claims until I hear his view on the situation.


That being said, I created the Contexts wiki article [1] and began 
splitting up the mimesniff spec according to contexts [2] in an effort 
to clarify this situation and make sure that all bases were covered. 
It's still a work in progress, awaiting feedback from implementers and 
other spec writers.


I agree that there's a hole in how mimesniff, XHR, and Contexts 
intersect, and I'll be happy to update mimesniff to fill it, if that's 
determined to be the best course of action.


HTH,
Gordon

[1] http://wiki.whatwg.org/wiki/Contexts
[2] http://mimesniff.spec.whatwg.org/#context-specific-sniffing

I note that in the Contexts wiki article, the connection context (which
XHR belongs to) has no sniffing algorithm specified.
Does this mean UAs should not sniff in the case of XHR, or just that the
algorithm has not been specified yet?
Personally I'd like to have the connection context use the same
algorithm as the browsing context, because client JS code isn't always
sure about the MIME types sent via XHR, much like

Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread duanyao

On 07/28/2014 22:08, Gordon P. Hemsley wrote:

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in the MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type provided by the file system.

As far as I know, no mainstream file systems record MIME types for
files. Does the spec actually want to say "provided by the operating
system" or "provided by the file name extension"?


Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just whatever the computer
says it is—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as the correct behavior is
undefined/unknown at this point.

Thanks for the explanation.

Recently, the file: protocol has become more and more important due to
the popularity of packaged web applications, including PhoneGap apps,
Chrome apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them
use the file: protocol directly, but the underlying mechanisms are
similar). So if we can't specify an interoperable way to determine a
local file's MIME type, porting packaged web applications can be
problematic in some situations (actually, my team has already hit this).

I know that currently there is no standard way to determine a local
file's MIME type; this may be one of the reasons that the mimesniff spec
has not defined a behavior here.


Well, the most basic reason is that I never delved into how it actually
works, because I was primarily concerned with HTTP connections.


It's possible that there is no interoperable way to determine a local 
file's MIME type, but see below.



I'd like to propose a simple way to resolve this problem:
For MIME types that have already been standardized by IANA and are used
in web standards, determine a local file's supplied-type according to
its file extension. This list could include htm, html, xhtml, xml, svg,
css, js, jpeg, jpg, png, mp4, webm, woff, etc. Otherwise, UAs can
determine the supplied-type by any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.


There is already a standard in place to detect file types on the 
operating system level:


http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that 
would work? (That specification has complex rules for detecting files, 
including magic bytes and whatnot, and is already used on a number of 
Linux distros and probably other operating systems.)



Maybe no.
(1) It's a standard for *nix desktops; I doubt MS Windows will adopt it,
and maybe it's a bit heavy for mobile OSes.
(2) Many packaged web apps are ported from (and share code with) normal
web apps, and most web servers simply deduce the MIME type from the file
extension, so doing the same thing in UAs probably results in better
compatibility.
(3) UAs are already required to do MIME type sniffing, which should be
enough to correct most wrong supplied-types.

My second question is: does the above rule apply equally to both
fetching static resources (top level, iframe, img, etc.) and
XMLHttpRequest?

It seems all browsers try to figure out the actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and XHTML
respectively; so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; if setting xhr.responseType = 'document', the
response is parsed as XML; and if setting xhr.responseType = 'blob',
blob.type is always 'application/xml'. This diverges significantly from
the static fetching behavior.

Chromium (34) sets the Content-Type header to null for local files of
any type; but if setting xhr.responseType = 'document', the response is
parsed according to its actual type, i.e. .htm as HTML and .xhtml as
XHTML; and if setting xhr.responseType = 'blob', blob.type is the file's
actual type, i.e. 'text/html' for .htm and 'application/xhtml+xml' for
.xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to a local
file's actual type for XHR, and interpret it accordingly. But Firefox
developers think this would break some existing code that already relies
on Firefox's behavior
(see https://bugzilla.mozilla.org

[whatwg] How to determine content-type of file: protocol

2014-07-17 Thread duanyao
Hi,

My first question is about a rule in the MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

   5.1 Interpreting the resource metadata
   ...
   If the resource is retrieved directly from the file system, set
   supplied-type to the MIME type provided by the file system.

As far as I know, no mainstream file systems record MIME types for
files. Does the spec actually want to say "provided by the operating
system" or "provided by the file name extension"?

My second question is: does the above rule apply equally to both
fetching static resources (top level, iframe, img, etc.) and
XMLHttpRequest?

It seems all browsers try to figure out the actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and XHTML
respectively; so far so good.

But when it comes to XHR, things are different. 

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; if setting xhr.responseType = 'document', the
response is parsed as XML; and if setting xhr.responseType = 'blob',
blob.type is always 'application/xml'. This diverges significantly from
the static fetching behavior.

Chromium (34) sets the Content-Type header to null for local files of
any type; but if setting xhr.responseType = 'document', the response is
parsed according to its actual type, i.e. .htm as HTML and .xhtml as
XHTML; and if setting xhr.responseType = 'blob', blob.type is the file's
actual type, i.e. 'text/html' for .htm and 'application/xhtml+xml' for
.xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to a local
file's actual type for XHR, and interpret it accordingly. But Firefox
developers think this would break some existing code that already relies
on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
Duan Yao.