Re: [whatwg] The iframe element and sandboxing ideas

2008-07-26 Thread Frode Børli
 Frode Børli wrote:
 Yeah, I thought about that also. Then we have more complex attributes
 such as style='font-family: expression&#40;a+5&#41;;'... So your
 sanitizer must also parse CSS properly - including unescaping
 entities.
 The way HTML Purifier handles this is unescaping all entities (hex, dec
 and named) before handling HTML. Output text is always in UTF-8 and thus
 never has entities.

The sanitizer seems very good. I see that your purifier does not allow
":" in URLs (which is an important part of, for example, Wikipedia
URLs) - although that restriction does make javascript:-style links
difficult.

Anyway: how many hours have you spent developing the sanitizer? The
discussion was not whether it could be done server side or not. Imagine
fetching content from another site using a client-side JavaScript
(which it seems HTML 5 will allow). Should the HTML Purifier be
implemented in pure JavaScript as well then - or must the content
still do a round trip to the server for sanitization?

 A bank wants an HTML messaging system where the customer can write
 HTML-based messages to customer support through the online banking
 system. Customer support personnel have access to perform transactions
 worth millions of dollars through the intranet web interface (where
 they also receive HTML-based messages from customers).

 A few problems with this theoretical situation:
 1. Why does the bank need an HTML messaging system?

Because the bank wants to be perceived as innovative by its customers?
It is not my place to question WHY somebody needs a feature. Why is
there a manufacturer logo on most cars? It isn't strictly required...

 2. Why is this system on the same domain as the intranet web interface?

Content is submitted from the bank's public website - but customer
support handles the messages in the internal webmail system, which may
be on the same domain.

 3. Why do customer support personnel have access to the transaction
 interface?

A better question: is it good that we need more employees because HTML
sanitizing cannot be done securely?

If I contact my account manager, he most likely has access to perform
tasks on my account, as well as on other customers' bank accounts.

 Security depends on a perfect sanitizer. Would you sell your
 sanitizer to this bank without any disclaimers, and say that your
 sanitizer will be valid for eternity and for all browsers that the
 bank decides to use internally in the future?
 Well, it's an open-source sanitizer. But that aside, say I was selling
 them a support contract; I would not say valid for eternity. However,

Then we need client side sandboxing.

 Today I would not allow HTML-based messages since I could never be
 sure enough that the sanitizer was perfect.
 I encourage you to try out HTML Purifier http://htmlpurifier.org. It's
 certainly not perfect (we've had a total of two security problems with
 the core code (three if you count a Shift_JIS related vulnerability, and
 four if you count an XSS vulnerability in a testing script for the
 library)), but I hope it certainly approaches it.

I really appreciate it and will possibly use it, depending on its
license. It is a problem (for me) that it cannot use ":" in its URLs.

PS: Note that PHP is not perfect, and if you rely on PHP functions
for unescaping etc., then a future version of PHP might introduce new
bugs. I know from experience...


-- 
Best regards / Med vennlig hilsen
Frode Børli
Seria.no

Mobile:
+47 406 16 637
Company:
+47 216 90 000
Fax:
+47 216 91 000


Think about the environment. Do not print this e-mail unless you really need to.

Tenk miljø. Ikke skriv ut denne e-posten dersom det ikke er nødvendig.


Re: [whatwg] The iframe element and sandboxing ideas

2008-07-26 Thread Frode Børli
Yes, let's all go back to WordPerfect for DOS and hinder innovation.

Besides, this is not the proper arena for this discussion. :)

2008/7/26 Kristof Zelechovski [EMAIL PROTECTED]:
 A bank sporting a site with a form encouraging the customer to enter
 arbitrary HTML code would be perceived innovative indeed, albeit in the
 Monty-Pythonic sense.  I can envision the logo: The First Alternative
 Reality Bank.  Hopefully, all its accounts would be run in lindendollars...
 And no wonder it could afford only one employee.
 Chris






Re: [whatwg] The iframe element and sandboxing ideas

2008-07-25 Thread Frode Børli
 Frode Børli wrote:
 <td colspan='javascript(a + 5)'></td>

 Where a JavaScript expression returns the value for the colspan
 attribute. Many server side HTML sanitizers would have to be updated
 - unless we introduce a proper sandbox.

 Or the HTML sanitizer could have done things properly and checked if
 colspan was a numeric value. :-)

Yeah, I thought about that also. Then we have more complex attributes
such as style='font-family: expression&#40;a+5&#41;;'... So your
sanitizer must also parse CSS properly - including unescaping
entities. Notice that if you check for entities in the form of &#40;,
remember that Internet Explorer does not require the semi-colon at the
end. &#40 is translated by Internet Explorer but not by PHP's
html_entity_decode function.
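To illustrate, here is a minimal sketch of a normalizer that, like
Internet Explorer's parser, accepts numeric character references with
or without the trailing semicolon. The function name is mine, and a
real sanitizer would also need to handle named entities:

```javascript
// Decode decimal (&#40 / &#40;) and hex (&#x28 / &#x28;) character
// references, accepting a missing trailing semicolon the way IE does.
// A sanitizer that only recognizes the strict "&#40;" form would pass
// the semicolon-less "&#40" through untouched - and IE would still
// interpret it.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-fA-F]+);?/gi, function (_, hex) {
      return String.fromCharCode(parseInt(hex, 16));
    })
    .replace(/&#([0-9]+);?/g, function (_, dec) {
      return String.fromCharCode(parseInt(dec, 10));
    });
}
```

Both decodeNumericEntities("expression&#40;a+5&#41;") and the
semicolon-less "expression&#40a+5&#41" decode to expression(a+5).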

For all I know - a future invention may introduce a new method of
encoding entities also, so your sanitizer must support all future
entity encodings.

Of course, we can skip supporting the style attribute - but there are
not many other ways to style content in XHTML.

 Disclaimer: I am one of those authors of server side HTML sanitizers you
 speak of.

Theoretically speaking:

A bank wants an HTML messaging system where the customer can write
HTML-based messages to customer support through the online banking
system. Customer support personnel have access to perform transactions
worth millions of dollars through the intranet web interface (where
they also receive HTML-based messages from customers).

Security depends on a perfect sanitizer. Would you sell your
sanitizer to this bank without any disclaimers, and say that your
sanitizer will be valid for eternity and for all browsers that the
bank decides to use internally in the future?

Today I would not allow HTML-based messages since I could never be
sure enough that the sanitizer was perfect.


[whatwg] WebSockets: Should we decide on protocol before deciding on features?

2008-07-25 Thread Frode Børli
I think we should agree on which features WebSockets needs to
provide before deciding on a protocol or method of achieving the
goals.

Basically I want these features from WebSockets:

1. The server side script that generated the page can at any later
time raise any event on the client side.
2. The client side script can at any time raise any event on the
server side (meaning inside the script that initially generated the
page).
3. It must work through existing Internet infrastructure, including
strict firewalls and proxies.
4. It should also be possible to open extra websockets to other
scripts - possibly through the XMLHttpRequest object.

The above requirements imply that any standard URL can deliver a web
socket (since the web socket piggybacks on the original page
response).

Today, requirement 1 can partly be achieved by never completing the
transfer of the HTML document: whenever the server wants to raise an
event on the client side, it appends
<script>eventhandler(data)</script> and still does not close the
transfer. The main problem here is that the onload event is never fired...

Number 2 can be achieved by using XMLHttpRequest - but this starts
another script instance on the server for each event raised from the
client, and that is inefficient.

Since the communication is directly with the script that generated the
page, we get a URL scheme by default and all virtual hosting
problems are solved. Also, there is no need for cookies or other
authentication schemes to be sure which page the server is talking to.

(Imagine the customer having two separate browser windows, both trying
to talk to the same chat server - it brings in a lot of problems that
are not solved by simple cookies.)

Number 3 requires that we use ports 80 and 443, or that all restrictive
firewalls in the world are reconfigured to open extra ports as well.

Number 4 is implied, since 1 and 2 require the websocket to piggyback
on the original response. An AJAX request is basically a page request
to a URL - so a websocket should be able to piggyback on it. Also, the
security restrictions defined for XHR should apply to WebSockets.


I do not care how this is achieved - but as a busy programmer I do not
want to spend hours researching how to implement this. Everything I
need should be provided by HTML 5 compliant browsers and the
webserver/cgi (e.g. Apache/PHP) interface.


Re: [whatwg] The iframe element and sandboxing ideas

2008-07-23 Thread Frode Børli
I am not sure - the sandbox should not allow any scripts at all; that
is my only requirement. More advanced requirements can be taken care of
server side.

The issue I want the sandbox for is that it allows us to introduce
other ways to embed scripts in tags in the future. Imagine this
becoming legal in HTML 6 for some reason:

<td colspan='javascript(a + 5)'></td>

Where a JavaScript expression returns the value for the colspan
attribute. Many server side HTML sanitizers would have to be updated -
unless we introduce a proper sandbox.

Of course a whitelist could be nice - but sending a list of 50+ tags
for each item in a guestbook is a bit much. CSS syntax could be used
for such a whitelist: a[href], span[style], area[alt|href], etc. With
no whitelist, everything should be allowed, except scripts.

Frode

2008/7/23 James Ide [EMAIL PROTECTED]:

  On Tue, Jul 22, 2008 at 3:22 PM, Frode Børli [EMAIL PROTECTED] wrote:

 The server must escape all user generated content by replacing < with
 &lt; etc. This is perfectly secure for all existing browsers. The
 sandbox instructs the browser to unescape. Completely fail safe for
 all.




Re: [whatwg] The iframe element and sandboxing ideas

2008-07-22 Thread Frode Børli
The server must escape all user generated content by replacing < with
&lt; etc. This is perfectly secure for all existing browsers. The
sandbox instructs the browser to unescape. Completely fail safe for
all.
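A minimal sketch of the escaping step described above (the function
name is mine):

```javascript
// Entity-escape user generated content before it is placed inside a
// sandbox. Existing browsers render the result as inert text; only a
// sandbox-aware browser would unescape it again.
function escapeForSandbox(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first, or later entities double-escape
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}
```

escapeForSandbox('</span>') yields &lt;/span&gt;, so a comment
containing </span> can no longer terminate the sandbox element.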

On 7/22/08, James Ide [EMAIL PROTECTED] wrote:
 I'm not sure that I follow - it seems to me that searching for unescaped
 text and failing is not a reliable solution. As you mention:

 The problem is 1: that the user can easily write </span> in his comment
 and
 bypass the sandbox and 2: it is not backward compatible.


  Say I input </span> and the application developer has forgotten to
 sanitize user input or permits use of the span tag (and has done some poor
 checking for well-formed code). The application may later display a page
 with my input, thus containing (e.g.): <span sandbox=1> </span> </span>,
 where the </span> in the middle is my input.

 Is this a span element with unescaped content (</span>), or is it
 malformed HTML? In my eyes, it's the latter and if any UA were to treat it
 this way, it would be trivial to inject more harmful code. On a side note,
 would comments be permitted inside a sandbox? Developers may wish to have
 this functionality, but there is also the concern of a malicious user
 submitting the string <!--, which, from some brief testing, appears to
 cause problems in IE6 and possibly more UAs. I do not have significant
 experience with parsers so I can't say for sure if these issues are
 showstoppers but they raise some concerns.

 If the browser finds unescaped content inside a sandbox it should refuse to
 display the page - thereby forcing the author to fix this immediately.


 As mentioned previously on the topic of sandboxes, such a strict failing
 policy may not be desirable. Perhaps a more gentle approach is only not to
 render the sandbox's contents and perhaps display an error message in its
 stead.

 Overall, I'm seeing sandbox elements to be weak safety nets. AFAIK, there is
 no way for a UA alone to perfectly determine what is author- or
 developer-generated and what is user-submitted; user input must go through
 some sanitizing process to be completely safe.

 - James




Re: [whatwg] Web Sockets

2008-07-21 Thread Frode Børli
I have some feedback based on the discussions i participated in
earlier. Since I am on vacation I cannot give a proper proposal but I
think the following should be considered:


1. Allow pure TCPSocket using this method: var s = new
TCPSocket("/tcpsocket.xml");

The tcpsocket.xml-file must have a structure similar to this:

<websocket>
  <host>hostname/ip-address</host>
  <port>portnumber</port>
  <allow-origin>*</allow-origin>
</websocket>

Clarifications:

host: if specified, and if the host is a different host than the one
tcpsocket.xml was downloaded from, a secure algorithm should be
applied - for example, using reverse DNS lookups on the target
IP address and inspecting the TXT records of the host name that the
reverse lookup returned.

port: any port

allow-origin: a simple method of limiting who can connect to the port
specified in the tcpsocket.xml file. For example, this could be the
complete URL of the JavaScript file, or it could contain wildcards.


Advantages:

- Easy to adopt today on existing servers; can easily make use of, for
example, existing IRC servers etc. without modifications.
- Enables cross-site usage (a script on www.example.com can connect to
Yahoo by downloading www.yahoo.com/tcpsocket.xml).
- Requires access to place files on the targeted server - so it is not
possible via simple cross-site scripting attacks.
- A simple Perl script can dynamically generate the XML file above.
- Allows connection to SMTP servers only if the server owner intends
to allow it.


2. WebSockets should build on previous work from RFC 2817
(http://www.ietf.org/rfc/rfc2817.txt). Web servers such as Apache must
then be extended to support websockets, but it should be very easy for
a developer to start using them. It would not require an extra
application listening on a separate port, and it would by definition
work in a virtual hosting environment.

Since the request is to an ordinary URL, the webserver will direct the
request to a file or script in the web root for the virtual host, and
this script can decide to send a 426 Upgrade Required response, or it
can send 401 Unauthorized if the client sent the wrong Origin headers.


[whatwg] The iframe element and sandboxing ideas

2008-07-21 Thread Frode Børli
I like the proposal of adding a seamless attribute to the iframe element,
though perhaps it should be expressed in CSS, since it applies to styling?

I also want the following:

<span sandbox=1> </span>

This is because a typical Web 2.0 usage is to have a list of comments with a
thumbs up/thumbs down for each message. This requires more fine grained
control of what is user generated content and what is scripted content.

The problem is 1: that the user can easily write </span> in his comment
and bypass the sandbox, and 2: it is not backward compatible.

This is prevented by requiring anything inside a sandbox to be entity
escaped:

<span sandbox=1> &lt;/span&gt; </span>

If the browser finds unescaped content inside a sandbox it should refuse to
display the page - thereby forcing the author to fix this immediately.

Any comments?


Re: [whatwg] TCPConnection feedback

2008-06-24 Thread Frode Børli
 It is worth spending months improving the implementation here, if it
 saves only one minute of work for each of the millions of web
 developers out there, in the future.

 Alright, point taken. You're of course absolutely right with that :)
 I agree, it would be very convenient to basically set up and control a
 web app in a single connection. However, I think there are valid use
 cases for just the opposite set up as well. So, if we use a HTTP
 handshake, we should provide two ways.

Agree

 If it is impossible to use HTML, then it is absolutely required that
 Session ID is added as a standard header - since that will be the only
 way of linking the generated HTML with the separate persistent
 connection. You can't assume that an application server or the web
 server will be able to parse the cookie, since the cookie format is
 different for each programming language/application.

 This depends on the layer where the session management takes place.
 For example, PHP's existing session handling system already uses
 cookies. So, a hypothetical future PPHP version of PHP could extend

The PHP *script* decides how to encode session identifiers, not the
PHP engine. PHP has a default cookie variable called PHPSESSID that
many PHP scripts use, but many PHP applications implement their own
session handling. If the Session ID were implemented through headers,
it would have the following benefits:

1. Many more links in the chain between the browser and the server
side script can utilize the session id; for example, a load balancer
will easily see which internal server to pass requests to.
2. The Session ID is not available to client side scripts - which
makes session hijacking much more difficult. (Today they are
accessible through document.cookie.)
3. Server side logic can be implemented in many more layers for
connecting requests to an actual user session.
4. Statistics tools can much more easily identify unique visitors, as
the Session ID could potentially be logged in log files.
I am sure there are more advantages.

 the session system to support this.
 This feature couldn't be implemented in the aforementioned few lines
 of perl though.

The SessionID header only needs to be implemented on the client side
for HTML5 browsers. Server side scripting languages can immediately
read and set headers - but it would be an advantage if PHP (and
others) were updated to use the SessionID header by default for
requests from browsers supporting HTML5.

The HTTP 101 Switching Protocols response can be sent by the server
without the client asking for a protocol change. The only requirement
is that the server sends 426 Upgrade Required first, then specifies
which protocol to switch to. The protocol switched to could possibly
be the one proposed at the beginning of this thread.

 I don't see how we could use 426 as a notification that the client
 should open a WebSocket connection. 426 is still an error code, so if
 you send it as the reply to the initial GET request, you can't be sure
 the HTML file you pushed gets interpreted the correct way. While this
 would probably work, it would be semantically unclean at best.

I am not an HTTP protocol wizard, but I have read that something
similar is done for starting HTTPS communications, and I believe the
same procedure can be used for WebSocket.

 PROPOSAL: Turning an existing HTTP connection into a WebSocket connection:

 If the server sends a Connection: Upgrade header and an Upgrade header
 with a WebSocket token as part of a normal response and if the
 resource fetched established a browsing context, the client must not
 issue any other requests on that connection and must initiate a
 protocol switch.
 After the switch has finished, the client would expose the connection
 to the application via a DefaultWebSocket property or something
 similar.

 An exchange could look like this:

 C: GET /uri HTTP/1.1
 C: Host: example.com
 C: [ ... usual headers ... ]
 C:

 S: HTTP/1.1 200 OK
 S: Content-Type: text/html
 S: [ ... usual headers ... ]
 S: Upgrade: WebSocket/1.0
 S: Connection: Upgrade
 S:
 S: [ ... body ... ]

 C: OPTIONS /uri HTTP/1.1
 C: Host: example.com
 C: Upgrade: WebSocket/1.0
 C: Connection: Upgrade
 C:

 S: HTTP/1.1 101 Switching Protocols
 S: Upgrade: WebSocket/1.0
 S: Connection: Upgrade
 S:

 C/S: [ ... application specific data ... ]

 Because the connection would be same-origin pretty much per
 definition, no access checks would be needed in that situation.

 Would something like this be doable and wanted?

This is exactly what I was trying to describe :)

 Consider the following scenario:

 Bob and Eve have bought space on a run-of-the-mill XAMPP web hoster.
 They have different domains but happen to be on the same IP. Now Eve
 wants to brute-force Bob's password-protected web application. So she
 adds a script to her relatively popular site that does the following:

 So Bob will DDOS his own server? And my proposals allows using
 hostnames OR ip-addresses in the DNS TXT record, so unless Eve 

Re: [whatwg] What should the value attribute be for multi-file upload controls in WF2?

2008-06-24 Thread Frode Børli
 Because it breaks the common interface that the value property returns a 
 scalar?

Doesn't renaming the .value property to for example .files also break
the common interface?

Frode


Re: [whatwg] Proposal for cross domain security framework

2008-06-23 Thread Frode Børli
 Actually, DNS servers, particularly for reverse DNS lookups, are out of the
 control of a huge number of authors on the web. Shared hosting accounts for
 instance don't have a unique reverse IP look up. There are also plenty of


The reverse DNS spec specifically allows one IP address to have
multiple reverse domains.


 people who don't control their DNS at all for whatever reason.


1. People who do not have control over the reverse lookup seldom have
control over multiple servers, and seldom need to distribute load
like this.

2. The script should be allowed to connect to its origin server (as
unsigned Java applets are allowed to, today).

3. Hosting providers will add tools allowing their customers to
configure this security framework if it is required - but again, if
you are on a shared server you most likely will not need to connect to
multiple servers. It will also usually suffice to have a proxy on the
server (as many people do for XMLHttpRequests now).


Re: [whatwg] Proposal for cross domain security framework

2008-06-23 Thread Frode Børli
Hi! Thank you for pointing to that document. I quickly scanned through
it, but I have a small problem with the specification: does it require
web servers to check the Origin header? What happens with older web
applications that do not check this header?

Frode


2008/6/23 Anne van Kesteren [EMAIL PROTECTED]:
 On Mon, 23 Jun 2008 09:34:27 +0200, Frode Børli [EMAIL PROTECTED] wrote:

 [...]

 I'd suggest looking into the work the W3C has been doing on this for the
 past two years:

  http://dev.w3.org/2006/webapi/XMLHttpRequest-2/
  http://dev.w3.org/2006/waf/access-control/


 --
 Anne van Kesteren
 http://annevankesteren.nl/
 http://www.opera.com/






Re: [whatwg] TCPConnection feedback

2008-06-20 Thread Frode Børli
, regardless of protocol - if the Reverse DNS
suggestion (above) is used for security - right?



[whatwg] Proposal for cross domain security framework

2008-06-20 Thread Frode Børli
I have a proposal for a cross domain security framework that I think
should be implemented in browsers, Java applets, Flash applets and
more.

The problem:
If browsers could connect freely to whichever IP address they want,
then a simple ad on a highly popular website could be used to trigger
massive DDOS attacks or distributed brute-force password attacks, etc.

The challenge:
The owner of the server that receives incoming connections must be
able to decide who is able to connect.

The tools available:
The browser. The server. DNS servers.

The method:
The browser always knows where it downloaded any given script or
applet from. It also knows which IP address or host name the script
wants to connect to. The browser should perform the following checks
to make sure that the given script is allowed to connect:

1. Browser downloads a script from server A.
2. Script tries to connect to server B.
3. Browser looks up server B's IP-address.
4. Browser performs a reverse lookup of server B's IP-address and gets
a host name for the server.
5. Browser looks up a special TXT record in the DNS record for Server
B, which states each of the IP addresses/host names that can host
scripts allowed to connect.

DNS records are cached in multiple places (including on the local
computer), so a DDOS attack attempting to take down DNS servers would
probably not succeed.
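The decision in step 5 could look roughly like this. The TXT record
syntax ("allow-origin=...") is my own assumption - the proposal does
not fix one:

```javascript
// Given the TXT policy record found for server B's reverse-lookup
// host name, decide whether a script downloaded from `scriptHost` may
// open a connection. The "allow-origin=" record format is hypothetical.
function originAllowed(txtRecord, scriptHost) {
  var match = /^allow-origin=(.+)$/.exec(txtRecord);
  if (!match) return false;            // no policy published: refuse
  return match[1].split(/\s+/).some(function (allowed) {
    return allowed === '*' || allowed === scriptHost;
  });
}
```

A browser would first resolve server B's address, reverse-resolve it
to a host name, fetch that name's TXT record, and then apply a check
like this one.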


What do you think?


Best regards,
Frode Børli
Seria AS, Norway


Re: [whatwg] Proposal for cross domain security framework

2008-06-20 Thread Frode Børli
 1. Browser downloads a script from server A.
 2. Script tries to connect to server B.
 3. Browser looks up server B's IP-address.
 4. Browser performs a reverse lookup of server B's IP-address and gets
 a host name for the server.
 5. Browser looks up a special TXT record in the DNS record for Server
 B, which states each of the IP addresses/host names that can host
 scripts allowed to connect.

 DNS records are cached in multiple places (including on the local
 computer), so a DDOS attack attempting to take down DNS servers would
 probably not succeed.
 DNS-Server-Information is often not accessible for many hosts/shared hosts.

This is no problem, since the script that creates the connection can
be hosted on any host and included from any host. Server A includes
script from Server B. If the script from Server B creates a connection
to Server B, then Server A's page can communicate with Server B.

Secondly, there are a lot of hosts that allow you to edit DNS records
- and the rules of a free market will ensure that those who don't
will follow shortly.

 Adobe has some of the same problems with the Adobe Flash Player.
 They use a crossdomain.xml file to provide policy information.

That is of course another solution. I like the DNS solution too, as it
would be more scalable, since the server under attack would not have
to serve millions of hits to the crossdomain.xml file.

But still, couldn't we combine the two methods? I have read the Adobe
article and it gave me another idea based on policy xml files:

If the socket is created like this: var socket = new WebSocket(host,
port); then DNS is checked.


If the socket is created like this: var socket = new
WebSocket("http://www.example.com/chatserver.xsocket");

Then the .xsocket file is an XML file specifying exactly how the
WebSocket should connect to the server, and perhaps any restrictions
on the connections. It would be similar to including a script from a
server, but this would have the following benefits:

1. The chatserver.xsocket file could be dynamically generated,
allowing many things that we may not think about today.
2. It would be suitable for shared servers.
3. It would allow, for example, Yahoo! to create services that you can
connect to simply by doing new
WebSocket("http://www.yahoo.com/services/search.xsocket").
4. To load balance, the URL could redirect the connecting user to
another xsocket file on another server, perhaps?
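For illustration only - the proposal does not define the format; this
simply mirrors the tcpsocket.xml sketch from my earlier mail - a
dynamically generated chatserver.xsocket might look like:

```xml
<xsocket>
  <host>chat.example.com</host>
  <port>8081</port>
  <allow-origin>http://www.example.com/*</allow-origin>
</xsocket>
```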

Frode


Re: [whatwg] Proposal for cross domain security framework

2008-06-20 Thread Frode Børli
 Web applications could still easily be ported from one system to the
 other, because the file would be processed transparently.

 The only problem I see is getting the allowed domains right, the
 xsocket file can point to. On the one hand, you may want a dedicated
 machine for the persistent connections if you run a very popular
 service and anticipate many connections at once. On the other hand,
 you don't want an evil site getting access to your service using their
 own xsocket file.

All servers that can accept connections must have an xsocket file.

The only way around this that I see is my reverse DNS proposal...



Re: [whatwg] What should the value attribute be for multi-file upload controls in WF2?

2008-06-20 Thread Frode Børli
Why can't .value return an array? Just the first filename seems
silly. I would expect an array.

On 6/20/08, Thomas Broyer [EMAIL PROTECTED] wrote:
 On Fri, Jun 20, 2008 at 3:44 PM, Lachlan Hunt wrote:

 Windows browsers:
 IE 8: test.txt
 IE 7 mode: test.txt

 For the record, real IE7 (on Vista) says the full path.

 --
 Thomas Broyer





Re: [whatwg] Suggestion of an alternative TCPConnection implementation

2008-06-19 Thread Frode Børli
 Correct me if I am wrong: no two-way TCP daemon like telnet, ssh, POP3, NNTP
 or IMAP allows reconnecting to an existing session when the connection drops
 and for UDP daemons this question is moot because the connection never drops
 although it can occasionally fail.  Why should a custom connection from
 inside the browser make an exception?

One of the suggestions is creating a special protocol for
bi-directional communications with the server. The other suggestion is
creating a pure TCPConnection.

The pure TCPConnection should not specify a protocol, and it should be
able to connect to any port, for example telnet, SSH, POP3 etc.

My suggestion should replace only the special protocol variant and be
based on HTTP/XMLHttpRequest, like this:

1. Client connects to the web server.
2. Web server and client establish communication through the same
socket pair that the web page itself was transmitted through.
3. The client can then continue communicating with the server through
a singleton object, document.webSocket.

What happens in the background is the following:

The server starts a new thread or forks a new process to handle the
communication through the channel. The server uses the Session ID
header to match the client with the correct process on the server.

If the client is disconnected, then the next communication
reestablishes the connection, using the session id header to match the
client with the correct thread on the server. This will be transparent
to both the server side script and the client side script, and I
believe it will work through existing proxy servers.
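As a minimal sketch of the reconnection bookkeeping described above (all names here are illustrative assumptions, not from any spec), the server side could keep a map from Session ID to its long-lived handler, so a dropped client is re-attached transparently:

```javascript
// Hypothetical server-side bookkeeping: map Session ID values to
// long-lived handler objects so a reconnecting client is handed back
// to the process/thread that is already serving its session.
const handlers = new Map();

function handleConnection(sessionId) {
  if (handlers.has(sessionId)) {
    // Reconnect: give the new socket to the existing handler.
    const h = handlers.get(sessionId);
    h.reconnects += 1;
    return h;
  }
  // First contact: spawn a new handler for this session.
  const h = { sessionId: sessionId, reconnects: 0 };
  handlers.set(sessionId, h);
  return h;
}
```

Neither script needs to notice the drop: the client simply presents the same Session ID header and lands back in the same handler.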


The same type of communication channel can then be established by
scripting, for example like this:

var socket = new WebSocket("http://url/path/to/script.php");


This has the following benefits:

1. No security problems to worry about: the server decides whether the
request should be handled as a persistent socket. If the server side
script is not persistent, the request will be handled as a normal HTTP
request by the server, and the WebSocket object should throw an
exception.
2. Uses existing protocols (HTTP) in a backward compatible way.
3. Since it uses the existing communication channel, it will always
work through firewalls.
4. It supports virtual hosting by default.
5. It supports full URLs, including the path, by default.


The disadvantage is that to utilize the feature, web servers must be
updated to support WebSockets - but I do not think that's different
from requiring special servers anyway.

Frode


Re: [whatwg] Implementation of a good HTTPSocket (TCP-socket)

2008-06-19 Thread Frode Børli
 Web pages should only be allowed to access other servers when the
 script has been digitally signed, and when the user has agreed to
 giving the script elevated privileges - or there should be a
 certificate on the origin server which is checked against DNS records
 for each server that the script attempts to connect to.

I have changed my view on this. Only a DNS record is required:

1. A script by default can only connect to the server that it was
downloaded from.
2. Server A has a script that tries to connect to Server B. Server B
must have a record in its DNS that allows scripts originating from
Server A.

Nothing more should be needed; a DDoS attack using JavaScript can
never succeed unless the attacker controls the DNS servers. I think
this DNS method could be used for all cross-site security policies
(Java applets, Flash etc.). Additionally, the client can have policies
disallowing reconnects for, say, one minute if the server responds
with HTTP 401 Unauthorized.
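The DNS rule above can be sketched as a simple check, with the DNS lookup stubbed out as a plain object (the record format and function names are assumptions for illustration only):

```javascript
// Illustrative policy check: before a script served by originHost may
// open a socket to targetHost, the browser consults a TXT record on
// targetHost listing permitted origins. Same-origin is always allowed.
function mayConnect(originHost, targetHost, txtRecords) {
  if (originHost === targetHost) return true;   // rule 1: same server
  const allowed = txtRecords[targetHost] || [];  // rule 2: TXT allow-list
  return allowed.includes(originHost);
}
```

Since only the operator of targetHost can publish that TXT record, an attacker cannot opt a victim server in to receiving connections.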

 So what we want is a http based protocol which allow the client to
 continue communicating with the script that handles the initial
 request.
 I absolutely agree that this would be the best way. However, couldn't
 we use Michaels proposal for that? It seems to solve the same problems
 and is actually compliant HTTP (in theory at least).

We should have both a pure TCPConnection and a ServerConnection
object. It could possibly be based on the BEEP protocol; I don't know
that protocol. All I know is that HTTP has a mechanism for switching
protocols (HTTP 101 Switching Protocols), and it is already a good
basis for browser-to-webserver communications.

The ServerConnection should have some mechanisms in the protocol that
allows transmission of events from the server and to the server - as
well as sending variables/structures. Example:

var data = { name: "Frode Børli", address: "Norway" };
document.serverConnection.send(data);

Also the client can add arbitrary event listeners to the
serverConnection object:

document.serverConnection.onwhatever = function(message) {
    alert(message.city);
};

The server should be able to listen to all DOM events also.

 I find the SessionID header a very good idea though. What are the
 thoughts on that?

 I'm sorry if that has already been discussed, but if we use HTTP, why
 can't we use the Access Control spec as an opt in mechanism that is
 a little easier to implement than DNS? If you modify the behaviour a
 little, you could even use it against DDOS attacks:

Without DNS records (or an alternative implementation) I think the
TCPConnection and ServerConnection objects should be restricted by the
same rules as XMLHttpRequest.

 Counter suggestion: When a WebSocket objects attempts to connect,
 perform Access Control checks the way you would for POST requests.
 If the check fails and if the server response contains an
 Access-Control-Max-Age header, agents must immediately close the
 connection and must not open a connection to that resource again (or,
 if Access-Control-Policy-Path is present, to any resource specified)
 until the specified time has elapsed.
 That way, administrators that are hit by a DDOS can simply put

By securing with DNS records, administrators will not have to do
anything to prevent ddos, as it can't happen. Without DNS records, the
script is allowed only to connect to the same server that it was
fetched from. (Same as Java applets, Flash applets and XMLHttpRequest)

 Access-Control: allow * exclude evilsite.example.com
 Access-Control-Max-Age: 86400
 Access-Control-Policy-Path: /

I think the idea is good, but I do not like the exact implementation.
I think the server should be able to see which script is initiating
the connection (via a header sent from the client); the server can
then respond with an HTTP 401 Unauthorized. No need to specify
anything more. No need to specify which hosts are allowed, since the
script can decide that based on a number of parameters (like
IP address etc.).
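The server-side decision described above might look like the following sketch; the header name and return shape are assumptions for illustration, not part of any proposal text:

```javascript
// Sketch: the client names the script/origin that opened the socket in
// a request header; the server either proceeds with the upgrade (101)
// or denies the connection (401). The header name is hypothetical.
function checkSocketRequest(headers, allowedOrigins) {
  const origin = headers["x-initiating-script-origin"];
  if (origin && allowedOrigins.includes(origin)) {
    return { status: 101 };  // proceed: 101 Switching Protocols
  }
  return { status: 401 };    // deny: 401 Unauthorized
}
```

The point is that the accept/deny logic lives in the server script, which can apply whatever parameters it likes (IP address, origin, rate limits).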


Re: [whatwg] TCPConnection feedback

2008-06-19 Thread Frode Børli
 able to use any method it likes to differentiate its services. Even URI
 addressing is silly since again the application may have no concept of
 paths or queries. It is simply a service running on a port. The only
 valid use case for all this added complexity is proxying but nobody has
 tested yet whether proxies will handle this (short of enabling encryption,
 and even that is untested).

I think we should have both a pure TCPSocket, and also a ServerSocket
that keeps the same connection as the original document was downloaded
from. The ServerSocket will make it very easy for web developers to
work with, since the ServerSocket object will be available both from
the server side and the client side while the page is being generated.
I am posting a separate proposal that describes my idea soon.

 Actually, I've already tested this protocol against some typical forward
 proxy setups and it hasn't caused any problems so far.

Could you test keeping open the same connection that the web page was
fetched from? So that when the server script responds with its HTML
code, the connection is not closed but kept alive for two way
communications?

This gives the following benefits:

The script on the server decides if the connection should be closed or
kept open. (Protection against DDOS attacks)

This allows implementing server side listening to client side events,
and vice versa. If this works, then the XMLHttpRequest object could be
updated to allow two way communications in exactly the same way.

Also, by adding a SessionID header sent from the client (instead of
storing session ids in cookies), the web server could transparently
rematch any client with its corresponding server side process in case
of disconnect.

 I'm thinking here that this proposal is basically rewriting the CGI
 protocol (web server handing off managed request to custom scripts) with the
 ONLY difference being the asynchronous nature of the request. Perhaps more
 consideration might be given to how the CGI/HTTP protocols might be updated
 to allow async communication.
 Rewriting the HTTP spec is not feasible and I'm not even convinced its a
 good idea. HTTP has always been request/response so it would make a lot more
 sense to simply use a new protocol than confuse millions of
 developers/administrators who thought they understood HTTP.

The HTTP spec already has these features:

1: Header: Connection: Keep-Alive
2: Status: HTTP 101 Switching Protocols

There is probably no need to rewrite the HTTP spec at all.
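For illustration only, an upgrade exchange built on those two existing mechanisms might look something like this (the Session-ID header and the exact header values are assumptions, not part of any spec):

```http
GET /persistent.pphp HTTP/1.1
Host: host.com
Connection: Upgrade, Keep-Alive
Upgrade: WebSocket
Session-ID: 7f3a9c

HTTP/1.1 101 Switching Protocols
Upgrade: WebSocket
Connection: Upgrade
```

After the 101 response, two-way traffic would continue on the same socket.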

 Having said that I still see a very strong use case for low-level
 client-side TCP and UDP. There are ways to manage the security risks that
 require further investigation. Even if it must be kept same-domain that is
 better than creating a new protocol that won't work with existing services.
 Even if that sounds like a feature - it isn't. There are better ways to
 handle access-control for non-WebConnection devices than sending garbage to
 the port.
 If we put the access control in anything but the protocol it means that we
 are relying on an external service for security, so it would have to be
 something that is completely locked down. I don't really see what the
 mechanism would be. Can you propose a method for doing this so as to allow
 raw tcp connections without security complications?

TCPConnections should only be allowed to the server the script was
downloaded from (same as Flash and Java applets). A DNS TXT record can
define a whitelist of servers whose scripts may connect. The
TCPConnection should possibly also be allowed to connect to local
network resources after a security warning, but only if the server has
a proper HTTPS certificate.

 It's more harmful because an img tag (to my knowledge) cannot be used to
 brute-force access, whereas a socket connection could. With the focus on
 DDOS it is important to remember that these sockets will enable full
 read/write access to arbitrary services whereas existing methods can only
 write once per connection and generally not do anything useful with the
 response.
 What do you mean by brute-force access, and how could the proposed protocol
 be used to do it. Can you provide an example?

With the security measures I suggest above, there is no need for
protection against brute force attacks. Most developers use only one
server per site, and those with multiple servers will certainly be
able to add a TXT record to the DNS.


Re: [whatwg] What should the value attribute be for multi-file upload controls in WF2?

2008-06-19 Thread Frode Børli
I think it should be a select box containing each file name and
perhaps an icon, and when you select a file - it asks you if you want
to remove the file from the upload queue.

Frode

2008/6/19 Adele Peterson [EMAIL PROTECTED]:
 Hi all,

 I'm looking at the Web Forms 2 specification for the multi-file upload
 control that uses the min/max attributes.  When multiple files are selected,
 its unclear what the value attribute should contain.  It could contain just
 the first filename, or a comma separated list of all of the filenames.  I
 think it will be useful though to add something about this in the
 specification for consistency.

 Thanks,
Adele






Re: [whatwg] What should the value attribute be for multi-file upload controls in WF2?

2008-06-19 Thread Frode Børli
Sorry for misunderstanding. Of course it is up to the user agent to
decide the appearance. If the value attribute is to be accessible from
script, I would prefer it to be an array when accessed from script. If
it must be a string, then I think the control itself should contain
multiple <input type=file> elements, each with a single value,
accessible through the DOM.

Separating by comma is not good enough, as file names can contain comma.

Frode

2008/6/20 Adele Peterson [EMAIL PROTECTED]:
 That's a suggestion for the design of the control, but I was asking
 specifically about the value attribute, which can be accessed from script as
 a string.

 - Adele

 On Jun 19, 2008, at 2:56 PM, Frode Børli wrote:

 I think it should be a select box containing each file name and
 perhaps an icon, and when you select a file - it asks you if you want
 to remove the file from the upload queue.

 Frode

 2008/6/19 Adele Peterson [EMAIL PROTECTED]:

 Hi all,

 I'm looking at the Web Forms 2 specification for the multi-file upload
 control that uses the min/max attributes.  When multiple files are
 selected,
 its unclear what the value attribute should contain.  It could contain
 just
 the first filename, or a comma separated list of all of the filenames.  I
 think it will be useful though to add something about this in the
 specification for consistency.

 Thanks,
  Adele











Re: [whatwg] TCPConnection feedback

2008-06-19 Thread Frode Børli
 I think we should have both a pure TCPSocket, and also a ServerSocket
 that keeps the same connection as the original document was downloaded
 from. The ServerSocket will make it very easy for web developers to
 work with, since the ServerSocket object will be available both from
 the server side and the client side while the page is being generated.
 I am posting a separate proposal that describes my idea soon.

 I don't see the benefit of making sure that its the same connection that the
 page was generated from.

It does not have to be exactly the same connection, but I think it
should be handled by the web server, because then there is no need to
think about transferring state information between, for example, a PHP
script and a WebSocketServer. It would be almost like creating a
desktop application, simply because it would be easy for web
developers. Sample PHP script:

There is probably a better approach to implementing this in php, but
its just a concept:

<input id='test' type='button'>
<script type='text/javascript'>
    // when the button is clicked, raise the test_click event
    // handler on the server
    document.getElementById('test').addEventListener('click',
        document.serverSocket.createEventHandler('test_click'));
    // when the server raises the message event, alert the message
    document.serverSocket.addEventListener('message', alert);
</script>
<?php
// magic PHP method that is called whenever a client side event is
// sent to the server
function __serverSocketEvent($name, $event)
{
    if($name == 'test_click')
        server_socket_event("message", "You clicked the button");
}
?>

  If you establish a Connection: Keep-Alive with the proxy server, it will
 leave the connection open to you, but that doesn't mean that it will leave
 the connection open to the back end server as the Connection header is a
 single-hop header.

So it is not possible at all? There are no mechanisms in HTTP and
proxy servers that facilitate keeping the connection alive all the
way through to the web server?


If a Session ID (or perhaps a Request ID) header is added, then it is
possible to create server side logic that makes things easier for web
developers. When session ids are sent through cookies, web servers and
proxy servers have no way to identify a session (since only the script
knows which cookie is the session id). The SessionID header could be
used by load balancers and more; it could also be used by, for
example, IIS/Apache to connect a secondary socket to the script that
created the page (ultimately achieving what I want).

 The script on the server decides if the connection should be closed or
 kept open. (Protection against DDOS attacks)
 With the proposed spec, the server can close the connection at any point.
I stated it as a benefit in the context of the web server handling
the requests. An image request would close the connection immediately,
but a script could decide to keep it open. All servers can of course
close any connection at any time.

 This allows implementing server side listening to client side events,
 and vice versa. If this works, then the XMLHttpRequest object could be
 updated to allow two way communications in exactly the same way.

 The previously proposed protocol already allows the server side listening to
 client side events, and vice versa. Rather or not to put that in the
 XMLHttpRequest interface is another issue. I think making XHR bi-directional
 is a bad idea because its confusing. Better to use a brand new api, like
 WebSocket.

If the implementation works as I tried to exemplify in the PHP script
above, with a document.serverSocket object available, then the XHR
object should also have a .serverSocket object.

document.serverSocket.addEventListener(...)
xhr.serverSocket.addEventListener(...)

I am sure this can be achieved regardless of the protocol.

 Also, by adding a SessionID header sent from the client (instead of
 storing session ids in cookies), the web server could transparently
 rematch any client with its corresponding server side process in case
 of disconnect.
 Isn't that what cookies are supposed to do?  Regardless, it sounds like an
 application-level concern that should be layered on top of the protocol.

One important advantage is that document.cookie can be used for
hijacking sessions, by sending the cookie through, for example, an
img tag. If JavaScript can't access the SessionID, sessions can't be
hijacked through XSS attacks.

Also, I think load balancers, web servers and other applications that
do not have intimate knowledge of the web application should be able
to pair WebSocket connections with the actual HTTP request. How else
can load balancers be created if they have to balance both pages and
WebSockets to the same web server? The load balancer does not know
what part of the cookie identifies the session.
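With an explicit Session-ID header, a load balancer that knows nothing about the application could still pin a session's page requests and its WebSocket to the same backend. A sketch (the hashing scheme is an illustrative assumption, not a proposal detail):

```javascript
// Sketch: route by hashing the opaque Session-ID header consistently,
// so every request carrying the same id reaches the same backend.
function pickBackend(sessionId, backends) {
  let hash = 0;
  for (const ch of sessionId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;  // simple string hash
  }
  return backends[hash % backends.length];
}
```

This is impossible with cookies alone, because the balancer cannot know which cookie (or which part of it) identifies the session.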

I am sure that some clever people will find other uses if the session
id and request id are available for each request.

Re: [whatwg] TCPConnection feedback

2008-06-18 Thread Frode Børli
 without informing the user. This would allow a popular page (say a facebook
 profile or banner ad) to perform massive DOS against web servers using
 visitors browsers without any noticeable feedback (though I guess this is
 also true of current HTTPXMLRequestObjects).

XMLHttpRequest only allows connections to the origin server of the
script that created the object. If a TCPConnection is supposed to be
able to connect to other services, then some sort of mechanism must be
implemented so that the targeted web server performs some sort of
approval. The method of approval must be engineered in such a way that
the approval process itself cannot be the target of the DoS attack. I
can imagine something implemented on the DNS servers, combined with
digital signing of the script using public/private key certificates.

  I propose that there be requirements that limit the amount and type of data
 a client can send before receiving a valid server response.

If the client must send information through the TCPConnection
initially, then we effectively stop existing servers such as
IRC servers from being able to accept connections without needing a
rewrite.

  There should also be a recommendation that UAs display some form of status
 feedback to indicate a background connection is occurring.
Agree.

  HIXIE.3) No existing SMTP server (or any non-TCPConnection server) is
 going
  to send back the appropriate handshake response.

If TCPConnection is limited to connecting only to the origin server,
or to servers validated by certificates, then this will never be a
problem. If we take active measures against SMTP, then we should do
the same against POP3, IMAP etc. as well.

  It is always possible that non-http services are running on port 80. One
 logical reason would be as a workaround for strict firewalls. So the main
 defense against abuse is not the port number but the handshake. The original
 TCP Connection spec required the client to send only "Hello\n" and the
 server to send only "Welcome\n". The new proposal complicates things since
 the server/proxy could send any valid HTTP headers and it would be up to the
 UA to determine their validity. Since the script author can also inject URIs
 into the handshake this becomes a potential flaw. Consider the code:

The protocol should not require any data (not even "Hello"); it
should function as an ordinary TCP connection, similar to
implementations in Java, C# or any other major programming language.
If not, it should be called something else, as it is not a TCP
connection.


[whatwg] Suggestion of an alternative TCPConnection implementation

2008-06-18 Thread Frode Børli
 I think a major problem with raw TCP connections is that they would be
 the nightmare of every administrator. If web pages could use every
 sort of homebrew protocol on all possible ports, how could you still
 sensibly configure a firewall without the danger of accidentally
 disabling mary sue grandmother's web application?

I don't think so, as long as the web page can only connect to its
origin server. I am certain that this problem was discussed when Java
applets were created as well.

Web pages should only be allowed to access other servers when the
script has been digitally signed, and when the user has agreed to
giving the script elevated privileges - or there should be a
certificate on the origin server which is checked against DNS records
for each server that the script attempts to connect to.

 Also keep in mind the issue list Ian brought up in the other mail.
 Things like URI based adressing and virtual hosting would not be
 possible with raw TCP. That would make this feature a lot less useable
 for authors that do not have full access over their server, like in
 shared hosting situations, for example.

Hmm.. There are good arguments both ways. I would like both please :)

So what we want is an HTTP-based protocol which allows the client to
continue communicating with the script that handled the initial
request. I believe that a great way to implement this would be to
extend the HTTP protocol (by using the Connection: Keep-Alive
header).

It should be the script on the server that decides if the connection
is persistent. This will avoid most problems with cross domain
connections, I believe. Let's imagine two PHP scripts on a web server:

/index.php (PHP script)
/persistent.pphp (persistent PHP script)

If the user types in the address http://host.com/persistent.pphp -
then this use case is followed:

1. Client sends GET /persistent.pphp and its headers (including
domain name, cookies etc.). After all headers are sent, it expects a
standard HTTP-compliant response.
2. Server checks the Accept header for HTML 5 support.
3 (alternative flow): If no support is found in the Accept headers, an
HTTP 406 Not Acceptable response is sent with an error message saying
that an HTML 5 browser is required.
4 (alternative flow): Server checks the (new) SessionID header to see
if it should reconnect the client to an existing server side instance.
5. The server side script processes the request and may reply with a
complete HTML page (or with simply a Hello message; it is the server
side script that decides). The server must send Connection: Keep-Alive
and Connection-Type: Persistent headers.
6. The browser renders the response, and a singleton object is
magically available from JavaScript: document.defaultHTMLSocket. This
object allows the client to continue communicating with the script
that generated the page, by sending either serialized data in the same
form as GET/POST data or single text lines.

Other use case: User visits /index.php, which will connect to
/persistent.pphp using JavaScript.

1. Javascript: mySocket = new HTTPSocket("/persistent.pphp");
2. Exactly the same use case as the previous one is followed, except
that the HTTPSocket object is returned. The initial data sent by the
server must be read using the read() method of the HTTPSocket object.


Of course, I have not had time to validate that everything I have
suggested can be used and I would like more people to review this
suggestion - but I think it looks very viable at first glance.


I see one problem, and that is if the connection is lost (for example
because of a proxy server):

This could be fixed by creating a new header meant for storing a
client session id. If we standardize on that, the web server could
automatically map the client back to the correct instance of the
server application, and neither the client nor the server application
would need to know that the connection was lost.


Any feedback will be appreciated.

 Couldn't this already be done today, though? You can already today
 connect to an arbitrary server on an arbitrary port using forms,
 img, script src= and all other references that cannot be
 cross-domain protected for backwards compatibillity reasons. The whole
 hotlinking issue is basically the result of that.
 How would WebSocket connections be more harmful than something like

 setInterval(function(){
   var img = new Image();
   img.src = "http://victim.example.com/" + generateLongRandomString();
 }, 1000);

 for example would?


Yes, that could be done, but I think that it would be a lot more
painful for the server if the connection was made to some port and
kept open. Handling a request for a non-existing URL can be finished
in microseconds, but if the client just opens a port without being
disconnected, then the server will quickly be overloaded by too many
incoming connections.

[whatwg] Restricting style inheritance

2008-06-17 Thread Frode Børli
I am unsure if this applies to HTML (or rather CSS). From the
archives I see that Dean Edwards proposed a <reset> element that was
supposed to reset styles to the page default style. I have another
proposal:

<div style='inherit: nothing'></div>

This would effectively give everything inside the div the browser
default stylesheet. Other values for the inherit CSS property could
be:

<div style='inherit: font-weight font-family font-size;'></div>

i.e. any CSS property in a space-separated list. This list should also
allow shorthand properties such as 'background' and 'font', expanded.
Note that this is a whitelist approach, which I think is far better
than the blacklist approach that we need to use today:
style='line-height: 10px; font-family: Arial' etc. is a blacklist and
not very maintainable.



Re: [whatwg] Sandboxing to accommodate user generated content.

2008-06-17 Thread Frode Børli
I have been reading up on past discussions on sandboxing content, and
I feel that it is generally agreed that there should be some mechanism
for marking content as user generated. The discussion mainly appears
to be focused on implementation. Please read my implementation notes
at the end of this message on how we can include this function safely
for both HTML 4 and HTML 5 browsers, and still allow HTML 4 browsers
to function properly.


My main arguments for having this feature (in one form or another) in
the browser are:

- It is future proof. Changes to browsers (for example adding
expression support to CSS) will never again require old sanitizers to
be updated.
- It does not require much skill or effort from the web developer to
safely sanitize user content.
- Security bugs are fixed by browser vendors, not by each web
developer.


In the discussions I find that backward compatibility is absolutely
the most important issue. Second is that it must be easy for web
developers to use the features.

The suggested solution of using an attribute on an <iframe> element
for storing the user generated content has several problems:
1: The use of src= as a fallback means that style information will be
lost and stylesheets must be loaded again.

2: The use of src= yields problems with iframe heights (since the
src URL must be hosted on another server, JavaScript cannot fix this),
and HTML 4 browsers have no other method of adjusting the iframe
height according to the content.

3: If you have a page that lists 60 comments on a blog, then the user
agent would have to contact the server 60 times to fetch each comment.
This again means that Perl/PHP scripts have to be invoked 60 times for
one page view; that is 61 separate database connections and session
initializations.

4: For the fallback method of using src= for HTML 4 browsers to
actually work, the fallback documents must be hosted on a separate
domain name. This again means that a website using HTTPS must purchase
and maintain two certificates.

I do not believe this solution will ever be used.


My solution:

If we add a new element <htmlarea></htmlarea>, old browsers will run
scripts, while new browsers will stop scripts, and this is a major
problem.

HTML 5 browsers should therefore require everything between
<htmlarea> and </htmlarea> to be HTML entity escaped; that is, < and >
must be replaced with &lt; and &gt; respectively. If this is not done,
HTML 5 browsers will issue a severe warning and refuse to display the
page. Developers will quickly learn.

HTML 4 browsers will never run scripts (since they will only see plain
text). HTML 5 browsers will display rich text. It would be completely
secure for both HTML 4 and HTML 5 browsers.

A simple JavaScript could clean up the HTML markup for HTML 4
browsers.
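The escaping rule the proposal relies on can be sketched as a server-side template step (the function name is a hypothetical illustration, not part of the proposal):

```javascript
// Sketch: what a server-side template would do to user content before
// placing it inside <htmlarea>, per the escaping rule proposed above.
function escapeForHtmlarea(userHtml) {
  return userHtml.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
```

Because the content between the tags can then never contain a literal < or >, an injected </htmlarea> or <script> cannot break out in either HTML 4 or HTML 5 browsers.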


  I believe the idea to deal with this is to add another attribute to 
 iframe, besides sandbox= and seamless= we already have for sandboxing. 
 This attribute, doc=, would take a string of markup where you would only 
 need to escape the quotation character used (so either ' or "). The fallback 
 for legacy user agents would be the src= attribute.



Re: [whatwg] Sandboxing to accommodate user generated content.

2008-06-17 Thread Frode Børli
 I've also been having side discussions with a few people regarding the
 ability for a website owner to mark sections as data rather than code
 (where everything lies now).
 Your htmlarea tag idea is a good one (maybe change the tag to <data>,
 just a nitpick) however you don't address the use case of the
 following

 <data>

 user supplied input

 </data>


I have considered your idea (below) but found that it would not allow
efficient server side caching, which is often needed. If instead all
HTML inside <data></data> must be escaped like this:

<data>

&lt;user supplied input&gt;

</data>

Then this will be secure for both HTML 4 and HTML 5 browsers. HTML 4
browsers will display the escaped markup as text, while HTML 5
browsers will display correctly formatted content. A simple JavaScript
like this (untested) would make the data tags readable for HTML 4
browsers:

var els = document.getElementsByTagName("DATA");
for (e in els) els[e].innerHTML =
    els[e].innerHTML.replace(/<[^>]*>/g, "").replace(/\n/g, "<br>");


A problem with this approach is that developers might forget to escape
tags, therefore I think browsers should display a security warning
message if the character < or > is encountered inside a data tag.


 If the user injects </data> then game over.  A solution I discovered
 for this problem (others have as well, I'm sure, who aren't speaking)
 borrows from the defenses of cross-site request forgery (CSRF), where a
 non guessable token is used. Take the following example

 <data id="GUID">
 </data>
 </data id="GUID">

 GUID would be a temporary GUID value such as
 'F9968C5E-CEB2-4faa-B6BF-329BF39FA1E4' that would be tied to the user
 session. An attacker would be unable to break out of a <data> tag due
 to the fact that they couldn't guess the closing ID value. This is

*snip*


  I believe the idea to deal with this is to add another attribute to
 iframe, besides sandbox="" and seamless="" we already have for
 sandboxing. This attribute, doc="", would take
 a string of markup where you would only need to escape the quotation
 character used (so either ' or "). The fallback for legacy user agents
 would be the src="" attribute.

 To take this a step further there may be situations where user content
 is reflected inside of HTML tags in the following manner such as
 '<a href="user generated value">foo</a>'. For situations like this an
 additional attribute (along the lines of what you propose) could be
 added to this tag (or any tag for that matter)
 to instruct the browser that no script/html can execute.

 <a sandbox="true" href="javascript:alert(document.cookie)">asd</a>
 <a sandbox="true" href="injected value">asd</a>  (injected value = "
 onload=javascript:alert('wooot') foo="bar)


I like this better than a separate tag, yes. <div sandbox="1"></div> or
<div content="untrusted"></div>


Re: [whatwg] Sandboxing to accommodate user generated content.

2008-06-17 Thread Frode Børli
 elements on this page,
do you want to display them?).
2. Mixing secure and insecure communications makes having the secure
channel pointless.
3. It is extremely dangerous to assume that nobody in the future will
ever need to have secure communications with user generated content.



Best regards, Frode Børli - Seria.no


Re: [whatwg] Sandboxing to accommodate user generated content.

2008-06-17 Thread Frode Børli
 I have been reading up on past discussions on sandboxing content, and
 I feel that it is generally agreed on that there should be some
 mechanism for marking content as user generated. The discussion
 mainly appears to be focused on implementation. Please read my
 implementation notes at the end of this message on how we can include
 this function safely for both HTML 4 and HTML 5 browsers, and still
 allow HTML 4 browsers to function properly.

 My main arguments for having this feature (in one form or another) in
 the browser are:

 - It is future proof. Changes to browsers (for example adding
 expression support to css) will never again require old sanitizers to
 be updated.

 If the sanitiser uses a whitelist based approach that forbids everything by
 default, and then only allows known elements and attributes; and in the case
 of the style attribute, known properties and values that are safe, then that
 would also be the case.
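Agreed that a whitelist can be future proof; a sketch of such a style filter (the property list and names are mine, purely illustrative, not taken from HTML Purifier):

```javascript
// Allow only known properties, and only values built from a conservative
// character set - so expression(...), url(...) and entity tricks can never pass.
var SAFE_PROPS = { "color": true, "font-family": true, "font-size": true, "text-align": true };
var SAFE_VALUE = /^[a-z0-9 #,.%-]+$/i;   // no ( ) : & or quotes

function filterStyle(style) {
  return style.split(";").filter(function (decl) {
    var i = decl.indexOf(":");
    if (i < 0) return false;
    var prop = decl.slice(0, i).trim().toLowerCase();
    var value = decl.slice(i + 1).trim();
    return SAFE_PROPS[prop] === true && SAFE_VALUE.test(value);
  }).map(function (d) { return d.trim(); }).join("; ");
}
```

Anything not explicitly whitelisted is dropped, which is what makes the approach hold up when new CSS features appear.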

I have written a sanitizer for html and it is very difficult -
especially since browsers have undocumented bugs in their parsing.

Example: <div colspan="&amp;"
style="font-family&#61;expression&#40;alert&#40&quot;hacked&quot&#41&#41"
colspan="&amp;">Red</div>

The proof that sanitizing HTML is difficult is the fact that no major
site even attempts it. Even Wikipedia uses an obscure wiki language
instead of implementing a WYSIWYG editor.

 Note that sandboxing doesn't entirely remove the need for sanitising user
 generated content on the server, it's just an extra line of defence in case
 something slips through.

Of course. However, the sandbox feature in browsers will be fail safe if
user generated content is escaped with &lt; and &gt; before being sent
to the browser - as long as the browser does not have bugs, of course.

 The suggested solution of using an attribute on an iframe element
 for storing the user generated content has several problems;
 1: The use of src= as a fallback means that style information will be
 lost and stylesheets must be loaded again.
 This is not a major problem.  If it uses the same stylesheet, which can be
 cached by the browser, then at worst it results in a 304 Not Modified
 response.

Many small streams make a big river...

 2: The use of src= yields problems with iframe heights (since the
 src-url must be hosted on another server javascript cannot fix this)
 and HTML 4 browsers have no other method of adjusting the iframe
 height according to the content.
 In recent browsers that support cross-document messaging (Opera 9, Safari 3,
 Firefox 3 and IE 8), you could include a script within the comment page that
 calculates its own height and sends a message to the parent page with the
 info.  In older browsers, just set the height to a reasonable minimum and
 let the user scroll.  Sure, it's not perfect, but it's called graceful
 degradation.
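The quoted cross-document messaging approach could be sketched like this (the message format, function name and element id are my own invention, not from the thread):

```javascript
// Builds the resize message the sandboxed comment page would send.
function buildHeightMessage(height) {
  return JSON.stringify({ type: "resize", height: height });
}

// Inside the sandboxed comment page (browser only):
//   parent.postMessage(buildHeightMessage(document.body.scrollHeight), "*");
//
// In the parent page (browser only; a real page should also check e.origin):
//   window.addEventListener("message", function (e) {
//     var msg = JSON.parse(e.data);
//     if (msg.type === "resize")
//       document.getElementById("commentFrame").height = msg.height;
//   }, false);
```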

Much more difficult to implement than a <sandbox></sandbox> mechanism
- and I do not see the point of giving more work to web developers when
it could be fixed so easily.

 3: If you have a page that lists 60 comments on a blog, then the user
 agent would have to contact the server 60 times to fetch each comment.
 This again means that perl/php scripts have to be invoked 60 times for
 one page view - that is 61 separate database connections and session
 initializations.
 You could always concatenate all of the comments into a single file,
 reducing it down to 1 request.

No, you could not - not if you, for example, want people to report
comments or vote on them, which in the Web 2.0 world requires scripting.

 4: For the fallback method of using src= for HTML 4 browsers to
 actually work, the fallback documents must be hosted on a separate
 domain name. This again means that a website using HTTPS must purchase
 and maintain two certificates.
 I don't see that as a show stopper.

Well, I am not going to argue anymore. I have not heard anybody talk
in favour of a sandbox mechanism here or contribute something
constructive. The only feedback has been that you could do it with
iframes, and if it looks ugly in HTML 4 browsers, then that is only
graceful degradation, so it is okay. Maybe the future is Flash and
Silverlight after all. We'll see.

 If HTML 5 browsers require everything between <htmlarea></htmlarea> to
 be html entity escaped, that is, < and > must be replaced with &lt; and
 &gt; respectively. If this is not done, HTML 5 browsers will issue a
 severe warning and refuse to display the page. Developers will quickly
 learn.

 Draconian error handling is something we really want to avoid, particularly
 when such an error can be triggered by failing to handle user generated
 content properly.

I see that argument. Maybe you have a suggestion for what should happen
if unescaped HTML is encountered, then?

 HTML 4 browsers will never run scripts (since it will only see plain
 text). HTML 5 browsers will display rich text. It would be completely
 secure for both HTML 4 and HTML 5 browsers.

 A simple Javascript could clean up the HTML markup for HTML 4 

[whatwg] Sandboxing to accommodate user generated content.

2008-06-16 Thread Frode Børli
Hi! I am a new member of this mailing list, and I wish to contribute with a
couple of specific requirements that I believe should be discussed and
perhaps implemented in the final specification. I am unsure if this is the
correct place to post my ideas (or if my ideas are even new), but if it is
not, then I am sure somebody will instruct me. :) One person told me that
the specification was finished and no new features would be added from now
on - but hopefully that is not true.


The challenge:

More and more websites have features where users can contribute with user
generated content - often in the form of audio, video, images
or wiki-articles. An older type of content contribution is normal text such
as posts in a discussion forum, a mailing list such as this and comments on
blog articles.

A major challenge for many web developers is validating untrusted content
such as the message body of a blog comment. Unless the developer has a
flawless and future proof algorithm for ensuring that the message body does
not contain any script, web developers have to resort to text only - or
bbCode-style markup languages to allow users to post text content with
richer formatting. If the developer wants to enable rich formatting using
bbCode, it also needs fairly advanced methods of ensuring that no scripts
are executed. Consider this bbCode example:
[img]some_image.jpg'onmouseover=maliciousScript()[/img]. The bbCode parser
must ensure that there is absolutely no method of injecting scripts in user
posts - and that is very difficult when at the same time there exist
parsing errors in browsers. The example could easily be validated by not
allowing apostrophes or quotation marks in urls - but then we have multiple
entities that could be used: &apos; or &#39;. To make matters worse, some
browsers parse &#39, which is an incomplete html entity, and all these
variations must be considered by the bbCode parser author.
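To illustrate, a bbCode parser would need a normalizer like this before any validation, covering decimal and hex character references with or without the trailing semicolon (function name is mine):

```javascript
// Decode numeric character references, including incomplete ones such as
// &#39 without the semicolon, which some browsers accept.
function decodeCharRefs(s) {
  return s
    .replace(/&#x([0-9a-f]+);?/gi, function (m, h) { return String.fromCharCode(parseInt(h, 16)); })
    .replace(/&#(\d+);?/g, function (m, d) { return String.fromCharCode(+d); });
}
```

Only after this normalization can a check like "no apostrophes in urls" be meaningful.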

Another problem, which makes future proofing this type of security
difficult, is that standards evolve. A few years ago you could safely allow
users to apply css-styles to tags. For example, the bbCode tag
[color=blue]Blue text[/color] would be translated to
<span style='color: blue'>Blue text</span>. In this example an exploit could
be [color=expression(maliciousCode())]Text[/color]. When the algorithm was
made, it was considered secure, since no script could ever be executed
inside a style attribute. With the invention of expressions and behaviours
etc., the knowledge required of web developers is ever increasing, and web
developers have to review all old code whenever new technologies emerge -
because what once was secure suddenly is not secure anymore.


One solution:

<htmlarea>User generated content</htmlarea>


No scripts would ever be allowed to execute inside this tag. Malicious
users could potentially submit </htmlarea> unsafe content <htmlarea> and
get around this. As I see it, there are two solutions to this:

User generated content inside the tag must be escaped using html entities
(but still rendered as html by the user agent), or the author must prevent
users from submitting the string </htmlarea> and all possible variations
of the tag.

If the first solution is used, then browsers should display a
strong security warning if unescaped content is seen between htmlarea tags
on a website (to educate web developers).
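Such a check is trivial for a browser (or a validator) to implement; a sketch:

```javascript
// Any raw markup character between the htmlarea tags means the author
// forgot to escape the user generated content.
function hasUnescapedMarkup(content) {
  return /[<>]/.test(content);
}
```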


A sidenote: The tag name I chose is based on the textarea tag, whose
content should also be entity escaped to prevent users from inserting the
text </textarea>.  This currently breaks a lot of web pages - so perhaps a
strong security warning is in place if unescaped content is found after the
textarea start tag also?



