Thank you for writing this up Harmeet. Before getting into the
details it
is important that you understand that you are proposing to change
something,
or in fact basically get rid of something, that was discussed in a
lot of
detail and that has seen effort from quite a few people. In general
when
recommending to change something it's a good idea to understand
where that
came from and why things are done the way they are. Based on what
you've
written in this and other messages it does not seem that you have
done this
research. I might be wrong, and if so I apologize, but in order to
continue
with a discussion like this it is important that you know the
background and
understand the problem and the reasons for the solutions.
There is a good thread that discusses most of this, and it is
available
here:
http://www.nabble.com/Security-Issues-td21622188.html
On Jun 22, 2009, at 11:31 AM, Harmeet Bedi wrote:
A issue(https://issues.apache.org/jira/browse/OFBIZ-2645) is that
default
value of allow-html is very restrictive.
This is as per start with most constrained best practice in security
Default is allow-html="none"
It not only does not allow html but it also does not allow simple
text
like "Tom's age is likely > Paul's age". '>' breaks.
I agree this could be improved, but the trick is how do we tell the
difference? One interesting page that shows just how difficult it
is to
recognize HTML is this one which is oriented around finding hacks
related to
this very problem in various browsers:
http://ha.ckers.org/xss.html
What you have mentioned is definitely a weakness in the current
code, which
is admittedly fairly simple, and an opportunity to improve that
code. I
don't think it would be a good idea to change the overall approach
and throw
away the attempt to filter incoming text, especially text coming in
from
untrusted sources.
Around the time this was written I asked for recommendations from
anyone on
the mailing list that might be more familiar with this topic,
especially as
it relates to what a browser will or will not interpret as markup. My
limited knowledge on the topic is that you pretty much have to have
something that looks like a tag, in other words starts with a less
than or
open angle bracket and has some sort of word after it, in order for
the
browser to consider it markup and try to interpret it.
One thing that might work is to allow a > if there is no <. I'm not
sure
about the other way around because if there is otherwise valid HTML a
browser might not care about a missing > to close a tag. It may
also be
valid to allow a < if there is a space after it, but I'm not sure
about that
either.
Please keep in mind that the reason for this is to not allow abuse
of a
production system. If that is not important to you or to your
clients then
you can certainly disable it globally. However, I'm guessing that's
not what
you would want to do and the best approach for all of this is to
simply
improve the code so that it allows as much as possible while still
protecting against the security threat it is meant to.
allow-html="safe" also does not seem to work well. It does not
allow well
formed html.
Could you be more specific? Depending on what you're seeing there
may very
well be a bug with the safe HTML code, or just a need to change the
antisamy-esapi.xml configuration file for it. In any case, it would
be good
to find out more about what you mean by this because it sounds like
a bug.
Here is a proposal:
HTML and descriptive text with characters like '>' should be
allowable
whenever there is a description text input by user.
Change services that deal with description/comments/reason/
noteInfo fields
to have allow-html="any"
An example of this is 'updateWorkEffortNote' service. it could
change from
<service name="updateWorkEffortNote" engine="simple"
location="component://workeffort/script/org/ofbiz/workeffort/
workeffort/WorkEffortSimpleServices.xml"
invoke="updateWorkEffortNote" auth="true">
<description>Update a WorkEffort Note</description>
<attribute name="workEffortId" type="String" mode="IN"
optional="false"/>
<attribute name="noteId" type="String" mode="IN"
optional="false"/>
<attribute name="internalNote" type="String" mode="IN"
optional="false"/>
<attribute name="noteInfo" type="String" mode="IN"
optional="true"/>
</service>
to
<service name="updateWorkEffortNote" engine="simple"
location="component://workeffort/script/org/ofbiz/workeffort/
workeffort/WorkEffortSimpleServices.xml"
invoke="updateWorkEffortNote" auth="true">
<description>Update a WorkEffort Note</description>
<attribute name="workEffortId" type="String" mode="IN"
optional="false"/>
<attribute name="noteId" type="String" mode="IN"
optional="false"/>
<attribute name="internalNote" type="String" mode="IN"
optional="false"/>
<attribute name="noteInfo" type="String" mode="IN"
optional="true"
allow-html="any"/>
</service>
if this seems acceptable, i can send patches with services for
review and
commits.
I mentioned some things related to this in my comments above, but in
general no, my opinion is that we should not open up this security
hole in
so many places by default. I would be very interested to hear what
others
have to say and how others would like to see this working, but it
may be
that they are not talking so much because this has already been
discussed.
In general it is not safe to assume that output filtering will
always take
care of this problem, and so when ever text comes from an un-
trusted source,
or a potentially un-trusted source, we should filter the input to
avoid the
problem right there.
I guess this goes back to what I said at the beginning of this
message. It
would be very valuable to put this in the context of what has
already been
researched and discussed. What you have written here would be a lot
more
meaningful and a lot easier to discuss if you referred back to the
original
discussion and decisions that lead to the functionality you are
looking at.
What I mean by that is that anything that has been discussed and
decided on
can certainly be changed, but not without referring back to the
original
discussion and decisions.
-David