Thank you for writing this up Harmeet. Before getting into the details
it is important that you understand that you are proposing to change
something, or in fact basically get rid of something, that was
discussed in a lot of detail and that has seen effort from quite a few
people. In general when recommending to change something it's a good
idea to understand where that came from and why things are done the
way they are. Based on what you've written in this and other messages
it does not seem that you have done this research. I might be wrong,
and if so I apologize, but in order to continue with a discussion like
this it is important that you know the background and understand the
problem and the reasons for the solutions.
There is a good thread that discusses most of this, and it is
available here:
http://www.nabble.com/Security-Issues-td21622188.html
On Jun 22, 2009, at 11:31 AM, Harmeet Bedi wrote:
A issue(https://issues.apache.org/jira/browse/OFBIZ-2645) is that
default value of allow-html is very restrictive.
This is as per start with most constrained best practice in security
Default is allow-html="none"
It not only does not allow html but it also does not allow simple
text like "Tom's age is likely > Paul's age". '>' breaks.
I agree this could be improved, but the trick is how do we tell the
difference? One interesting page that shows just how difficult it is
to recognize HTML is this one which is oriented around finding hacks
related to this very problem in various browsers:
http://ha.ckers.org/xss.html
What you have mentioned is definitely a weakness in the current code,
which is admittedly fairly simple, and an opportunity to improve that
code. I don't think it would be a good idea to change the overall
approach and throw away the attempt to filter incoming text,
especially text coming in from untrusted sources.
Around the time this was written I asked for recommendations from
anyone on the mailing list that might be more familiar with this
topic, especially as it relates to what a browser will or will not
interpret as markup. My limited knowledge on the topic is that you
pretty much have to have something that looks like a tag, in other
words starts with a less than or open angle bracket and has some sort
of word after it, in order for the browser to consider it markup and
try to interpret it.
One thing that might work is to allow a > if there is no <. I'm not
sure about the other way around because if there is otherwise valid
HTML a browser might not care about a missing > to close a tag. It may
also be valid to allow a < if there is a space after it, but I'm not
sure about that either.
Please keep in mind that the reason for this is to not allow abuse of
a production system. If that is not important to you or to your
clients then you can certainly disable it globally. However, I'm
guessing that's not what you would want to do and the best approach
for all of this is to simply improve the code so that it allows as
much as possible while still protecting against the security threat it
is meant to.
allow-html="safe" also does not seem to work well. It does not allow
well formed html.
Could you be more specific? Depending on what you're seeing there may
very well be a bug with the safe HTML code, or just a need to change
the antisamy-esapi.xml configuration file for it. In any case, it
would be good to find out more about what you mean by this because it
sounds like a bug.
Here is a proposal:
HTML and descriptive text with characters like '>' should be
allowable whenever there is a description text input by user.
Change services that deal with description/comments/reason/noteInfo
fields to have allow-html="any"
An example of this is 'updateWorkEffortNote' service. it could
change from
<service name="updateWorkEffortNote" engine="simple"
location="component://workeffort/script/org/ofbiz/
workeffort/workeffort/WorkEffortSimpleServices.xml"
invoke="updateWorkEffortNote" auth="true">
<description>Update a WorkEffort Note</description>
<attribute name="workEffortId" type="String" mode="IN"
optional="false"/>
<attribute name="noteId" type="String" mode="IN"
optional="false"/>
<attribute name="internalNote" type="String" mode="IN"
optional="false"/>
<attribute name="noteInfo" type="String" mode="IN"
optional="true"/>
</service>
to
<service name="updateWorkEffortNote" engine="simple"
location="component://workeffort/script/org/ofbiz/
workeffort/workeffort/WorkEffortSimpleServices.xml"
invoke="updateWorkEffortNote" auth="true">
<description>Update a WorkEffort Note</description>
<attribute name="workEffortId" type="String" mode="IN"
optional="false"/>
<attribute name="noteId" type="String" mode="IN"
optional="false"/>
<attribute name="internalNote" type="String" mode="IN"
optional="false"/>
<attribute name="noteInfo" type="String" mode="IN"
optional="true" allow-html="any"/>
</service>
if this seems acceptable, i can send patches with services for
review and commits.
I mentioned some things related to this in my comments above, but in
general no, my opinion is that we should not open up this security
hole in so many places by default. I would be very interested to hear
what others have to say and how others would like to see this working,
but it may be that they are not talking so much because this has
already been discussed.
In general it is not safe to assume that output filtering will always
take care of this problem, and so when ever text comes from an un-
trusted source, or a potentially un-trusted source, we should filter
the input to avoid the problem right there.
I guess this goes back to what I said at the beginning of this
message. It would be very valuable to put this in the context of what
has already been researched and discussed. What you have written here
would be a lot more meaningful and a lot easier to discuss if you
referred back to the original discussion and decisions that lead to
the functionality you are looking at. What I mean by that is that
anything that has been discussed and decided on can certainly be
changed, but not without referring back to the original discussion and
decisions.
-David