Btw, Microsoft’s macro execution prevention does not protect against formulas in 
CSV. Different pop-ups are shown (at least in Office 2016) in 
the formula injection case, especially when using the cmd| mechanism, but it’s 
rather easy to allow it (if administrators don’t prevent it with a group 
policy). It is a bit unfortunate, as it tells you that you need to trust the source: 
who would distrust their ERP or their bank… Not to mention that Google Sheets 
is also affected. For that reason, a credible SaaS does have to filter those.

Regards
Bernd
--
http://bernd.eckenfels.net
________________________________
From: Matt Seil <xeno6...@gmail.com> on behalf of Matt Seil <ms...@acm.org>
Sent: Friday, November 12, 2021 1:11:19 AM
To: Commons Users List <user@commons.apache.org>; P. Ottlinger 
<pottlin...@apache.org>; Gary Gregory <garydgreg...@gmail.com>; Kevin W. Wall 
<kevin.w.w...@gmail.com>
Subject: Re: [csv] Does the library provide means to circumvent CSV injection


The TLDR version:  OWASP's recommendation is specifically to render code 
intended to be executed as unexecutable.  I'd suggest a fix be done at 
OWASP-Java-Encoder project and not here.  I believe the suggestion of providing 
this feature even at OWASP  has near-zero value in the long run because the 
purpose of formulas in Excel IS to be executed--and Microsoft already offers 
the best speed bump.  Here be dragons!

cc'ing my partner in crime.

============================

I apologize.  This is going to be a TLDR response because I don't know any of 
you professionally so I'm erring on the side of completeness.  Sincere 
apologies if I'm stating things you believe to be obvious, or am myself 
ignorant of something obvious.

So I think there's a misunderstanding with regard to the threat described by the 
OWASP article.  The threat is explicitly FORMULA execution in Excel--and 
LibreOffice.  It sounds similar to a browser problem but it's not; it's far 
worse.  The reason why this particular threat tends to be out of bounds in bug 
bounty programs and in CTF contests is that the attack that exploits this is a 
social engineering attack, which always works in the real world.  That is why bug 
bounties won't pay out for it.

The recommendation from OWASP is as follows:

Encode the following offending characters:

  *   Equals to (=)
  *   Plus (+)
  *   Minus (-)
  *   At (@)
  *   Tab (0x09)
  *   Carriage return (0x0D)
  *   The set [;',"] should be similarly escaped

While this would be a mitigation, it would also purposefully break any formulas 
placed into a csv cell.  This is a critical point, and I'll come back to it 
later.   It's all or nothing.
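
To make the recommendation above concrete, here is a minimal Java sketch of that kind of encoding. It is not part of Commons CSV or the OWASP Java Encoder; the class and method names (CsvCellNeutralizer, startsWithTrigger, neutralize) are hypothetical, and the choice to prefix only trigger-leading cells with a single quote is an assumption of this sketch, not a statement of what OWASP or any library does.

```java
final class CsvCellNeutralizer {

    // Leading characters from the list above that a spreadsheet may treat as a formula start.
    private static final String TRIGGERS = "=+-@\t\r";

    private CsvCellNeutralizer() {}

    /** Returns true if the raw value begins with one of the trigger characters. */
    static boolean startsWithTrigger(String value) {
        return !value.isEmpty() && TRIGGERS.indexOf(value.charAt(0)) >= 0;
    }

    /**
     * Turns one raw cell value into a quoted CSV field:
     * 1. prefix a single quote if the value starts with a trigger character,
     * 2. double every embedded double quote,
     * 3. wrap the whole field in double quotes so separators stay inside the cell.
     */
    static String neutralize(String value) {
        String guarded = startsWithTrigger(value) ? "'" + value : value;
        return "\"" + guarded.replace("\"", "\"\"") + "\"";
    }
}
```

Note that neutralize("=SUM(A1:A3)") yields "'=SUM(A1:A3)" as the field, so a legitimate formula is disabled just like a malicious one, which is exactly the all-or-nothing trade-off described above.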

This is where Phil's comment comes in:

"Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell."

First, let me stress again the risk:  The threat isn't masking cell contents, 
it's the execution of normal logic in a malicious way.  This is the €1M question:  
"How do we differentiate corrupting values from valid values?"

Asking this csv library to do it means it has to take on quite a bit of 
intelligence.  It doesn't just have to understand what a CSV format is anymore. 
It has to answer questions like "What does a corrupt equal sign look like?"  And 
it looks like a valid equal sign.  So to do this right, you have to do lexical 
analysis and parsing the same way that Excel is going to do it, and THEN you 
have to infer behavior.
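
As a small illustration of why a character-level check cannot answer that question, here is a hedged Java sketch. The class name PrefixCheckDemo and the sample cell values are made up for this example; the cmd| string is only an illustrative payload of the kind discussed in this thread.

```java
final class PrefixCheckDemo {

    private PrefixCheckDemo() {}

    /** Naive check: flags any cell whose first character could start a formula. */
    static boolean looksLikeFormula(String cell) {
        if (cell.isEmpty()) {
            return false;
        }
        char c = cell.charAt(0);
        return c == '=' || c == '+' || c == '-' || c == '@';
    }

    public static void main(String[] args) {
        String[] cells = {
            "=SUM(A1:A3)",          // legitimate formula an accountant wants to keep
            "=cmd|' /C calc'!A0",   // illustrative malicious payload
            "-42.50",               // negative amount, not a formula at all
            "+49 30 1234567",       // phone number
            "plain text"
        };
        for (String cell : cells) {
            // The check flags the first four alike; only "plain text" passes.
            System.out.println(looksLikeFormula(cell) + "\t" + cell);
        }
    }
}
```

The check treats the legitimate formula, the payload, the negative number and the phone number identically. Telling them apart would require parsing and interpreting the cell the way the spreadsheet application does.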

Therefore, to determine what corrupt characters look like given data designed to 
be executed, you are now in the business of trying to interpret what the Excel 
formula is doing, in order to determine whether or not it's safe.  This is the 
core problem:  formulas are bits of user-supplied code designed to be executed. 
If you escape it, you break it.  At best, you annoy the hell out of the 
accountant who was expecting your web app to offer a usable spreadsheet, while 
adding one layer of manual intervention on top of the standard warning that MS 
Office provides whenever you open an Excel file not created on your machine.

So... what can we do about it?  Microsoft already did it:

[inline image attachment, not preserved in this archive]

IMHO there's nothing that any intermediary library can do that's any better 
than this.    Web applications designed to take spreadsheets as input are 
special beasts.  The proper security rule of thumb is to always ensure DATA is 
treated as DATA.  But that rule gets really funky when that DATA is actually 
supposed to be executable code.  But that's your choice:  if you don't want it 
to execute you have to force it to be data, which will break execution by 
programmer intent.

However, I suspect a few of you will be unhappy with my "do nothing" suggestion 
and insist that something ought to be done.

I would recommend writing a CSV encoder for the owasp-java-encoder project.  
https://github.com/OWASP/owasp-java-encoder The framework is already in place 
and it's where I push people if they only need encoding functions.

Why I wouldn't do it here:  libraries like this have to be written to the 
lowest-common-denominator, meaning csv format projects that don't have Excel as 
a target.  You want security functions to process as close to the business 
logic as possible, and this is the wrong target for that.  Doing it here means 
not breaking legacy code, which means by default, the option will be off.  (Or 
you follow a deprecation strategy.)  Further--this gets to my original hint 
about threat models--executing formulas in cells is a desired function of Excel 
and its copies.  When developers start breaking spreadsheets they're going to 
revert to legacy behavior meaning you're really talking about improving the 
defensive capability for the security-minded developers that can stand up to 
the finance department.  When OWASP tells you "This attack is difficult to 
mitigate," it isn't just the technical issues involved--which I just 
outlined--it's social.  This is why I'm hesitant to offer up "We'll do it in 
ESAPI," because I don't see the value-add in the bigger picture.  Plus, this is 
Microsoft's fault and I'm not thrilled with writing code to speedbump *their* 
problem.  Which, I feel they've addressed as well as they ever will.



On 11/11/2021 4:36 AM, P. Ottlinger wrote:

Hi guys,

thanks for your reply.

Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell.

Thus a library such as commons-csv would be able to lower the risk for
CSV injection and not every client/customer would have to manually
create this protecting logic.

To my mind it's a simple parser for "dangerous" tokens that quotes the
given data with additional " .... as we do not need to write
functioning Excel formulas into CSV.

WDYT?

Cheers,
Phil

On 10.11.21 at 20:53, Gary Gregory wrote:


I agree with Matt. CSV is just a container, it doesn't know or care what
the concept of a "formula" is.

Gary



