Re: [csv] Does the library provide means to circumvent CSV injection

Kevin W. Wall Fri, 12 Nov 2021 02:56:39 -0800

Comments from Matt's partner in crime inline, below. -kevin
[This probably will bounce on the Commons Users List since I'm not
subscribed to that mailing list, but if you want it there, any of you who
is subscribed have my permission to forward it.]


On Thu, Nov 11, 2021 at 7:11 PM Matt Seil <ms...@acm.org> wrote:

> The TLDR version:  OWASP's recommendation is specifically to render code
> intended to be executed as unexecutable.  I'd suggest a fix be done at
> OWASP-Java-Encoder project and not here.  I believe the suggestion of
> providing this feature even at OWASP  has near-zero value in the long run
> because the purpose of formulas in Excel IS to be executed--and Microsoft
> already offers the best speed bump.  Here be dragons!
>
> cc'ing my partner in crime.
>
> ============================
>
> I apologize.  This is going to be a TLDR response because I don't know any
> of you professionally so I'm erring on the side of completeness.  Sincere
> apologies if I'm stating things you believe to be obvious, or am myself
> ignorant of something obvious.
>
> So I think there's a misunderstanding in regards to the threat described
> by the OWASP article.  The threat is explicitly *FORMULA *execution in
> Excel--and LibreOffice.  It sounds similar to a browser problem but its
> not, its far worse. The reason why this particular threat tends to be out
> of bounds in bug bounty programs and in CTF contests is that the attack
> that exploits this is a social engineering attack which always works in the
> real world.  Hence why bug bounties won't pay out for it.
>
> The recommendation from OWASP is as follows:
>
> Encode the offending characters to:
>
>    - Equals to (=)
>    - Plus (+)
>    - Minus (-)
>    - At (@)
>    - Tab (0x09)
>    - Carriage return (0x0D)
>    - The set [;',"] be similarly escaped
>
> While this would be a mitigation, it would also *purposefully break** any
> formulas* placed into a csv cell.  This is a critical point, and I'll
> come back to it later.   It's all or nothing.
>
> This is where Phil's comment comes in:
>
> "Maybe I'm misinterpreting something but I thought that it could be made
> possible to configure CSVFormat-object when writing the CSV data in a
> way that any data with possibly corrupting values (as shown on the OWASP
> page) will mask the whole contents of the cell."
>
> First, let me stress again the risk:  The threat isn't masking cell
> contents, its *execution *of normal logic in a malicious way.  This is
> the €1M question:  "How do we differentiate corrupting values from valid
> values?"
>

I think it is even more specific than that even. The problem is the
application used to read the CSV file and render it as a table /
spreadsheet. While I've see CSV files with something like:

=cmd|’ /C calc’!A0

pop calc.exe in demos on Windows with Excel, I'm not sure the same thing
(with an appropriately renamed calculator application) would launch in
(say) MacOS with their 'Numbers.app' or in Linux with LibreOffice Calc. To
me, the main level of responsibility of preventing this should be with
whatever application handles the CSV file, not the application that accepts
it to be uploaded. To put it a different way, if you allowed general
untrusted users to upload .html files to some web application (which we all
agree would be dumb) I don't think we'd expect that the web application
that allows that should be responsible for trying to parse the HTML file to
make sure that all the links were safe and that there was no malicious
JavaScript contained in it. [Note: BTW, we *do* have a very similar problem
with general PDFs being uploaded, but there most of those PDF readers
(looking at you, Adobe) have finally learned it's bad for sales if you get
a shitload of bad PR because Acrobat or Acrobat reader automatically
enables the interpretation of JavaScript for the convenience of its users.]
Or if they can upload image files, do we have to write parsers that
correctly verify malformed GIF, TIFF, JPEG, or PNG images just because one
popular library might have a buffer overflow in it or a use after free
issue? (Nah. That would never happen.)

Clearly, if you are going insist on allowing potentially malicious
artifacts to be uploaded, you want to deploy a broader no code / low code
solution, so you drop the file into the file system (before you potentially
store in in you DB) and rely on malware detection scanning to spot and
quarantine the ones with the evil bits.

>
> Asking this csv library to do it means it has to take on quite a bit of
> intelligence.  It doesn't just have to understand what a CSV format is
> anymore.  It has to answer questions like "*What's a corrupt equal sign
> look like?*"  And it looks like a valid equal sign.  So to do this right,
> you have to do lexical analysis and parsing the same way that Excel is
> going to do it, and THEN you have to infer behavior.
>
I should not be able to *create* evil bits using whatever CSV library
(Apache Commons CSV or any other) that you are referring to here, but I
agree that it should not have to filter out the evil bits itself. What I
would do is allow the user of said library to register their own method /
function that the library's parser uses as a call back to validate the
input of each individual parsed field. That validation function could be as
simple as just returning a boolean or it could be more complicated and
return an array of the fields (by position) that failed the validation. But
it then pushes the responsibility into the own developers hands who should
have some dreaming idea of what they want to do with it and whether on not
the expect things like popping up a calculator to be allowed or not.

>
> Therefore to determine what corrupt characters look like given data
> designed to be executed you are now in the business of trying to interpret
> what the excel formula is doing, in order to determine whether or not its
> safe.
>
Yeah, you cannot automatically make that determination because you don't
have additional context. But if you provide a mechanism for a callback
function, you put that responsibility back in the hands of someone who
might be able to decide that. (This would also allow them to write a FCC
spreadsheet filter for George Carlin's 7 Dirty Words! :)

> This is the core problem:  formulas are bits of *user-supplied* *code 
> **designed
> to be executed*.  If you escape it, you break it.  At best, you annoy the
> hell out of the accountant who was expecting your web app to offer a usable
> spreadsheet, while adding one layer of manual intervention other than the
> standard warning that MS Office provides whenever you open an Excel not
> created on your machine.
>
> So... what can we do about it?  Microsoft already did it:
>
> IMHO there's nothing that any intermediary library can do that's any
> better than this.    Web applications designed to take spreadsheets as
> input are special beasts.  The proper security rule of thumb is to always
> ensure DATA is treated as DATA.  But that rule gets *really funky* when
> that DATA is actually supposed to be executable code.  But that's your
> choice:  if you don't want it to execute you have to force it to be data,
> which will break execution by programmer intent.
>
> However, I suspect a few of you will be unhappy with my "do nothing"
> suggestion and insist that something ought to be done.
>
> I would recommend writing a CSV encoder for the owasp-java-encoder
> project.  https://github.com/OWASP/owasp-java-encoder The framework is
> already in place and its where I push people if they only need encoding
> functions.
>
I seem to recall someone recently asking Jim Manico about that in a GitHub
issue or discussion. That memory is fuzzy, but IIRC, I think he resisted
the idea. If someone actually implemented it and appropriate tests and made
a PR to support it though, they might make it happen. (But keep in mind
that if someone is going to try that, it has to work with Java 5!!!)


> Why I wouldn't do it here:  libraries like this have to be written to the
> lowest-common-denominator, meaning csv format projects that don't have
> Excel as a target.
>
Ding, ding, ding!  Give that man an HTTP cookie. Excel isn't the only
player on the block. They weren't even the first. (They're just the biggest
and maybe most obnoxious, but I digress.)

> You want security functions to process as close to the business logic as
> possible, and this is the wrong target for that.  Doing it here means not
> breaking legacy code, which means by default, the option will be off.  (Or
> you follow a deprecation strategy.)
>
And it's worse than it is for web browsers, which are (supposed to be)
losely standards based, based on W3C and IETF standards. It's also worse in
other ways, because people key spreadsheets for 20+ years and still expect
them to work. (Guilty!)  AFAIK, they're ain't no spreadsheet
interoperability standards committee, so it will likely only change after a
massive public breach.

> Further--this gets to my original hint about threat models--executing
> formulas in cells is a *desired function* of Excel and its copies.  When
> developers start breaking spreadsheets they're going to revert to legacy
> behavior meaning you're really talking about improving the defensive
> capability for the security-minded developers that can stand up to the
> finance department.  When OWASP tells you "This attack is difficult to
> mitigate," it isn't just the technical issues involved--which I just
> outlined--its social.  This is why I'm hesitant to offer up "We'll do it in
> ESAPI," because I don't see the value-add in the bigger picture.  Plus, *this
> is Microsoft's fault* and I'm not thrilled with writing code to speedbump
> *their* problem.  Which, I feel they've addressed as well as they ever
> will.
>

Agree. We will NOT do it in ESAPI. Not the output encoding, at least; it's
just totally wrong for that. If someone wants to write a PR to add it to
the (IMO) already bloated Validator, we *might* be able to accommodate it,
but it just depends on how complicated the code is and how soon it's needed
and how good the test suite for it is.
-kevin

>
>
> On 11/11/2021 4:36 AM, P. Ottlinger wrote:
>
> Hi guys,
>
> thanks for your reply.
>
> Maybe I'm misinterpreting something but I thought that it could be made
> possible to configure CSVFormat-object when writing the CSV data in a
> way that any data with possibly corrupting values (as shown on the OWASP
> page) will mask the whole contents of the cell.
>
> Thus a library such as commons-csv would be able to lower the risk for
> CSV injection and not every client/customer would have to manually
> create this protecting logic.
>
> To my mind it's a simple parser for "dangerous" tokens that quotes the
> given data with additional &quot; .... as we do not need to write
> functioning Excel formulas into CSV.
>
> WDYT?
>
> Cheers,
> Phil
>
> Am 10.11.21 um 20:53 schrieb Gary Gregory:
>
> I agree with Matt. CSV is just a container, it doesn't know or care what
> the concept of a "formula" is.
>
> Gary
>
>
-kevin
-- 
Blog: https://off-the-wall-security.blogspot.com/    | Twitter: @KevinWWall
| OWASP ESAPI Project co-lead
NSA: All your crypto bit are belong to us.

Re: [csv] Does the library provide means to circumvent CSV injection

Reply via email to