Btw, Microsoft’s macro execution prevention does not protect against formulas in CSV. Different pop-ups are shown in the formula injection case (at least in Office 2016), especially when using the cmd| mechanism, but it’s rather easy to allow the execution (unless administrators prevent it with a group policy). It is a bit unfortunate that the prompt tells you that you need to trust the source: who would distrust their ERP or their bank… Not to mention that Google Sheets is also affected. For that reason, a credible SaaS does have to filter those.
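As a rough illustration of what such filtering could look like, here is a minimal sketch in Java. It assumes the simplest variant of the mitigation discussed in this thread: a cell that starts with one of the trigger characters (=, +, -, @, tab, carriage return) gets a single quote prepended so that the spreadsheet shows it as text. The class and method names (CsvFormulaGuard, looksLikeFormula, neutralize) are hypothetical and not part of Commons CSV or any OWASP library; quoting of separators and double quotes is assumed to be left to the CSV writer.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Commons CSV or OWASP code. */
final class CsvFormulaGuard {

    // Leading characters that spreadsheet applications may treat as the start of a formula.
    private static final Set<Character> TRIGGERS = Set.of('=', '+', '-', '@', '\t', '\r');

    private CsvFormulaGuard() {
        // static helpers only
    }

    /** Returns true if the cell is non-empty and starts with a trigger character. */
    static boolean looksLikeFormula(String cell) {
        return cell != null && !cell.isEmpty() && TRIGGERS.contains(cell.charAt(0));
    }

    /**
     * Prepends a single quote to cells that start with a trigger character.
     * All other cells (including null and empty strings) are returned unchanged.
     */
    static String neutralize(String cell) {
        return looksLikeFormula(cell) ? "'" + cell : cell;
    }
}
```

Each value would be passed through CsvFormulaGuard.neutralize(...) before it is handed to the CSV writer. Note that this also rewrites legitimate values such as "-5" or an intended formula, which is exactly the trade-off debated in the quoted message below.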
Regards,
Bernd
--
http://bernd.eckenfels.net
________________________________
From: Matt Seil <xeno6...@gmail.com> on behalf of Matt Seil <ms...@acm.org>
Sent: Friday, November 12, 2021 1:11:19 AM
To: Commons Users List <user@commons.apache.org>; P. Ottlinger <pottlin...@apache.org>; Gary Gregory <garydgreg...@gmail.com>; Kevin W. Wall <kevin.w.w...@gmail.com>
Subject: Re: [csv] Does the library provide means to circumvent CSV injection

The TLDR version: OWASP's recommendation is specifically to render code intended to be executed as unexecutable. I'd suggest a fix be done at the OWASP-Java-Encoder project and not here. I believe the suggestion of providing this feature even at OWASP has near-zero value in the long run because the purpose of formulas in Excel IS to be executed--and Microsoft already offers the best speed bump. Here be dragons!

cc'ing my partner in crime.

============================

I apologize. This is going to be a TLDR response because I don't know any of you professionally, so I'm erring on the side of completeness. Sincere apologies if I'm stating things you believe to be obvious, or am myself ignorant of something obvious.

So I think there's a misunderstanding regarding the threat described by the OWASP article. The threat is explicitly FORMULA execution in Excel--and LibreOffice. It sounds similar to a browser problem but it's not, it's far worse. The reason why this particular threat tends to be out of bounds in bug bounty programs and in CTF contests is that the attack that exploits this is a social engineering attack which always works in the real world. Hence why bug bounties won't pay out for it.

The recommendation from OWASP is as follows:

Encode the offending characters:

* Equals (=)
* Plus (+)
* Minus (-)
* At (@)
* Tab (0x09)
* Carriage return (0x0D)
* The set [;',"] be similarly escaped

While this would be a mitigation, it would also purposefully break any formulas placed into a CSV cell.
This is a critical point, and I'll come back to it later. It's all or nothing.

This is where Phil's comment comes in:

"Maybe I'm misinterpreting something but I thought that it could be made possible to configure CSVFormat-object when writing the CSV data in a way that any data with possibly corrupting values (as shown on the OWASP page) will mask the whole contents of the cell."

First, let me stress again the risk: the threat isn't masking cell contents, it's the execution of normal logic in a malicious way.

This is the €1M question: "How do we differentiate corrupting values from valid values?"

Asking this CSV library to do it means it has to take on quite a bit of intelligence. It doesn't just have to understand what a CSV format is anymore. It has to answer questions like "What does a corrupt equal sign look like?" And it looks like a valid equal sign. So to do this right, you have to do lexical analysis and parsing the same way that Excel is going to do it, and THEN you have to infer behavior. Therefore, to determine what corrupt characters look like given data designed to be executed, you are now in the business of trying to interpret what the Excel formula is doing, in order to determine whether or not it's safe.

This is the core problem: formulas are bits of user-supplied code designed to be executed. If you escape it, you break it. At best, you annoy the hell out of the accountant who was expecting your web app to offer a usable spreadsheet, while adding one layer of manual intervention in addition to the standard warning that MS Office provides whenever you open an Excel file not created on your machine.

So... what can we do about it? Microsoft already did it:

[inline image attachment: cid:part1.zKavEz9C.SwuG5A47@acm.org]

IMHO there's nothing that any intermediary library can do that's any better than this.

Web applications designed to take spreadsheets as input are special beasts. The proper security rule of thumb is to always ensure DATA is treated as DATA.
But that rule gets really funky when that DATA is actually supposed to be executable code. But that's your choice: if you don't want it to execute, you have to force it to be data, which will break execution by programmer intent.

However, I suspect a few of you will be unhappy with my "do nothing" suggestion and insist that something ought to be done. I would recommend writing a CSV encoder for the owasp-java-encoder project.

https://github.com/OWASP/owasp-java-encoder

The framework is already in place and it's where I push people if they only need encoding functions.

Why I wouldn't do it here: libraries like this have to be written to the lowest common denominator, meaning CSV format projects that don't have Excel as a target. You want security functions to process as close to the business logic as possible, and this is the wrong target for that. Doing it here means not breaking legacy code, which means that by default, the option will be off. (Or you follow a deprecation strategy.)

Further--this gets to my original hint about threat models--executing formulas in cells is a desired function of Excel and its copies. When developers start breaking spreadsheets, they're going to revert to legacy behavior, meaning you're really talking about improving the defensive capability for the security-minded developers that can stand up to the finance department. When OWASP tells you "This attack is difficult to mitigate," it isn't just the technical issues involved--which I just outlined--it's social.

This is why I'm hesitant to offer up "We'll do it in ESAPI," because I don't see the value-add in the bigger picture. Plus, this is Microsoft's fault and I'm not thrilled with writing code to speedbump *their* problem, which, I feel, they've addressed as well as they ever will.

On 11/11/2021 4:36 AM, P. Ottlinger wrote:

Hi guys,

thanks for your reply.
Maybe I'm misinterpreting something but I thought that it could be made possible to configure CSVFormat-object when writing the CSV data in a way that any data with possibly corrupting values (as shown on the OWASP page) will mask the whole contents of the cell.

Thus a library such as commons-csv would be able to lower the risk for CSV injection and not every client/customer would have to manually create this protecting logic.

To my mind it's a simple parser for "dangerous" tokens that quotes the given data with additional " .... as we do not need to write functioning Excel formulas into CSV.

WDYT?

Cheers,
Phil

On 10.11.21 at 20:53, Gary Gregory wrote:

I agree with Matt. CSV is just a container, it doesn't know or care what the concept of a "formula" is.

Gary