On Wed, Jul 12, 2017 at 12:09 PM, Branko Čibej <br...@apache.org> wrote: > On 11.07.2017 22:50, Stefan Sperling wrote: >> On Tue, Jul 11, 2017 at 09:11:58PM +0200, Branko Čibej wrote: >>> Another issue I have with the proposal is the idea to use file suffixes. >>> That's usually the wrong way to go about things (case in point: Windows >>> does it, with didastrous results). It's much better to determine file >>> format by inspection, such as, e.g., libmagic does. We already have >>> optional support for libmagic in the client (to set svn:mime-type). >> I would not feel comfortable having the server parse arbitrary data with >> libmagic. The libmagic code is not very safe to run on untrusted input. >> I have seen libmagic crash my svn client on several occasions even on >> text files I wrote. >> >> At the client side it's a bit less dangerous because users have already >> told svn to add the files in question to version control, and a libmagic >> exploit running on the client machine can do less harm than a server-side >> one. >> >> Granted, commits are usually authenticated. If we did this we should at >> least make really sure that no unauthenticated access can trigger this code. >> Ideally, it would be sandboxed somehow if we started using it on the server. > > I wasn't really proposing to use libmagic on the server. My point is > that instead of using file name suffixes (which the compression and > deltification code don't know about), we'd do some sort of inspection > instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc., > is fairly easy just from looking at a few bytes of headers. Same goes > for most image and video formats. > > Of course, one could always concoct a file that would trick such > inspection, but at least that's marginally harder to do than commit a > large text file full of spaces and calling it 'spaceinvaders.jpg'. :) > > Random binary data is harder to detect, but we already deal with that > after the fact by using the plain text if the delta is too large.
We could also make the process driven by a "client-side suggestion". Driving it from the client-side also gives us the possibility to eliminate the client-side deltification overhead. I.e. the client has some logic (libmagic, suffix, looking at the first 100 bytes, ...) to determine that it's not worth deltifying and/or compressing. It doesn't do deltification itself, and lets the server know that it probably shouldn't either (or the server sees that the client hasn't deltified, so accepts the content automatically as "non-deltifiable / non-compressable"?). Maybe needs a server-side config setting to make it respect or ignore the client-side suggestion. -- Johan