Hey guys,
we recently contributed some security improvements to the popular JSF
component library PrimeFaces that validate the accepted content types
of uploaded files on the server side. For this we use Java's
Files.probeContentType, which automatically picks up registered
java.nio.file.spi.FileTypeDetector service implementations. As far as
we can tell, however, the default implementations mostly look only at
the file extension (e.g. via registry lookups). That's insufficient
from a security point of view. That's why we will recommend having
Apache Tika on the classpath, or more precisely the tika-java7
dependency.
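
To make the setup concrete, here is a minimal sketch of such a
server-side check. The class name UploadTypeCheck, the method
isAccepted and the accepted type list are made up for illustration and
are not the actual PrimeFaces code; only Files.probeContentType is the
real JDK API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not the PrimeFaces implementation.
public class UploadTypeCheck {

    /**
     * Returns true only if a registered FileTypeDetector (or the JDK
     * default) reported a content type and it is in the accepted set.
     */
    static boolean isAccepted(Path uploaded, Set<String> acceptedTypes) throws IOException {
        // probeContentType returns null when the type cannot be determined
        String detected = Files.probeContentType(uploaded);
        return detected != null && acceptedTypes.contains(detected);
    }

    public static void main(String[] args) throws IOException {
        Set<String> accepted = new HashSet<>(Arrays.asList("image/png", "image/jpeg"));

        // A file that claims to be a PNG by name but contains plain text
        Path uploaded = Files.createTempFile("upload", ".png");
        try {
            Files.write(uploaded, "not really a PNG".getBytes(StandardCharsets.US_ASCII));
            System.out.println("accepted: " + isAccepted(uploaded, accepted));
        } finally {
            Files.delete(uploaded);
        }
    }
}
```

What this prints depends on which detectors are installed and on the
platform. A detector that trusts the extension will wave the fake PNG
through, which is exactly the problem described above.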
Now there are two questions regarding the use of TikaFileTypeDetector
and the required dependencies:
1. The transitive dependencies of tika-java7 are really big (more than
50 megabytes). I know that Apache Tika is not just about file type
detection but does much more, such as metadata extraction, which we
don't need at all. Would it be okay to exclude tika-parsers, which is
the biggest module, without losing file type detection abilities?
2. Unfortunately, TikaFileTypeDetector is optimized for speed by
default: it short-circuits if the content type can be guessed from the
file extension. This is insecure for our purposes, since we want to
protect our users from tampered file uploads. We always want the deep
(and expensive) content type analysis that inspects the file content,
e.g. the magic bytes. We currently work around this limitation by
explicitly using .tmp as the file name's extension, so that Tika only
detects application/octet-stream from the name and is forced to go on
to content analysis. But that relies on white-box knowledge of the
implementation, which is fragile. Could you provide different
implementations of java.nio.file.spi.FileTypeDetector, say "eager" and
"lazy"? Please note that we are not allowed to introduce required
dependencies, i.e. using Tika directly is not an option.
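
To illustrate what we mean by an "eager" detector, here is a rough
sketch of a FileTypeDetector that ignores the file name entirely and
looks only at the leading bytes. This is not Tika's implementation and
not a proposal for its API: the class name EagerMagicDetector and the
tiny hard-coded signature list (PNG, JPEG, PDF) are assumptions made
purely for illustration. A real detector would cover far more formats.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Hypothetical "eager" detector: content only, never the extension.
public class EagerMagicDetector extends FileTypeDetector {

    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        // Read up to head.length leading bytes; stop early at end of file
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while (read < head.length
                    && (n = in.read(head, read, head.length - read)) > 0) {
                read += n;
            }
        }
        if (startsWith(head, read, PNG)) return "image/png";
        if (startsWith(head, read, JPEG)) return "image/jpeg";
        if (startsWith(head, read, PDF)) return "application/pdf";
        return null; // unknown: let other detectors decide
    }

    private static boolean startsWith(byte[] head, int read, byte[] magic) {
        if (read < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // PNG signature stored under a misleading .txt name
        Path disguised = Files.createTempFile("upload", ".txt");
        try {
            Files.write(disguised, PNG);
            System.out.println(new EagerMagicDetector().probeContentType(disguised));
        } finally {
            Files.delete(disguised);
        }
    }
}
```

The point of the sketch is only the control flow: there is no
short circuit on the file name, so a renamed file is still classified
by its content, and no .tmp trick is needed.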
Here are the related issues:
https://github.com/primefaces/primefaces/issues/2791 and
https://github.com/primefaces/primefaces/issues/4244
And the pull requests already merged into PrimeFaces 6.3:
https://github.com/primefaces/primefaces/pull/4242 and
https://github.com/primefaces/primefaces/pull/4249
Thanks in advance for your response.
Kind regards, cnsgithub