Jean Coudon created TIKA-1928:
---------------------------------
Summary: Filename detection misses when a # is in a filename
Key: TIKA-1928
URL: https://issues.apache.org/jira/browse/TIKA-1928
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 1.12
Environment: java 8
Reporter: Jean Coudon
Priority: Minor
If a filename contains a pound character (#), it is detected as
application/octet-stream instead of the proper type, which is detected
correctly for the same filename without the pound.
```
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;

Metadata metadata = new Metadata();
Tika tika = new Tika();
metadata.add(Metadata.RESOURCE_NAME_KEY, "test#.pdf");
// the stream (first parameter) is null, so detection relies on the file name only
System.out.println(tika.detect(null, metadata));
// prints application/octet-stream instead of application/pdf
```
Tested for application/pdf and application/xml.
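A possible explanation, not verified against the Tika source in this report: if name-based detection parses the resource name as a URI, the # would be read as the start of a URI fragment and the extension would be lost. The sketch below uses only the JDK's java.net.URI to show that split. The class PoundInNameDemo and its helper methods are hypothetical names for illustration, and whether Tika actually takes this code path is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PoundInNameDemo {
    // Returns the path component that java.net.URI sees in the given name.
    static String pathOf(String name) throws URISyntaxException {
        return new URI(name).getPath();
    }

    // Returns the fragment component (the text after '#'), or null if there is none.
    static String fragmentOf(String name) throws URISyntaxException {
        return new URI(name).getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Compare a plain name with the name from this report.
        for (String name : new String[] {"test.pdf", "test#.pdf"}) {
            System.out.println(name + " -> path=" + pathOf(name)
                    + ", fragment=" + fragmentOf(name));
        }
    }
}
```

For "test#.pdf" the path is "test" and ".pdf" ends up in the fragment, so any lookup based on the path alone would see no extension. That would be consistent with the application/octet-stream result above.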