Jean Coudon created TIKA-1928:
---------------------------------

             Summary: Filename detection misses when a # is in a filename
                 Key: TIKA-1928
                 URL: https://issues.apache.org/jira/browse/TIKA-1928
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 1.12
         Environment: java 8
            Reporter: Jean Coudon
            Priority: Minor


If a filename contains a pound character (#), the file is detected as
application/octet-stream instead of the type that is detected when the
filename has no #.
```java
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;

Metadata metadata = new Metadata();
Tika tika = new Tika();
metadata.add(Metadata.RESOURCE_NAME_KEY, "test#.pdf");
// Tika uses NameDetector if the first parameter (the stream) is null
System.out.println(tika.detect(null, metadata));
// prints application/octet-stream instead of application/pdf
```

Reproduced with both application/pdf and application/xml.
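A possible cause, stated as an unverified assumption rather than a diagnosis of Tika's code: if the name-based lookup parses the RESOURCE_NAME_KEY value as a java.net.URI, everything after the # becomes the URI fragment, so "test#.pdf" is reduced to "test" and the .pdf extension is lost. The sketch below uses only the JDK to illustrate this. The class and method names (PoundInResourceName, lastSegmentAsUri, fileNameFromResource) are hypothetical and are not Tika API. fileNameFromResource shows one possible mitigation: treat the name as a URI only when it carries a multi-character scheme such as http, and otherwise keep the plain file name whole.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PoundInResourceName {

    // Hypothetical: models (as an assumption) a lookup that parses the resource
    // name as a URI and keeps only the last path segment. Anything after '#'
    // is parsed as the fragment and is therefore discarded.
    static String lastSegmentAsUri(String resourceName) {
        try {
            String path = new URI(resourceName).getPath();
            if (path == null) {
                return null; // opaque URIs such as "mailto:x" have no path
            }
            return path.substring(path.lastIndexOf('/') + 1);
        } catch (URISyntaxException e) {
            return resourceName; // not a valid URI: fall back to the raw name
        }
    }

    // Hypothetical mitigation: only treat the name as a URI when it has a
    // scheme longer than one character (e.g. "http"), so Windows drive letters
    // like "C:" are not mistaken for schemes. Plain names keep their '#'.
    static String fileNameFromResource(String resourceName) {
        try {
            URI uri = new URI(resourceName);
            String scheme = uri.getScheme();
            if (scheme != null && scheme.length() > 1 && uri.getPath() != null) {
                String path = uri.getPath();
                return path.substring(path.lastIndexOf('/') + 1);
            }
        } catch (URISyntaxException e) {
            // not a valid URI (e.g. contains spaces or backslashes): treat as a path
        }
        int slash = Math.max(resourceName.lastIndexOf('/'), resourceName.lastIndexOf('\\'));
        return resourceName.substring(slash + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("test#.pdf");
        System.out.println(uri.getPath());     // test
        System.out.println(uri.getFragment()); // .pdf
        System.out.println(lastSegmentAsUri("test#.pdf"));     // test
        System.out.println(fileNameFromResource("test#.pdf")); // test#.pdf
    }
}
```

If this assumption holds, any extension-based match would be performed against "test" rather than "test#.pdf", which would be consistent with the application/octet-stream result above.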



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
