[ https://issues.apache.org/jira/browse/TIKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean Coudon updated TIKA-1928:
------------------------------
Comment: was deleted
(was: I am using Linux Mint 17.1. I couldn't reproduce it with the CLI app, but the
CLI app may not use the same detection path: it requires a file, whereas my test
runs with a null stream.
Yes, my code is taken from a JUnit test I wrote to try this out. Here is the
full version:
{code:java}
import static org.junit.Assert.assertEquals;

import java.io.IOException;

import org.apache.tika.Tika;
import org.junit.Test;

@Test
public void testPoundInFileName() throws IOException {
    org.apache.tika.metadata.Metadata metadata = new org.apache.tika.metadata.Metadata();
    Tika tika = new Tika();
    metadata.add(org.apache.tika.metadata.Metadata.RESOURCE_NAME_KEY, "test#.pdf");
    // with a null stream, Tika can only detect the type from the resource name
    assertEquals("application/pdf", tika.detect(null, metadata));
}
{code}
)
> Filename detection misses when a # is in a filename
> ---------------------------------------------------
>
> Key: TIKA-1928
> URL: https://issues.apache.org/jira/browse/TIKA-1928
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.12
> Environment: java 8
> Reporter: Jean Coudon
> Priority: Minor
>
> If a filename contains a pound character (#), it is detected as
> application/octet-stream instead of the type that is detected for the same
> filename without the pound.
> {code:java}
> import org.apache.tika.Tika;
> import org.apache.tika.metadata.Metadata;
>
> Metadata metadata = new Metadata();
> Tika tika = new Tika();
> metadata.add(Metadata.RESOURCE_NAME_KEY, "test#.pdf");
> // with a null stream, Tika detects the type from the resource name only
> System.out.println(tika.detect(null, metadata));
> // prints application/octet-stream instead of application/pdf
> {code}
> Tested for application/pdf and application/xml.
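The report does not identify the cause. One plausible explanation, offered here as an assumption and not something confirmed in this issue, is that the resource name is parsed as a URI: everything after `#` then becomes a fragment, so the `.pdf` extension is gone before any filename-pattern matching happens. The sketch below uses only `java.net.URI` to show that effect; `nameViaUri` and `extension` are hypothetical helpers for illustration, not Tika APIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PoundInNameDemo {

    // Hypothetical helper: treats the resource name as a URI and keeps only
    // the last path segment, as a URL-aware name detector might do.
    static String nameViaUri(String resourceName) {
        try {
            String path = new URI(resourceName).getPath();
            if (path == null) {
                return resourceName;
            }
            int slash = path.lastIndexOf('/');
            return path.substring(slash + 1);
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. contains a space): use the raw name
            return resourceName;
        }
    }

    // Hypothetical helper: extension without the dot, or "" if there is none.
    static String extension(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("test#.pdf");
        System.out.println(uri.getPath());      // test
        System.out.println(uri.getFragment());  // .pdf

        // Via URI parsing, "test#.pdf" shrinks to "test", which has no extension
        System.out.println("[" + extension(nameViaUri("test#.pdf")) + "]"); // []
        System.out.println(extension(nameViaUri("test.pdf")));              // pdf
        // Treating the raw string as a plain file name keeps the extension
        System.out.println(extension("test#.pdf"));                         // pdf
    }
}
```

Under this assumption the trade-off is clear: URI parsing is correct when the resource name really is a URL, where `#` starts a fragment, but it misreads plain file names that happen to contain `#`.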
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)