Excel 5 files are inconsistently detected as either "application/msword" or
"application/vnd.ms-excel"
------------------------------------------------------------------------------------------------------
Key: TIKA-516
URL: https://issues.apache.org/jira/browse/TIKA-516
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.7
Reporter: Victor Kazakov
Priority: Minor
Attachments: excel5.xls
Parsing an Excel 5 file with AutoDetectParser gives inconsistent results:
repeated runs on the same file report the content type as either
"application/msword" or "application/vnd.ms-excel".
The following code reproduces the problem:
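    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.xml.sax.helpers.DefaultHandler;

    // The class name is arbitrary; it only wraps main() so the snippet compiles.
    public class Excel5Detection {
        public static void main(String[] args) throws Exception {
            File file = new File("excel5.xls");
            for (int i = 0; i < 10; i++) {
                // Open a fresh stream for each run and close it before the
                // next one, so no stream is leaked between iterations.
                InputStream stream = new FileInputStream(file);
                try {
                    AutoDetectParser parser = new AutoDetectParser();
                    Metadata metadata = new Metadata();
                    metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
                    parser.parse(stream, new DefaultHandler(), metadata);
                    // Print the content type recorded for this run
                    System.out.println(metadata.get(Metadata.CONTENT_TYPE));
                } finally {
                    stream.close();
                }
            }
        }
    }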
An example output is:
application/vnd.ms-excel
application/msword
application/msword
application/vnd.ms-excel
application/vnd.ms-excel
application/vnd.ms-excel
application/vnd.ms-excel
application/msword
application/vnd.ms-excel
application/msword
The Excel 5 file I used (excel5.xls) is attached to this issue.
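
To make the inconsistency easier to see at a glance, the per-run results can be tallied. The following is only a sketch using the standard library: the class DetectionTally and its methods tally and isConsistent are hypothetical helpers and not part of Tika, and the list in main simply repeats the ten values from the example output above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tika: summarizes repeated detection results.
public class DetectionTally {

    /** Counts how often each content type occurs in the given results. */
    static Map<String, Integer> tally(List<String> detectedTypes) {
        // TreeMap keeps the keys sorted, so the printed summary is stable.
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : detectedTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns true when all runs reported the same content type. */
    static boolean isConsistent(List<String> detectedTypes) {
        return tally(detectedTypes).size() <= 1;
    }

    public static void main(String[] args) {
        // The ten values from the example output, in the order they were printed.
        List<String> runs = Arrays.asList(
                "application/vnd.ms-excel",
                "application/msword",
                "application/msword",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/msword",
                "application/vnd.ms-excel",
                "application/msword");

        System.out.println(tally(runs));
        System.out.println("consistent: " + isConsistent(runs));
    }
}
```

For the example output this reports 4 runs as application/msword and 6 as application/vnd.ms-excel. In a real check, the list would be filled from metadata.get(Metadata.CONTENT_TYPE) inside the reproduction loop instead of being hard-coded.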