Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"
------------------------------------------------------------------------------------------------------

                 Key: TIKA-516
                 URL: https://issues.apache.org/jira/browse/TIKA-516
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
            Reporter: Victor Kazakov
            Priority: Minor
         Attachments: excel5.xls

Parsing the same Excel 5 file repeatedly with AutoDetectParser gives 
inconsistent results: the file is detected as either "application/msword" 
or "application/vnd.ms-excel".

See the following code:

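        import java.io.File;
        import java.io.FileInputStream;

        import org.apache.tika.metadata.Metadata;
        import org.apache.tika.parser.AutoDetectParser;
        import org.xml.sax.helpers.DefaultHandler;

        // The class name is arbitrary; it only makes the snippet compilable.
        public class DetectExcel5 {
            public static void main(String[] args) throws Exception {
                File file = new File("excel5.xls");
                // Parse the same file ten times and print the detected type.
                for (int i = 0; i < 10; i++) {
                    FileInputStream stream = new FileInputStream(file);
                    try {
                        AutoDetectParser parser = new AutoDetectParser();
                        Metadata metadata = new Metadata();
                        metadata.set(Metadata.RESOURCE_NAME_KEY,
                                file.getName());
                        parser.parse(stream, new DefaultHandler(), metadata);
                        System.out.println(metadata.get(Metadata.CONTENT_TYPE));
                    } finally {
                        // Close the stream opened in this iteration.
                        stream.close();
                    }
                }
            }
        }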

An example output is:
application/vnd.ms-excel
application/msword
application/msword
application/vnd.ms-excel
application/vnd.ms-excel
application/vnd.ms-excel
application/vnd.ms-excel
application/msword
application/vnd.ms-excel
application/msword
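
To quantify how often each type is reported, the results can be tallied. The following is only a sketch: DetectionTally and its tally method are hypothetical helpers that are not part of Tika, and the list in main simply repeats the ten results from the example output above rather than calling the parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tika: counts how often each
// content type was reported across repeated parses of one file.
public class DetectionTally {

    // Returns a map from content type to number of occurrences,
    // sorted by content type.
    static Map<String, Integer> tally(List<String> detectedTypes) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String type : detectedTypes) {
            Integer seen = counts.get(type);
            counts.put(type, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The ten results from the example output, in order.
        List<String> observed = Arrays.asList(
                "application/vnd.ms-excel",
                "application/msword",
                "application/msword",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/vnd.ms-excel",
                "application/msword",
                "application/vnd.ms-excel",
                "application/msword");

        Map<String, Integer> counts = tally(observed);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        // More than one distinct type for the same file means the
        // detection is not deterministic.
        if (counts.size() > 1) {
            System.out.println("Inconsistent detection across runs");
        }
    }
}
```

For the example output this reports 4 runs as application/msword and 6 as application/vnd.ms-excel. In a real reproduction, the list would be filled with the values of metadata.get(Metadata.CONTENT_TYPE) collected in the loop.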

The Excel 5 file I used (excel5.xls) is attached to this issue.
