On Tue, Apr 16, 2019 at 04:28:32PM -0400, Jose Blanco wrote:
> I am doing a deposit of a docx file using swordv2 and I'm getting a
> format of Unknown.  I'm trying to track down how this determination
> was made.  I would expect the format to be based on the mime type of
> the file, which is :
> > file --mime-type -b This\ is\ a\ docx\ file\ for\ test.docx
> application/vnd.openxmlformats-officedocument.wordprocessingml.document

Unfortunately, DSpace currently determines the type of a file from its
filename extension (e.g. ".docx") rather than by inspecting the file's
contents for magic numbers.
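
To make the difference concrete, here is a rough sketch of
extension-based detection. It is not DSpace's actual code: the
dictionary is a hypothetical stand-in for the database lookup,
guess_format is a name made up for illustration, and lower-casing the
extension is an assumption.

```python
import os

# Hypothetical stand-in for the extension-to-format lookup that DSpace
# performs against its database; the entries here are illustrative only.
EXTENSION_TO_FORMAT = {
    "docx": "Microsoft Word XML",
    "pdf": "Adobe PDF",
}


def guess_format(filename, table=EXTENSION_TO_FORMAT):
    """Return a format name based only on the filename extension.

    The file's contents are never read, so a missing or unregistered
    extension yields "Unknown" whatever the file really contains.
    """
    # splitext returns ("name", ".ext"); drop the dot and normalise case
    # (case-insensitive matching is an assumption of this sketch).
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(ext, "Unknown")
```

With the table above, "This is a docx file for test.docx" resolves to
"Microsoft Word XML", but the same file resolves to "Unknown" as soon
as the 'docx' entry is absent from the table.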

> And according to the db
> 
>  select * from bitstreamformatregistry where mimetype
> ='application/vnd.openxmlformats-officedocument.wordprocessingml.document';
> 
> should be "Microsoft Word XML"
> 
> What am I not understanding?

There should also be a row in the "fileextension" table whose
"extension" is 'docx' and whose "bitstream_format_id" matches the
"bitstream_format_id" of the "Microsoft Word XML" row in the
"bitstreamformatregistry" table.  If that row is missing, or points at
a different format, then the file will be identified as Unknown.

  SELECT * FROM bitstreamformatregistry
   WHERE bitstream_format_id =
         (SELECT bitstream_format_id FROM fileextension
           WHERE extension = 'docx');

would test that.
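
If you want to see how the two tables interact without touching your
repository, the following sketch reproduces the check against an
in-memory SQLite database. Treat it as an illustration only: the
schema is cut down to the columns discussed here, the ID values are
invented, and a real DSpace installation normally runs on PostgreSQL.
lookup_by_extension is a hypothetical helper name.

```python
import sqlite3

DOCX_MIME = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document")


def lookup_by_extension(conn, extension):
    """Return (short_description, mimetype) rows registered for an extension."""
    return conn.execute(
        "SELECT r.short_description, r.mimetype "
        "FROM bitstreamformatregistry r "
        "JOIN fileextension e "
        "  ON e.bitstream_format_id = r.bitstream_format_id "
        "WHERE e.extension = ?",
        (extension,),
    ).fetchall()


# Simplified stand-in schema; the ID values below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bitstreamformatregistry (
        bitstream_format_id INTEGER PRIMARY KEY,
        mimetype            TEXT,
        short_description   TEXT
    );
    CREATE TABLE fileextension (
        file_extension_id   INTEGER PRIMARY KEY,
        bitstream_format_id INTEGER,
        extension           TEXT
    );
""")
conn.execute(
    "INSERT INTO bitstreamformatregistry VALUES (?, ?, ?)",
    (1, DOCX_MIME, "Microsoft Word XML"),
)

# The format is registered, but no extension points at it yet.
before = lookup_by_extension(conn, "docx")

# Adding the fileextension row is what makes the lookup succeed.
conn.execute("INSERT INTO fileextension VALUES (?, ?, ?)", (1, 1, "docx"))
after = lookup_by_extension(conn, "docx")
```

Here "before" is empty even though the MIME type is present in
"bitstreamformatregistry", which is the situation that would produce
Unknown; "after" contains the "Microsoft Word XML" row.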

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
