Hi Jose,

I think we'd need more information on where you are encountering this
error.  From my understanding of SWORDv2, the expectation is that the
*package* you deposit is a ZIP file that contains both a metadata file and
one or more binaries.  An example is at:
https://github.com/DSpace/DSpace/blob/master/dspace-sword/example/example.zip
(The same example applies to both SWORDv1 and v2.)
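
In case it helps, here is a rough Python sketch of how a package of that
shape could be assembled.  The metadata file name ("mets.xml"), the
placeholder metadata body, and the function name are my assumptions for
illustration only; please check them against the example ZIP above before
relying on them.

```python
import io
import zipfile


def build_sword_package(metadata_xml: str, binaries: dict) -> bytes:
    """Return the bytes of a ZIP holding one metadata file plus the binaries.

    Assumption: the metadata file is named "mets.xml". Confirm the name your
    packaging format expects against the example ZIP.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mets.xml", metadata_xml)
        for filename, content in binaries.items():
            # Store each binary under its real name, extension included.
            package.writestr(filename, content)
    return buffer.getvalue()


package_bytes = build_sword_package(
    "<mets/>",  # placeholder only, not a valid METS document
    {"test.docx": b"placeholder bytes"},
)
```

The binaries keep their original filenames inside the ZIP, so the ".docx"
extension is still there when the files are extracted.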

Once SWORDv2 validates the ZIP, it should extract the binaries and
*validate them against your bitstream format registry*.  So, as Mark noted,
if you have a "docx" entry in your bitstream format registry, then SWORDv2
should see that in the same way that any other input mechanism does.

If you are depositing a Word doc directly, it looks like SWORDv2 tries to
use the "BinaryContentIngester" class, which also looks to be using the
bitstreamformat registry.
https://github.com/DSpace/DSpace/blob/master/dspace-swordv2/src/main/java/org/dspace/sword2/BinaryContentIngester.java
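
To make that registry lookup concrete, here is a small self-contained Python
sketch that mimics it with an in-memory SQLite database.  It is not DSpace
code: the two tables are cut down to a few columns, the sample rows and the
guess_format function are my own naming for illustration, and the lookup is
by filename extension as Mark describes below.  Lower-casing the extension is
also my assumption.

```python
import os
import sqlite3


def make_registry() -> sqlite3.Connection:
    """Build a toy copy of the two tables discussed in this thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE bitstreamformatregistry (
            bitstream_format_id INTEGER PRIMARY KEY,
            mimetype TEXT,
            short_description TEXT
        );
        CREATE TABLE fileextension (
            bitstream_format_id INTEGER,
            extension TEXT
        );
        -- Sample rows for illustration only.
        INSERT INTO bitstreamformatregistry VALUES
            (1, 'application/octet-stream', 'Unknown'),
            (2, 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
             'Microsoft Word XML');
        INSERT INTO fileextension VALUES (2, 'docx');
        """
    )
    return conn


def guess_format(conn: sqlite3.Connection, filename: str) -> str:
    """Look up a format by filename extension; fall back to 'Unknown'."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    row = conn.execute(
        "SELECT short_description FROM bitstreamformatregistry "
        "WHERE bitstream_format_id = "
        "(SELECT bitstream_format_id FROM fileextension WHERE extension = ?)",
        (extension,),
    ).fetchone()
    return row[0] if row else "Unknown"


registry = make_registry()
print(guess_format(registry, "This is a docx file for test.docx"))
print(guess_format(registry, "no-extension"))
```

With the sample rows above, the first call finds the "docx" row and the
second falls through to "Unknown".  That second case is what you would see
if the filename reaching the ingester had lost its extension, or if the
"fileextension" row did not point at the right format.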

It's currently unclear to me how you are trying to deposit to SWORDv2.  Are
you depositing a ZIP?  Are you trying to deposit a Word document directly?
Where is the error message appearing and what is the exact error?
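
To show what I mean by the two styles of deposit, here is a rough Python
sketch of the headers a client might send in each case.  The Packaging URIs
are the ones I recall from the SWORDv2 profile, so treat them as assumptions
to verify against the specification; the function name and filenames are
made up for illustration, and nothing here contacts a real server.

```python
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def deposit_headers(filename: str, as_zip_package: bool) -> dict:
    """Return illustrative HTTP headers for a SWORDv2 deposit.

    Assumption: the Packaging values below match the SWORDv2 profile;
    check them against the specification and your server's service document.
    """
    if as_zip_package:
        content_type = "application/zip"
        packaging = "http://purl.org/net/sword/package/METSDSpaceSIP"
    else:
        content_type = DOCX_MIME
        packaging = "http://purl.org/net/sword/package/Binary"
    return {
        "Content-Type": content_type,
        # The filename sent here carries the extension (e.g. ".docx").
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": packaging,
    }


print(deposit_headers("test.docx", as_zip_package=False))
print(deposit_headers("example.zip", as_zip_package=True))
```

Knowing which of these two shapes your client is sending, and what filename
it puts in Content-Disposition, would help narrow things down.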

If you provide a bit more info, perhaps we could narrow down what is going
on.

Tim

On Wed, Apr 17, 2019 at 9:59 AM Jose Blanco <blan...@umich.edu> wrote:

> This is what it seems like to me too.  Thank you for your input.
>
> On Wed, Apr 17, 2019 at 10:40 AM Mark H. Wood <mwoodiu...@gmail.com>
> wrote:
> >
> > On Wed, Apr 17, 2019 at 09:43:51AM -0400, Jose Blanco wrote:
> > > Mark, Are you sure this is the way it works for swordv2?  I know this
> > > is the way it works when uploading files via the UI, but I don't think
> > > this is the way sword does it.  At least I'm not seeing this.  I do
> > > have docx in the fileextension table.
> >
> > We don't use SWORDv2 here, so I don't know a lot about it, but from
> > reading some of the code and the specification, it seems that the
> > packaging declares the type of a bitstream?  So is the client telling
> > the server that this file is an unknown type?
> >
> > I think you need a SWORD expert here, and I am not one of those.
> >
> > > On Wed, Apr 17, 2019 at 9:22 AM Mark H. Wood <mwoodiu...@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 16, 2019 at 04:28:32PM -0400, Jose Blanco wrote:
> > > > > I am doing a deposit of a docx file using swordv2 and I'm getting a
> > > > > format of Unknown.  I'm trying to track down how this determination
> > > > > was made.  I would expect the format to be based on the mime type of
> > > > > the file, which is:
> > > > > > file --mime-type -b This\ is\ a\ docx\ file\ for\ test.docx
> > > > >
> > > > > application/vnd.openxmlformats-officedocument.wordprocessingml.document
> > > >
> > > > Unfortunately, DSpace currently uses the filename extension
> > > > (e.g. ".docx") rather than inspecting the file for magic numbers, to
> > > > determine the type of the file.
> > > >
> > > > > And according to the db
> > > > >
> > > > > >  select * from bitstreamformatregistry where mimetype
> > > > > >  ='application/vnd.openxmlformats-officedocument.wordprocessingml.document';
> > > > >
> > > > > should be "Microsoft Word XML"
> > > > >
> > > > > What am I not understanding?
> > > >
> > > > There should also be a row in the "fileextension" table with the
> > > > "extension" 'docx' and the "bitstream_format_id" matching that column
> > > > value in the "bitstreamformatregistry" table.  If the row is missing,
> > > > or doesn't match, then the file would be diagnosed as an unknown type.
> > > >
> > > >   SELECT * FROM bitstreamformatregistry WHERE bitstream_format_id =
> > > >   (SELECT bitstream_format_id FROM fileextension WHERE extension =
> > > >   'docx');
> > > >
> > > > would test that.
> >
> > --
> > Mark H. Wood
> > Lead Technology Analyst
> >
> > University Library
> > Indiana University - Purdue University Indianapolis
> > 755 W. Michigan Street
> > Indianapolis, IN 46202
> > 317-274-0749
> > www.ulib.iupui.edu
> >
> > --
> > All messages to this mailing list should adhere to the DuraSpace Code of
> > Conduct: https://duraspace.org/about/policies/code-of-conduct/
> > ---
> > You received this message because you are subscribed to the Google
> > Groups "DSpace Technical Support" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to dspace-tech+unsubscr...@googlegroups.com.
> > To post to this group, send email to dspace-tech@googlegroups.com.
> > Visit this group at https://groups.google.com/group/dspace-tech.
> > For more options, visit https://groups.google.com/d/optout.


-- 

Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org
