Tim,

I am using Symplectic Elements, a tool that sends SWORDv2 packages to our
dev site for deposit.  It has a UI where you deposit items, much like the
one in DSpace.  Once the item is deposited, you can send it over to our dev
instance via SWORDv2.  The item I created on their site has a little
metadata and a .docx file.  When the item lands on our site everything looks
good, but the file format is Unknown.  I don't know how they bundle things
up for SWORD before sending them, and the DSpace log file currently does not
tell me much.  I can see in the log that it thinks the format is Unknown,
but not why.  We do have docx in the fileextension table, and if I deposit
that same .docx file through the DSpace UI, the format is Microsoft Word, as
expected.
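For reference, Mark's explanation further down the thread is that DSpace picks the format from the filename extension, looked up through the fileextension table, rather than from the file's contents. The following is a minimal sketch of that lookup, not DSpace's actual code: it models a simplified version of the two tables (only bitstream_format_id, mimetype, short_description and extension; the id value is made up) in an in-memory SQLite database and runs a join equivalent to Mark's query. The guess_format helper is hypothetical.

```python
import sqlite3

# Simplified stand-in for DSpace's registry tables (not the real schema);
# the format id below is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bitstreamformatregistry (
    bitstream_format_id INTEGER PRIMARY KEY,
    mimetype TEXT,
    short_description TEXT
);
CREATE TABLE fileextension (
    extension TEXT,
    bitstream_format_id INTEGER
);
INSERT INTO bitstreamformatregistry VALUES
    (1, 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
     'Microsoft Word XML');
INSERT INTO fileextension VALUES ('docx', 1);
""")


def guess_format(filename: str) -> str:
    """Return the format name for a filename, or 'Unknown' if no extension row matches."""
    # Only the text after the last dot is used (case-insensitive in this sketch);
    # the file's bytes are never inspected.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    row = conn.execute(
        "SELECT r.short_description FROM bitstreamformatregistry r "
        "JOIN fileextension f ON f.bitstream_format_id = r.bitstream_format_id "
        "WHERE f.extension = ?",
        (ext,),
    ).fetchone()
    return row[0] if row else "Unknown"


print(guess_format("This is a docx file for test.docx"))  # Microsoft Word XML
print(guess_format("bitstream"))                           # Unknown
```

Under this model, a bitstream whose stored name does not end in .docx comes out Unknown no matter what its bytes are.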

Now, I do know that if I send this command to our site:

curl -i --data-binary "@/tmp/example.zip" \
  -H "Content-Disposition: filename=jose.zip" \
  -H "Content-Type: application/zip" \
  -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" \
  -H "X-No-Op: false" -H "X-Verbose: true" \
  https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071 \
  -u some-u...@umich.edu

it deposits a file named jose.zip, and its format is recorded as zip.

But if I change the command to:

curl -i --data-binary "@/tmp/example.zip" \
  -H "Content-Disposition: filename=jose.zip" \
  -H "Content-Type: application/pdf" \
  -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" \
  -H "X-No-Op: false" -H "X-Verbose: true" \
  https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071 \
  -u some-u...@umich.edu

it deposits a file named jose.zip whose format is PDF, even though the file
is actually a zip file, and you get an error when you try to open it.  In
other words, the Content-Type header (set with -H) determines the format
(MIME type).
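If the Content-Type header is what sets the recorded format for these deposits, as the two tests suggest, the client has to send a MIME type that actually matches the file. Below is a minimal Python sketch, assuming the standard library's urllib.request, that builds (but does not send) a request carrying the same headers as the curl commands above, with Content-Type chosen from a small extension map. MIME_BY_EXT and build_deposit_request are hypothetical names, credentials are omitted, and the header names and packaging URI are simply copied from the commands (the SWORD 2.0 profile spells the packaging header Packaging, so it is worth checking which name the server honors).

```python
import urllib.request

# Hypothetical, deliberately small extension -> MIME type map.
MIME_BY_EXT = {
    "zip": "application/zip",
    "pdf": "application/pdf",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}

# Collection URL and packaging URI copied from the curl commands above.
COLLECTION_URL = "https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071"
PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_deposit_request(filename: str, payload: bytes,
                          packaging: str = PACKAGING) -> urllib.request.Request:
    """Build (but do not send) a deposit POST whose Content-Type matches the file."""
    ext = filename.rsplit(".", 1)[-1].lower()
    # Use a generic type for unknown extensions instead of a wrong specific one.
    content_type = MIME_BY_EXT.get(ext, "application/octet-stream")
    headers = {
        "Content-Disposition": f"filename={filename}",
        "Content-Type": content_type,
        "X-Packaging": packaging,  # header name as in the curl commands
        "X-No-Op": "false",
        "X-Verbose": "true",
    }
    return urllib.request.Request(COLLECTION_URL, data=payload,
                                  headers=headers, method="POST")


req = build_deposit_request("jose.zip", b"PK\x03\x04")
# urllib stores header names via str.capitalize(), hence "Content-type".
print(req.get_method(), req.get_header("Content-type"))  # POST application/zip
```

Falling back to application/octet-stream for unrecognized extensions is a design choice here: a generic type avoids labeling a file with a specific but wrong type, as happened with the zip recorded as PDF in the second test.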

I don't know how they are sending me the package, but I have asked them.

Thank you!
-Jose


On Wed, Apr 17, 2019 at 12:05 PM Tim Donohue <tdono...@duraspace.org> wrote:

> Hi Jose,
>
> I think we'd need more information on where you are encountering this
> error.  My understanding of SWORDv2 is that the *package* you deposit is
> expected to be a ZIP file containing both a metadata file and one or more
> binaries.  An example is at:
> https://github.com/DSpace/DSpace/blob/master/dspace-sword/example/example.zip
> (This same example is for SWORDv1 and v2).
>
> Once SWORDv2 validates the Zip, it should extract the binaries and
> *validate them against your bitstream format registry*.  So, as Mark noted,
> if you have a "docx" entry in your bitstream format registry, then SWORDv2
> should recognize it the same way any other input mechanism does.
>
> If you are depositing a Word doc directly, it looks like SWORDv2 tries to
> use the "BinaryContentIngester" class, which also looks to be using the
> bitstreamformat registry.
> https://github.com/DSpace/DSpace/blob/master/dspace-swordv2/src/main/java/org/dspace/sword2/BinaryContentIngester.java
>
> It's currently unclear to me how you are trying to deposit to SWORDv2.
> Are you depositing a Zip?  Are you trying to deposit a Word document
> directly?  Where is the error message appearing and what is the exact error?
>
> If you provide a bit more info, perhaps we could narrow down what is going
> on.
>
> Tim
>
> On Wed, Apr 17, 2019 at 9:59 AM Jose Blanco <blan...@umich.edu> wrote:
>
>> This is what it seems like to me too.  Thank you for your input.
>>
>> On Wed, Apr 17, 2019 at 10:40 AM Mark H. Wood <mwoodiu...@gmail.com> wrote:
>> >
>> > On Wed, Apr 17, 2019 at 09:43:51AM -0400, Jose Blanco wrote:
>> > > Mark, are you sure this is the way it works for SWORDv2?  I know this
>> > > is the way it works when uploading files via the UI, but I don't think
>> > > this is the way SWORD does it.  At least I'm not seeing this.  I do
>> > > have docx in the fileextension table.
>> >
>> > We don't use SWORDv2 here, so I don't know a lot about it, but from
>> > reading some of the code and the specification, it seems that the
>> > packaging declares the type of a bitstream?  So is the client telling
>> > the server that this file is an unknown type?
>> >
>> > I think you need a SWORD expert here, and I am not one of those.
>> >
>> > > On Wed, Apr 17, 2019 at 9:22 AM Mark H. Wood <mwoodiu...@gmail.com> wrote:
>> > > >
>> > > > On Tue, Apr 16, 2019 at 04:28:32PM -0400, Jose Blanco wrote:
>> > > > > I am depositing a docx file using SWORDv2 and getting a format of
>> > > > > Unknown.  I'm trying to track down how this determination was made.
>> > > > > I would expect the format to be based on the MIME type of the file,
>> > > > > which is:
>> > > > > > file --mime-type -b This\ is\ a\ docx\ file\ for\ test.docx
>> > > > > application/vnd.openxmlformats-officedocument.wordprocessingml.document
>> > > >
>> > > > Unfortunately, DSpace currently uses the filename extension
>> > > > (e.g. ".docx") rather than inspecting the file for magic numbers, to
>> > > > determine the type of the file.
>> > > >
>> > > > > And according to the database,
>> > > > >
>> > > > >   SELECT * FROM bitstreamformatregistry WHERE mimetype =
>> > > > >   'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
>> > > > >
>> > > > > the format should be "Microsoft Word XML".
>> > > > >
>> > > > > What am I not understanding?
>> > > >
>> > > > There should also be a row in the "fileextension" table with the
>> > > > "extension" 'docx' and the "bitstream_format_id" matching that column
>> > > > value in the "bitstreamformatregistry" table.  If the row is missing,
>> > > > or doesn't match, then the file would be diagnosed as an unknown type.
>> > > >
>> > > >   SELECT * FROM bitstreamformatregistry WHERE bitstream_format_id =
>> > > >   (SELECT bitstream_format_id FROM fileextension WHERE extension =
>> > > >   'docx');
>> > > >
>> > > > would test that.
>> >
>> > --
>> > Mark H. Wood
>> > Lead Technology Analyst
>> >
>> > University Library
>> > Indiana University - Purdue University Indianapolis
>> > 755 W. Michigan Street
>> > Indianapolis, IN 46202
>> > 317-274-0749
>> > www.ulib.iupui.edu
>> >
>> > --
>> > All messages to this mailing list should adhere to the DuraSpace Code
>> of Conduct: https://duraspace.org/about/policies/code-of-conduct/
>> > ---
>> > You received this message because you are subscribed to the Google
>> Groups "DSpace Technical Support" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an email to dspace-tech+unsubscr...@googlegroups.com.
>> > To post to this group, send email to dspace-tech@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/dspace-tech.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> Tim Donohue
> Technical Lead for DSpace & DSpaceDirect
> DuraSpace.org | DSpace.org | DSpaceDirect.org
>
