VTechWorks has Elements with Repository Tools 2, which uses SWORDv2. We 
have not used the SWORDv2 deposit tool that you describe. A typical 
Elements deposit, http://hdl.handle.net/10919/88983, yields a PDF with 
Format="Adobe PDF" in the Original bundle and an XML file with 
Format="Unknown" in the SWORD bundle.
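
Mark explains further down the thread that DSpace picks a bitstream format from the filename extension, by looking the extension up in the fileextension table and joining it to bitstreamformatregistry. It does not inspect the file's contents. A minimal bash sketch of that lookup follows. The function name guess_format is hypothetical, and the two case arms stand in for registry rows named in this thread. They are not DSpace's actual registry or code. Lowercasing the extension is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an extension-based format guess, in the spirit of
# the fileextension -> bitstreamformatregistry lookup described in the thread.
# The case arms are illustrative stand-ins for registry rows, not DSpace code.
guess_format() {
  local name="$1"
  local ext
  # A name with no dot has no extension to look up.
  if [[ "$name" != *.* ]]; then
    echo "Unknown"
    return
  fi
  # Keep the text after the last dot, lowercased (case-insensitive matching
  # is an assumption of this sketch).
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)  echo "Adobe PDF" ;;
    docx) echo "Microsoft Word XML" ;;
    *)    echo "Unknown" ;;   # no matching fileextension row
  esac
}
```

For example, `guess_format 'This is a docx file for test.docx'` prints Microsoft Word XML. A name whose extension has no row prints Unknown, whatever the file actually contains.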

When I deposit a test SWORD zip package from the command line, I use the 
first command that you list, with -H "Content-Type: application/zip".
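
The sketch below writes that command out as a bash function. It removes the Google Groups bold markers and the stray semicolon after the X-Packaging header. The headers come from Jose's first command. The function name sword_deposit_zip, its argument order, and the DRY_RUN switch are hypothetical conveniences and are not part of DSpace or curl.

```bash
#!/usr/bin/env bash
# Hypothetical helper: deposit a SWORD zip package using the headers from
# the first command in the thread. Set DRY_RUN=1 to print the curl
# arguments instead of sending the request.
sword_deposit_zip() {
  local zip_path="$1"        # local path to the package, e.g. /tmp/example.zip
  local deposit_name="$2"    # name DSpace gives the deposited file
  local collection_url="$3"  # SWORDv2 collection endpoint
  local user="$4"            # account for curl -u (curl prompts for the password)
  local -a cmd=(
    curl -i
    --data-binary "@${zip_path}"
    -H "Content-Disposition: filename=${deposit_name}"
    -H "Content-Type: application/zip"
    -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP"
    -H "X-No-Op: false"
    -H "X-Verbose: true"
    -u "$user"
    "$collection_url"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # One argument per line, so header values stay readable.
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 sword_deposit_zip /tmp/example.zip jose.zip https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071 some...@umich.edu` shows exactly what would be sent. Building the arguments in an array keeps each header value as a single word, so values containing spaces survive intact.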

-Anne

On Wednesday, April 17, 2019 at 2:32:09 PM UTC-4, Jose Blanco wrote:
>
> Tim,
>
> I am using a tool developed by Symplectic Elements that sends SWORDv2 
> packages to our dev site for deposit. It has a UI where you deposit 
> items, much like the one in DSpace. Once the item is deposited there, 
> you can send it over to our dev instance via SWORDv2. The item I created 
> at their site has a little metadata and a docx file. When the item lands 
> at our site everything looks good, but the file format is Unknown. I 
> don't know how they bundle things up for SWORD and send it off, and the 
> DSpace log file does not tell me much: I can see that it thinks the 
> format is Unknown, but not why. We do have docx in the fileextension 
> table, and if I deposit the same docx file through the DSpace UI, the 
> format is Microsoft Word, as expected.
>
> Now, I do know that if I send this command to our site:
>
> curl -i --data-binary "@/tmp/example.zip" \
>   -H "Content-Disposition: filename=jose.zip" \
>   -H "Content-Type: application/zip" \
>   -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" \
>   -H "X-No-Op: false" -H "X-Verbose: true" \
>   https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071 \
>   -u some...@umich.edu
>
> it deposits a file named jose.zip, and its format is zip.
>
> But if I change the command to:
>
> curl -i --data-binary "@/tmp/example.zip" \
>   -H "Content-Disposition: filename=jose.zip" \
>   -H "Content-Type: application/pdf" \
>   -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" \
>   -H "X-No-Op: false" -H "X-Verbose: true" \
>   https://dev.deepblue.lib.umich.edu/swordv2/collection/TEMP-BOGUS/324071 \
>   -u some...@umich.edu
>
> it deposits a file named jose.zip whose format is PDF, even though the 
> file is actually a zip file, and trying to open it produces an error. 
> The Content-Type header (set with -H) tells DSpace the format (MIME type).
>
> I don't know how they are sending me the package, but I have asked them.
>
> Thank you!
> -Jose
>
>
> On Wed, Apr 17, 2019 at 12:05 PM Tim Donohue <tdon...@duraspace.org> wrote:
>
>> Hi Jose,
>>
>> I think we'd need more information on where you are encountering this 
>> error.  From my understanding with SWORDv2, the expectation is that the 
>> *package* you deposit is a ZIP file that contains both a metadata file and 
>> one or more binaries.  An example is at:  
>> https://github.com/DSpace/DSpace/blob/master/dspace-sword/example/example.zip
>>   
>> (This same example is for SWORDv1 and v2).
>>
>> Once SWORDv2 validates the Zip, it should extract the binaries and 
>> *validate them against your bitstream registry*.  So, as Mark noted, if you 
>> have a "docx" file in your bitstreamformat registry, then SWORDv2 should 
>> see that in the same way that any other input mechanism does.
>>
>> If you are depositing a Word doc directly, it looks like SWORDv2 tries to 
>> use the "BinaryContentIngester" class, which also looks to be using the 
>> bitstreamformat registry.   
>> https://github.com/DSpace/DSpace/blob/master/dspace-swordv2/src/main/java/org/dspace/sword2/BinaryContentIngester.java
>>
>> It's currently unclear to me how you are trying to deposit to SWORDv2.  
>> Are you depositing a Zip?  Are you trying to deposit a Word document 
>> directly?  Where is the error message appearing and what is the exact error?
>>
>> If you provide a bit more info, perhaps we could narrow down what is 
>> going on.
>>
>> Tim
>>
>> On Wed, Apr 17, 2019 at 9:59 AM Jose Blanco <bla...@umich.edu> wrote:
>>
>>> This is what it seems like to me too.  Thank you for your input.
>>>
>>> On Wed, Apr 17, 2019 at 10:40 AM Mark H. Wood <mwood...@gmail.com> wrote:
>>> >
>>> > On Wed, Apr 17, 2019 at 09:43:51AM -0400, Jose Blanco wrote:
>>> > > Mark, are you sure this is the way it works for SWORDv2? I know this
>>> > > is the way it works when uploading files via the UI, but I don't think
>>> > > this is the way SWORD does it. At least I'm not seeing this. I do
>>> > > have docx in the fileextension table.
>>> >
>>> > We don't use SWORDv2 here, so I don't know a lot about it, but from
>>> > reading some of the code and the specification, it seems that the
>>> > packaging declares the type of a bitstream?  So is the client telling
>>> > the server that this file is an unknown type?
>>> >
>>> > I think you need a SWORD expert here, and I am not one of those.
>>> >
>>> > > On Wed, Apr 17, 2019 at 9:22 AM Mark H. Wood <mwood...@gmail.com> wrote:
>>> > > >
>>> > > > On Tue, Apr 16, 2019 at 04:28:32PM -0400, Jose Blanco wrote:
>>> > > > > I am depositing a docx file using SWORDv2 and getting a format
>>> > > > > of Unknown. I'm trying to track down how this determination was
>>> > > > > made. I would expect the format to be based on the MIME type of
>>> > > > > the file, which is:
>>> > > > > > file --mime-type -b This\ is\ a\ docx\ file\ for\ test.docx
>>> > > > > application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>> > > >
>>> > > > Unfortunately, DSpace currently determines the type of a file from
>>> > > > its filename extension (e.g. ".docx") rather than by inspecting the
>>> > > > file's contents for magic numbers.
>>> > > >
>>> > > > > And according to the db
>>> > > > >
>>> > > > >  SELECT * FROM bitstreamformatregistry WHERE mimetype =
>>> > > > >  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
>>> > > > >
>>> > > > > should be "Microsoft Word XML"
>>> > > > >
>>> > > > > What am I not understanding?
>>> > > >
>>> > > > There should also be a row in the "fileextension" table with the
>>> > > > "extension" 'docx' and a "bitstream_format_id" matching that column's
>>> > > > value in the "bitstreamformatregistry" table. If the row is missing
>>> > > > or doesn't match, the file will be diagnosed as an unknown type.
>>> > > >
>>> > > >   SELECT * FROM bitstreamformatregistry WHERE bitstream_format_id =
>>> > > >   (SELECT bitstream_format_id FROM fileextension WHERE extension =
>>> > > >   'docx');
>>> > > >
>>> > > > would test that.
>>> >
>>> > --
>>> > Mark H. Wood
>>> > Lead Technology Analyst
>>> >
>>> > University Library
>>> > Indiana University - Purdue University Indianapolis
>>> > 755 W. Michigan Street
>>> > Indianapolis, IN 46202
>>> > 317-274-0749
>>> > www.ulib.iupui.edu
>>> >
>>> > --
>>> > All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
>>> > ---
>>> > You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an email to dspac...@googlegroups.com.
>>> > To post to this group, send email to dspac...@googlegroups.com.
>>> > Visit this group at https://groups.google.com/group/dspace-tech.
>>> > For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>
>>
>> -- 
>>
>> Tim Donohue
>> Technical Lead for DSpace & DSpaceDirect
>> DuraSpace.org | DSpace.org | DSpaceDirect.org
>>
>>
>
