Tim,

     Thanks for your comments. Your mention of the format registry made me 
think of something.  I remembered being unclear about which tables I had to 
load explicitly, for some reason, when I was implementing the two newest 
instances.  I just looked and the "fileextension" table is EMPTY in those two 
instances!  This of course would be problematic, now wouldn't it?!  :)   I'm 
not sure where in the implementation procedures this is done, but for some 
reason I missed it.  I'll just unload the fileextension table from one of the 
good instances and copy it into the two empty ones.
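
A rough sketch of that copy step, in Python with in-memory SQLite databases standing in for the real instances. The column names (file_extension_id, bitstream_format_id, extension) are an assumption about the fileextension schema, and the helper names are made up for illustration; the format ids 3 and 5 are the ones mentioned in the quoted message below. The copy only makes sense if the bitstream_format_id values mean the same thing in both instances.

```python
import sqlite3


def make_instance(rows):
    """Create an in-memory database standing in for one DSpace instance."""
    db = sqlite3.connect(":memory:")
    # Assumed schema; check the real table definition before relying on it.
    db.execute(
        "CREATE TABLE fileextension ("
        "file_extension_id INTEGER PRIMARY KEY, "
        "bitstream_format_id INTEGER, "
        "extension TEXT)"
    )
    db.executemany("INSERT INTO fileextension VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def copy_fileextension(source, target):
    """Copy all fileextension rows into target, but only if target is empty."""
    (count,) = target.execute("SELECT COUNT(*) FROM fileextension").fetchone()
    if count:
        raise RuntimeError("target fileextension table is not empty")
    rows = source.execute(
        "SELECT file_extension_id, bitstream_format_id, extension "
        "FROM fileextension ORDER BY file_extension_id"
    ).fetchall()
    target.executemany("INSERT INTO fileextension VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)


good = make_instance([(1, 3, "pdf"), (2, 5, "txt")])
empty = make_instance([])
copied = copy_fileextension(good, empty)
print(copied)
```

Refusing to write into a non-empty target is a deliberate choice here, so a rerun cannot create duplicate rows.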



Thanks to all who replied to my question.  I'm pretty knowledgeable when it 
comes to DSpace, but I love having this list of folks to depend on when I'm 
stumped on something.



Sue



-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org]
Sent: Thursday, May 06, 2010 1:20 PM
To: dspace-tech@lists.sourceforge.net; Thornton, Susan M. (LARC-B702)[RAYTHEON 
TECHNICAL SERVICES COMPANY]
Subject: Re: [Dspace-tech] How does the bitstream_format_id get set in a DSpace 
1.5.1 import?



Sue,



A few comments inline...



On 5/6/2010 11:44 AM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL
SERVICES COMPANY] wrote:

>   I just noticed in one of our DSpace instances that _all_ of the rows
> in the bitstream table have column "bitstream_format_id" set to "1" -
> Unknown. All the documents are either .pdf files or their equivalent
> .pdf.txt files (from filter-media). The strange thing is that all the
> .pdf files are in ORIGINAL bundles and all the .pdf.txt files are in
> TEXT bundles.



The ORIGINAL bundle always contains the original files (as they were
uploaded in DSpace).  The TEXT bundle always includes text-extraction
files which are auto-generated by the filter-media script.  More info on
Bundle usage can be found in the DSpace Data Model descriptions:
http://www.dspace.org/1_6_0Documentation/ch02.html#docbook-functional.html-data_model



> What is the proper way to set the value of "bitstream_format_id" during
> an import? Is it a field you have to include in the Contents file? Or is
> it supposed to be set programmatically in DSpace? I guess I can write a
> query to "update" the bitstream_format_id based on the document names,
> i.e., .pdf files are bitstream_format_id = "3" and .pdf.txt files should
> be "5".



DSpace will attempt to recognize File Formats automatically on
upload/ingest.  It does so in a very rudimentary way, by essentially
checking the file extension.  If the uploaded file's extension matches a
known extension in DSpace's Bitstream Format Registry, then DSpace will
assume that file is of that known format.



So, each of your ".pdf" files should have been auto-recognized as PDF
format, assuming your Bitstream Format Registry has an entry for ".pdf"
(it should, as this is a default entry -- the only way it wouldn't is if
you specifically removed it, or your Format Registry was not initialized
properly to begin with).
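
To make the extension lookup concrete, here is a minimal sketch in Python. It is not DSpace's actual code: the dictionary stands in for the registry's extension entries, and the format names, the function name, the use of only the last extension, and the lowercasing are all assumptions made for illustration.

```python
import os

# Hypothetical stand-in for the registry's extension entries.
# The real registry lives in the database; these values are illustrative.
EXTENSION_TO_FORMAT = {
    "pdf": "Adobe PDF",
    "txt": "Text",
}

UNKNOWN = "Unknown"


def guess_format(filename, registry):
    """Return the format registered for the file's last extension, else Unknown."""
    _, ext = os.path.splitext(filename)
    # Normalize ".PDF" to "pdf" before the lookup (an assumption of this sketch).
    ext = ext.lstrip(".").lower()
    return registry.get(ext, UNKNOWN)


print(guess_format("report.pdf", EXTENSION_TO_FORMAT))
print(guess_format("report.pdf.txt", EXTENSION_TO_FORMAT))
# With no extension entries at all, every file falls through to Unknown.
print(guess_format("report.pdf", {}))
```

The last call shows the failure mode in question: if the registry has no extension entries, every file ends up as Unknown regardless of its name.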



I'm at a loss for why this doesn't seem to be working in your DSpace
installation (as I've never seen this before).  Is there any custom
submission/ingest code that could be affecting this?  Are you ingesting
this content via a UI (XMLUI or JSPUI) or is it all ingested via
commandline (either way, DSpace should be recognizing the formats
properly -- but it could help narrow down the problem)?



- Tim
------------------------------------------------------------------------------

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech