Tim,

Thanks for your comments. Your mention of the format registry made me think of something. I remembered that I wasn't sure which tables I had to load explicitly when I was setting up the two newest instances. I just looked, and the "fileextension" table is EMPTY in those two instances! That would certainly be a problem, now wouldn't it?! :) I'm not sure where in the installation procedure this table gets loaded, but somehow I missed that step. I'll just unload the fileextension table from one of the good instances and copy it into the two empty ones.
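Copying the fileextension rows from a healthy instance into an empty one can be sketched as below. This is a minimal Python sketch under stated assumptions: simplified `bitstreamformatregistry` and `fileextension` tables, in-memory SQLite databases standing in for the real PostgreSQL/Oracle instances, and a hypothetical helper `copy_file_extensions` with made-up sample IDs. It matches formats by short description instead of copying raw IDs, in case the two registries numbered their formats differently.

```python
import sqlite3

# Simplified stand-ins for the two DSpace tables involved. The column names
# follow the DSpace schema, but this is NOT the real PostgreSQL/Oracle DDL.
SCHEMA = """
CREATE TABLE bitstreamformatregistry (
    bitstream_format_id INTEGER PRIMARY KEY,
    short_description   TEXT UNIQUE
);
CREATE TABLE fileextension (
    file_extension_id   INTEGER PRIMARY KEY,
    bitstream_format_id INTEGER REFERENCES bitstreamformatregistry,
    extension           TEXT
);
"""


def copy_file_extensions(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy fileextension rows from src into dst's (empty) fileextension table.

    Formats are matched by short_description instead of by raw ID, in case
    the two registries numbered their formats differently.
    Returns the number of rows inserted.
    """
    if dst.execute("SELECT COUNT(*) FROM fileextension").fetchone()[0]:
        raise RuntimeError("target fileextension table is not empty")

    # short_description -> bitstream_format_id in the target registry
    dst_ids = dict(dst.execute(
        "SELECT short_description, bitstream_format_id FROM bitstreamformatregistry"))

    rows = src.execute(
        "SELECT r.short_description, f.extension "
        "FROM fileextension f "
        "JOIN bitstreamformatregistry r USING (bitstream_format_id) "
        "ORDER BY f.file_extension_id").fetchall()

    inserted = 0
    for desc, ext in rows:
        fmt_id = dst_ids.get(desc)
        if fmt_id is None:
            continue  # format not registered in the target; skip rather than guess
        dst.execute(
            "INSERT INTO fileextension (bitstream_format_id, extension) VALUES (?, ?)",
            (fmt_id, ext))
        inserted += 1
    dst.commit()
    return inserted


# Usage with hypothetical sample data: a "good" instance and an empty one.
good = sqlite3.connect(":memory:")
empty = sqlite3.connect(":memory:")
for db in (good, empty):
    db.executescript(SCHEMA)
good.executemany("INSERT INTO bitstreamformatregistry VALUES (?, ?)",
                 [(1, "Unknown"), (3, "Adobe PDF"), (5, "Text")])
empty.executemany("INSERT INTO bitstreamformatregistry VALUES (?, ?)",
                  [(1, "Unknown"), (4, "Adobe PDF"), (6, "Text")])  # different IDs
good.executemany(
    "INSERT INTO fileextension (bitstream_format_id, extension) VALUES (?, ?)",
    [(3, "pdf"), (5, "txt"), (5, "asc")])

copied = copy_file_extensions(good, empty)
print(copied)  # 3
```

In this model, loading the table only fixes future lookups; bitstreams already stored with format 1 (Unknown) keep that value until they are reassigned.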
Thanks to all who replied to my question. I'm pretty knowledgeable when it comes to DSpace, but I love having this list of folks to depend on when I'm stumped on something.

Sue

-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org]
Sent: Thursday, May 06, 2010 1:20 PM
To: dspace-tech@lists.sourceforge.net; Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
Subject: Re: [Dspace-tech] How does the bitstream_format_id get set in a DSpace 1.5.1 import?

Sue,

A few comments inline...

On 5/6/2010 11:44 AM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] wrote:
> I just noticed in one of our DSpace instances that _all_ of the rows
> in the bitstream table have column "bitstream_format_id" set to "1" -
> Unknown. All the documents are either .pdf files or their equivalent
> .pdf.txt files (from filter-media). The strange thing is that all the
> .pdf files are in ORIGINAL bundles and all the .pdf.txt files are in
> TEXT bundles.

The ORIGINAL bundle always contains the original files (as they were uploaded into DSpace). The TEXT bundle always includes text-extraction files which are auto-generated by the filter-media script. More info on Bundle usage can be found in the DSpace Data Model descriptions:
http://www.dspace.org/1_6_0Documentation/ch02.html#docbook-functional.html-data_model

>
> What is the proper way to set the value of "bitstream_format_id" during
> an import? Is it a field you have to include in the Contents file? Or is
> it supposed to be set programmatically in DSpace? I guess I can write a
> query to "update" the bitstream_format_id based on the document names,
> i.e., .pdf files are bitstream_format_id = "3" and .pdf.txt files should
> be "5".

DSpace will attempt to recognize File Formats automatically on upload/ingest. It does so in a very rudimentary way, by essentially checking the file extension.
If the uploaded file's extension matches a known extension in DSpace's Bitstream Format Registry, then DSpace will assume that file is of that known format. So, each of your ".pdf" files should have been auto-recognized as PDF format, assuming your Bitstream Format Registry has an entry for ".pdf" (it should, as this is a default entry; the only way it wouldn't is if you specifically removed it, or your Format Registry was not initialized properly to begin with).

I'm at a loss for why this doesn't seem to be working in your DSpace installation (as I've never seen this before). Is there any custom submission/ingest code that could be affecting this? Are you ingesting this content via a UI (XMLUI or JSPUI), or is it all ingested via the command line? Either way, DSpace should be recognizing the formats properly, but knowing which could help narrow down the problem.

- Tim
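Tim's description of extension-based recognition, and why an empty fileextension table leaves every bitstream at format 1 (Unknown), can be modelled roughly as follows. This is a simplified Python sketch, not DSpace's actual Java implementation: the tables are reduced to the columns needed, the IDs 1, 3 and 5 are the values from Sue's instance, and `guess_format` and `reassign_unknown_formats` are hypothetical names. The second function is one way to do the repair Sue suggested, re-guessing formats for rows already marked Unknown instead of hard-coding IDs per file name.

```python
import sqlite3

# Hypothetical constant: format ID 1 is "Unknown" in Sue's instance.
# Look it up in your own bitstreamformatregistry rather than trusting this.
UNKNOWN_FORMAT_ID = 1


def guess_format(db: sqlite3.Connection, filename: str) -> int:
    """Return the format ID registered for filename's last extension, else Unknown."""
    if "." not in filename:
        return UNKNOWN_FORMAT_ID
    # Only the text after the final dot counts, so "x.pdf.txt" -> "txt".
    # Lower-casing is a choice made for this sketch.
    ext = filename.rsplit(".", 1)[1].lower()
    row = db.execute(
        "SELECT bitstream_format_id FROM fileextension WHERE extension = ?",
        (ext,)).fetchone()
    # An empty fileextension table means every lookup falls through to Unknown.
    return row[0] if row else UNKNOWN_FORMAT_ID


def reassign_unknown_formats(db: sqlite3.Connection) -> int:
    """Re-guess the format of every bitstream still marked Unknown."""
    rows = db.execute(
        "SELECT bitstream_id, name FROM bitstream WHERE bitstream_format_id = ?",
        (UNKNOWN_FORMAT_ID,)).fetchall()
    changed = 0
    for bitstream_id, name in rows:
        fmt = guess_format(db, name)
        if fmt != UNKNOWN_FORMAT_ID:
            db.execute(
                "UPDATE bitstream SET bitstream_format_id = ? WHERE bitstream_id = ?",
                (fmt, bitstream_id))
            changed += 1
    db.commit()
    return changed


# Usage with reduced, hypothetical tables and Sue's IDs (PDF = 3, Text = 5).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fileextension (bitstream_format_id INTEGER, extension TEXT);
CREATE TABLE bitstream (bitstream_id INTEGER PRIMARY KEY,
                        name TEXT, bitstream_format_id INTEGER);
""")
db.executemany("INSERT INTO bitstream (name, bitstream_format_id) VALUES (?, 1)",
               [("report.pdf",), ("report.pdf.txt",), ("notes",)])

before = reassign_unknown_formats(db)  # fileextension empty: nothing recognized
db.executemany("INSERT INTO fileextension VALUES (?, ?)", [(3, "pdf"), (5, "txt")])
after = reassign_unknown_formats(db)
print(before, after)  # 0 2
print(db.execute("SELECT name, bitstream_format_id FROM bitstream "
                 "ORDER BY bitstream_id").fetchall())
# [('report.pdf', 3), ('report.pdf.txt', 5), ('notes', 1)]
```

Matching on the last extension is why, in this sketch, the .pdf.txt files come out as Text rather than PDF.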
------------------------------------------------------------------------------
_______________________________________________ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech