Sue,

A few comments inline...

On 5/6/2010 11:44 AM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL 
SERVICES COMPANY] wrote:
>   I just noticed in one of our DSpace instances that _all_ of the rows
> in the bitstream table have column “bitstream_format_id” set to “1” –
> Unknown. All the documents are either .pdf files or their equivalent
> .pdf.txt files (from filter-media). The strange thing is that all the
> .pdf files are in ORIGINAL bundles and all the .pdf.txt files are in
> TEXT bundles.

The ORIGINAL bundle always contains the original files (as they were 
uploaded in DSpace).  The TEXT bundle always includes text-extraction 
files which are auto-generated by the filter-media script.  More info on 
Bundle usage can be found in the DSpace Data Model descriptions:
http://www.dspace.org/1_6_0Documentation/ch02.html#docbook-functional.html-data_model

>
> What is the proper way to set the value of “bitstream_format_id” during
> an import? Is it a field you have to include in the Contents file? Or is
> it supposed to be set programmatically in DSpace? I guess I can write a
> query to “update” the bitstream_format_id based on the document names,
> i.e., .pdf files are bitstream_format_id = “3” and .pdf.txt files should
> be “5”.

DSpace will attempt to recognize File Formats automatically on 
upload/ingest.  It does so in a very rudimentary way, by essentially 
checking the file extension.  If the uploaded file's extension matches a 
known extension in DSpace's Bitstream Format Registry, then DSpace will 
assume that file is of that known format.
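
To make the idea concrete, here is a rough Python sketch of that kind of 
extension lookup.  It is only an illustration, not DSpace's actual code: 
the function name and the registry dictionary are hypothetical, and the 
IDs (1 = Unknown, 3 = PDF, 5 = Text) are simply the values mentioned in 
this thread, so check them against your own Bitstream Format Registry.

```python
import os

# Hypothetical stand-in for the Bitstream Format Registry.
# IDs are the ones quoted in this thread; yours may differ.
UNKNOWN_FORMAT_ID = 1
EXTENSION_TO_FORMAT_ID = {"pdf": 3, "txt": 5}


def guess_format_id(filename, registry=EXTENSION_TO_FORMAT_ID):
    """Return the format ID for a filename, based only on its last extension."""
    _, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()  # ".PDF" -> "pdf"
    # Fall back to Unknown when the extension is missing or not registered.
    return registry.get(ext, UNKNOWN_FORMAT_ID)


if __name__ == "__main__":
    for name in ("report.pdf", "report.pdf.txt", "notes"):
        print(name, guess_format_id(name))
```

In this sketch only the last extension counts, so "report.pdf.txt" is 
looked up as "txt", and an empty registry would leave every file as 
Unknown.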

So, each of your ".pdf" files should have been auto-recognized as PDF 
format, assuming your Bitstream Format Registry has an entry for ".pdf" 
(it should, as this is a default entry -- the only way it wouldn't is if 
you specifically removed it, or your Format Registry was not initialized 
properly to begin with).

I'm at a loss as to why this doesn't seem to be working in your DSpace 
installation (I've never seen this before).  Is there any custom 
submission/ingest code that could be affecting this?  Are you ingesting 
this content via a UI (XMLUI or JSPUI), or is it all ingested via the 
command line?  Either way, DSpace should be recognizing the formats 
properly, but knowing which one could help narrow down the problem.

- Tim
