[dspace-tech] Looking to understand - where did the PDFs go?

Katy Earl Wed, 22 Nov 2023 16:17:09 -0800


Hello,


 I’ve scoured the documentation I could find, particularly the Lyrasis 
“Storage Layer <https://wiki.lyrasis.org/display/DSDOC7x/Storage+Layer>” 
page, but I admit I still don’t fully understand what exactly happens to 
files imported into the DSpace system. Perhaps someone can point me in the 
right direction?

 Do I have this correct, using PDFs as an example? When you import a PDF 
into DSpace, for example by creating a new Item in the browser front end, 
dragging and dropping the file for ingestion, saving it, and later running 
filter-media, I’m guessing that the PDF is stored in pieces in the 
following way. Is it stored whole anywhere?

   1. *Assetstore*: DSpace makes a new file based on the PDF, converting 
   the bits by some sort of converter to a bitstream. 
      1. Result: a “bitstream” file which looks, more or less, like a text 
      file (openable and readable in Notepad). 
      2. Where: DSpace’s “assetstore” directory. 
      3. Questions:
         - What happens to the original PDF, with all its formatting 
         information? Is it deleted?
            - Why confused: If I change the bitstream in the “assetstore” 
            to have an extension of .pdf, PDF readers don’t recognize the 
            file, so this can’t be the original file. So where did it go? 
         - What part of DSpace is doing the file conversion? 
   2. *PostgreSQL*: DSpace creates some metadata about the file, perhaps 
   including the metadata that the PDF has in its Properties, and puts it 
   somewhere in the PostgreSQL database. 
      1. Questions: 
         - Does metadata about the PDF’s page formatting also get stored 
         in PostgreSQL? 
         - Does any part of the PDF itself get stored in PostgreSQL? Maybe 
         as a BLOB? 
         - Which tables exactly are holding this information? 
   3. *Solr*: DSpace’s filter-media extracts the full text of the PDF and 
   gives it to Solr for indexing. 
   4. *Is there anywhere else the PDF, in part or in whole, is stored*? 
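
The rename test described in item 1 could be made more precise by looking at 
a file's first bytes rather than its extension: a PDF file begins with the 
marker %PDF-. The following is only a minimal sketch in Python; the function 
name looks_like_pdf is hypothetical and is not part of DSpace.

```python
PDF_MAGIC = b"%PDF-"  # marker found at the very start of a PDF file


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF marker.

    Hypothetical helper for illustration only; it checks the leading
    bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(PDF_MAGIC))
    return head == PDF_MAGIC
```

Calling looks_like_pdf on a file under the assetstore directory (its exact 
location depends on the installation) would indicate whether that file still 
carries the PDF marker, independent of what the file is named.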

 

Related to this: when a DSpace Item that is a PDF is opened in a browser 
through DSpace’s UI and downloaded, is the PDF being “recreated” by DSpace 
on the fly from the bitstream and the metadata in PostgreSQL (and anything 
else?) and then fed to the browser to open? Which part of the DSpace code 
handles this? Or is DSpace feeding the browser an already intact PDF from 
somewhere on the system?

Anyway, apologies if the answers should be obvious, but from looking around, 
I’m not the only person who is unfamiliar with a DAMS and how it stores 
files, whether pieced out, whole, or otherwise.

Many thanks in advance; an answer will clear up a lot of uncertainty.

Best,

Katy

