That's what I wanted to know--thanks!

On Fri, 2007-05-04 at 14:28 -0400, Richard Rodgers wrote:
> Hi Cory:
> 
> On Fri, 2007-05-04 at 13:52 -0400, Cory Snavely wrote:
> > So you are saying that for a format of eg PDF, filter-media, during its
> > traversal of the assetstore backended on eg SRB, reads the PDF from SRB,
> > extracts text, and stores that as a file back in SRB. 
> 
> Yes. A little more precisely: MediaFilter does not directly traverse the
> backend - rather it examines each Item in the database, then for each
> bitstream in the ORIGINAL bundle of that item, if (1) the format of the
> bitstream (as recorded in the database) has a filter associated with it
> (as is the case with PDF), and (2) the extracted text file has not
> already been created, then it reads the (e.g. PDF) file, using the
> standard API (which hides the actual location of the file), extracts the
> text, and stores - again using the standard API - the text as a file in
> the TEXT bundle of the item.
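The per-item loop Richard describes can be sketched roughly as follows. This is a minimal, hypothetical model, not DSpace code. `Storage`, `Item`, `Bitstream`, and `filterMedia` are illustrative names. The format string "Adobe PDF" and the ".txt" naming of extracted files are assumptions. The extractor is a dummy function rather than a real PDF text extractor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of the filter-media pass; none of these
// types are the real DSpace classes.
public class MediaFilterSketch {

    // Storage API stand-in: callers store and retrieve by ID, never by path.
    static class Storage {
        private final Map<Integer, byte[]> blobs = new HashMap<>();
        private int nextId = 1;

        int store(byte[] data) {
            blobs.put(nextId, data);
            return nextId++;
        }

        byte[] retrieve(int id) {
            return blobs.get(id);
        }
    }

    static class Bitstream {
        final String name;
        final String format;   // format as recorded in the database
        final int storageId;   // opaque handle into the storage API

        Bitstream(String name, String format, int storageId) {
            this.name = name;
            this.format = format;
            this.storageId = storageId;
        }
    }

    static class Item {
        final Map<String, List<Bitstream>> bundles = new HashMap<>();

        List<Bitstream> bundle(String name) {
            return bundles.computeIfAbsent(name, k -> new ArrayList<>());
        }
    }

    // Walks items (not the storage backend); returns how many text files were created.
    static int filterMedia(List<Item> items, Storage storage,
                           Map<String, Function<byte[], String>> filters) {
        int created = 0;
        for (Item item : items) {
            for (Bitstream original : item.bundle("ORIGINAL")) {
                Function<byte[], String> filter = filters.get(original.format);
                if (filter == null) {
                    continue;                                  // (1) no filter for this format
                }
                String textName = original.name + ".txt";      // assumed naming scheme
                boolean done = item.bundle("TEXT").stream()
                        .anyMatch(b -> b.name.equals(textName));
                if (done) {
                    continue;                                  // (2) text already extracted
                }
                // Read and write only through the storage stand-in.
                String text = filter.apply(storage.retrieve(original.storageId));
                int id = storage.store(text.getBytes(StandardCharsets.UTF_8));
                item.bundle("TEXT").add(new Bitstream(textName, "Text", id));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Storage storage = new Storage();
        Item item = new Item();
        int pdfId = storage.store("%PDF-1.4 fake".getBytes(StandardCharsets.UTF_8));
        item.bundle("ORIGINAL").add(new Bitstream("paper.pdf", "Adobe PDF", pdfId));
        Map<String, Function<byte[], String>> filters = new HashMap<>();
        filters.put("Adobe PDF", bytes -> "extracted text");   // dummy extractor
        System.out.println(filterMedia(List.of(item), storage, filters)); // 1
        System.out.println(filterMedia(List.of(item), storage, filters)); // 0
    }
}
```

The second run creates nothing, because check (2) finds the text file the first run stored.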
> 
> > Then, once its
> > crawl of the assetstore is done, it reads the extracted text back in
> > from SRB and indexes it. The index then lives in the filesystem,
> > specifically within [dspace]/search.
> 
> Yes. A little more precisely: as a convenience, by default the indexer
> is invoked after MediaFilter has run (this can be defeated with a
> command-line argument). But the extracted text is read back and indexed
> whenever indexing is run (e.g. when 'index-all' is run), not only after
> MediaFilter. The index files do live at [dspace]/search, which is
> conventionally a local filesystem, but certainly may be an NFS
> mount-point, etc.
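The filter-then-index convenience can be modelled roughly as follows. The sketch also shows that indexing reads the extracted text through the storage layer rather than from a path. It is hypothetical. A toy inverted index stands in for Lucene, and `BitstreamStore` stands in for the storage API. `skipIndex` stands in for the command-line argument Richard mentions, whose real name is not given in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical sketch, not DSpace code: a toy inverted index stands in for
// Lucene, and BitstreamStore stands in for the storage API.
public class IndexAfterFilterSketch {

    // Storage API stand-in: text is fetched by bitstream ID, not by path.
    interface BitstreamStore {
        byte[] retrieve(int id);
    }

    // Toy "index-all": term -> IDs of the text bitstreams containing it.
    // (The real Lucene index would be written under [dspace]/search.)
    static Map<String, Set<Integer>> indexAll(List<Integer> textIds, BitstreamStore store) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int id : textIds) {
            String text = new String(store.retrieve(id), StandardCharsets.UTF_8);
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Runs the filter pass (which returns the IDs of all TEXT bitstreams),
    // then indexes them unless the hypothetical skipIndex flag is set.
    static Optional<Map<String, Set<Integer>>> filterMedia(
            Supplier<List<Integer>> filterPass, boolean skipIndex, BitstreamStore store) {
        List<Integer> textIds = filterPass.get();
        return skipIndex ? Optional.empty() : Optional.of(indexAll(textIds, store));
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> blobs =
                Map.of(1, "Extracted PDF text".getBytes(StandardCharsets.UTF_8));
        BitstreamStore store = blobs::get;
        // prints {extracted=[1], pdf=[1], text=[1]}
        System.out.println(filterMedia(() -> List.of(1), false, store).get());
        // prints false
        System.out.println(filterMedia(() -> List.of(1), true, store).isPresent());
    }
}
```

`indexAll` can also be called on its own, which corresponds to running the indexer without MediaFilter.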
> > 
> > When I refer to transactions against SRB, I am assuming that those are
> > generic read and write operations in DSpace methods that are calling eg
> > SRB methods.
> 
> Yes, the 'BitstreamStorageManager' exports methods to read, write, etc.
> These constitute the API to which I was alluding.
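Here is a minimal sketch of why callers never need to know where a bitstream lives. It assumes a simplified store/retrieve-by-ID interface. `AssetStore`, `FileAssetStore`, and `RemoteAssetStore` are hypothetical names, not the real `BitstreamStorageManager` methods. The "remote" backend is just an in-memory map standing in for something like SRB.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a storage abstraction; not the real BitstreamStorageManager.
public class StorageApiSketch {

    interface AssetStore {
        int store(InputStream in) throws IOException;
        InputStream retrieve(int id) throws IOException;
    }

    // Local-filesystem backend: each bitstream is a file named after its ID.
    static class FileAssetStore implements AssetStore {
        private final Path dir;
        private int nextId = 1;

        FileAssetStore(Path dir) {
            this.dir = dir;
        }

        public int store(InputStream in) throws IOException {
            int id = nextId++;
            Files.copy(in, dir.resolve(Integer.toString(id)));
            return id;
        }

        public InputStream retrieve(int id) throws IOException {
            return Files.newInputStream(dir.resolve(Integer.toString(id)));
        }
    }

    // Stand-in for a remote backend such as SRB: here just an in-memory map.
    static class RemoteAssetStore implements AssetStore {
        private final Map<Integer, byte[]> remote = new HashMap<>();
        private int nextId = 1;

        public int store(InputStream in) throws IOException {
            remote.put(nextId, in.readAllBytes());
            return nextId++;
        }

        public InputStream retrieve(int id) {
            return new ByteArrayInputStream(remote.get(id));
        }
    }

    // A caller (e.g. a filter or the indexer) sees only IDs and streams, never a location.
    static String roundTrip(AssetStore store, String text) throws IOException {
        int id = store.store(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        try (InputStream in = store.retrieve(id)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("assetstore");
        System.out.println(roundTrip(new FileAssetStore(tmp), "hello")); // hello
        System.out.println(roundTrip(new RemoteAssetStore(), "hello"));  // hello
    }
}
```

Because `roundTrip` is written only against the interface, swapping the backend does not change the calling code. This is the property Richard relies on when he says indexing works the same on SRB.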
> 
> Hope this clarifies,
> 
> Richard
> > 
> > Correct? 
> > 
> > Thanks,
> > Cory
> > 
> > On Fri, 2007-05-04 at 09:46 -0400, Richard Rodgers wrote:
> > > See notes:
> > > 
> > > Quoting Cory Snavely <[EMAIL PROTECTED]>:
> > > 
> > > > Right--I am trying to get an understanding of all this in very
> > > > specific terms.
> > > >
> > > > On Fri, 2007-05-04 at 09:23 -0400, Mark H. Wood wrote:
> > > >> There are two questions here:
> > > >>
> > > >> 1)  Does the use of a non-filesystem asset store backend affect
> > > >>     Lucene's output?  One would guess, no, since it doesn't do
> > > >>     output to the asset store.
> > > Correct - no. Lucene reads the file for indexing through the storage
> > > API - it therefore has a Bitstream, not a location on a storage device.
> > >
> > > >> 2)  Does the use of a non-filesystem asset store backend affect
> > > >>     Lucene's input?  IOW how does Lucene, as used in DSpace, locate
> > > >>     and gain access to the files it indexes?  If it doesn't go through
> > > >>     the DSpace storage layer or something equivalent then indexing is
> > > >>     screwed.
> > > No - for the same reason. It does not circumvent the storage API or make
> > > any assumptions about where the files with the text to index live.
> > > >>
> > > >> Ouch!  I hadn't thought about these at all.
> > > >>
> > > Remember, we already support SRB (a non-local filesystem option), and
> > > indexing works fine.
> > 
> > 
> > 
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express
> > Download DB2 Express C - the FREE version of DB2 express and take
> > control of your XML. No limits. Just data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > DSpace-tech mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dspace-tech
> 
