That's what I wanted to know--thanks!

On Fri, 2007-05-04 at 14:28 -0400, Richard Rodgers wrote:
> Hi Cory:
>
> On Fri, 2007-05-04 at 13:52 -0400, Cory Snavely wrote:
> > So you are saying that for a format of eg PDF, filter-media, during its
> > traversal of the assetstore backended on eg SRB, reads the PDF from SRB,
> > extracts text, and stores that as a file back in SRB.
>
> Yes. A little more precisely: MediaFilter does not directly traverse the
> backend - rather it examines each Item in the database, then for each
> bitstream in the ORIGINAL bundle of that item, if (1) the format of the
> bitstream (as recorded in the database) has a filter associated with it
> (as is the case with PDF), and (2) the extracted text file has not
> already been created, then it reads the (e.g. PDF) file, using the
> standard API (which hides the actual location of the file), extracts the
> text, and stores - again using the standard API - the text as a file in
> the TEXT bundle of the item.
>
> > Then, once its
> > crawl of the assetstore is done, it reads the extracted text back in
> > from SRB and indexes it. The index then lives in the filesystem,
> > specifically within [dspace]/search.
>
> Yes. A little more precisely: as a convenience, by default the indexer
> is invoked after MediaFilter has run (this can be defeated with a
> command-line argument). But this occurs whenever the indexing is run
> (e.g. when 'index-all' is run). The index files do live at
> [dspace]/search, which is conventionally a local filesystem, but
> certainly may be an NFS mount-point, etc
>
> >
> > When I refer to transactions against SRB, I am assuming that those are
> > generic read and write operations in DSpace methods that are calling eg
> > SRB methods.
>
> Yes, the 'BitstreamStorageManager' exports methods to read, write, etc
> These constitute the API to which I was alluding.
>
> Hope this clarifies,
>
> Richard
> >
> > Correct?
> >
> > Thanks,
> > Cory
> >
> > On Fri, 2007-05-04 at 09:46 -0400, Richard Rodgers wrote:
> > > See notes:
> > >
> > > Quoting Cory Snavely <[EMAIL PROTECTED]>:
> > >
> > > > Right--I am trying to get an understanding of all this in very
> > > > specific terms.
> > > >
> > > > On Fri, 2007-05-04 at 09:23 -0400, Mark H. Wood wrote:
> > > >> There are two questions here:
> > > >>
> > > >> 1) Does the use of a non-filesystem asset store backend affect
> > > >>    Lucene's output?  One would guess, no, since it doesn't do
> > > >>    output to the asset store.
> > > Correct - no. Lucene reads the file for indexing through the storage
> > > API - it therefore has a BitStream, not a location on a storage device.
> > >
> > > >> 2) Does the use of a non-filesystem asset store backend affect
> > > >>    Lucene's input?  IOW how does Lucene, as used in DSpace, locate
> > > >>    and gain access to the files it indexes?  If it doesn't go through
> > > >>    the DSpace storage layer or something equivalent then indexing is
> > > >>    screwed.
> > > No - for the same reason. It does not circumvent the storage API or make
> > > any assumptions about where the files with the text to index live.
> > > >>
> > > >> Ouch!  I hadn't thought about these at all.
> > > >>
> > > Remember, we already support SRB (a non-local filesystem option), and
> > > indexing works fine.
> > >
> > >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express
> > Download DB2 Express C - the FREE version of DB2 express and take
> > control of your XML. No limits. Just data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > DSpace-tech mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
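
For anyone reading this thread later, the two conditions Richard lists for MediaFilter can be put in concrete form. What follows is only a rough sketch in Python, not DSpace's Java and not taken from the DSpace source. Every name in it (InMemoryStore, Bitstream, Item, FILTERS, filter_item, fake_pdf_to_text) is hypothetical, and the ".txt" naming used to detect an already-extracted text file is an assumption made for illustration. The point it tries to show is that the filter loop only ever talks to a store/retrieve interface, so it cannot tell whether the bytes live on a local disk or in SRB.

```python
from dataclasses import dataclass, field


class InMemoryStore:
    """Hypothetical stand-in for a storage API: callers pass ids, never paths."""

    def __init__(self):
        self._blobs = {}
        self._next_id = 1

    def store(self, data: bytes) -> int:
        bitstream_id = self._next_id
        self._next_id += 1
        self._blobs[bitstream_id] = data
        return bitstream_id

    def retrieve(self, bitstream_id: int) -> bytes:
        return self._blobs[bitstream_id]


@dataclass
class Bitstream:
    bitstream_id: int
    name: str
    fmt: str  # format as recorded in the database


@dataclass
class Item:
    bundles: dict = field(default_factory=lambda: {"ORIGINAL": [], "TEXT": []})


def fake_pdf_to_text(data: bytes) -> bytes:
    # Placeholder extractor; a real filter would parse the PDF.
    return data.upper()


# Formats that have a filter associated with them (assumed mapping).
FILTERS = {"PDF": fake_pdf_to_text}


def filter_item(item: Item, store: InMemoryStore) -> int:
    """Run one filtering pass over an item; return the number of text files created."""
    created = 0
    existing = {b.name for b in item.bundles["TEXT"]}
    for bitstream in list(item.bundles["ORIGINAL"]):
        extractor = FILTERS.get(bitstream.fmt)
        text_name = bitstream.name + ".txt"  # assumed naming convention
        # Condition (1): the format has a filter.
        # Condition (2): the extracted text does not exist yet.
        if extractor is None or text_name in existing:
            continue
        text = extractor(store.retrieve(bitstream.bitstream_id))
        text_id = store.store(text)
        item.bundles["TEXT"].append(Bitstream(text_id, text_name, "Text"))
        existing.add(text_name)
        created += 1
    return created
```

Running filter_item twice on the same item should create the text file on the first pass and do nothing on the second, which mirrors condition (2) above. Swapping InMemoryStore for a class with the same two methods backed by some other storage would leave filter_item unchanged.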

