Well, if by "whammy" you mean a read access, yes. But my point was that the Lucene indexing is done (absent corruption) only once. The exploded text asset file is not needed for a Lucene lookup, which consults Lucene's own constructed index file. So the performance of routine look-ups against the index is completely independent of the asset store.
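To make that concrete, here is a toy sketch of the separation. It is not Lucene or DSpace code: the class and function names are hypothetical, and the "index" is just a dictionary. It only shows that each text bitstream is read once at indexing time and that look-ups afterwards never touch the store.

```python
class CountingAssetStore:
    """Hypothetical stand-in for an asset store back-end (filesystem, SRB, S3)."""

    def __init__(self, bitstreams):
        self._bitstreams = dict(bitstreams)
        self.reads = 0  # number of read accesses made against the store

    def ids(self):
        return list(self._bitstreams)

    def read(self, bitstream_id):
        self.reads += 1
        return self._bitstreams[bitstream_id]


def build_index(store):
    """Read every extracted-text bitstream once and build an inverted index."""
    index = {}
    for bitstream_id in store.ids():
        for term in store.read(bitstream_id).lower().split():
            index.setdefault(term, set()).add(bitstream_id)
    return index


def lookup(index, term):
    """Answer a query from the index alone; the asset store is not consulted."""
    return sorted(index.get(term.lower(), set()))


# Assumed sample data standing in for text bitstreams produced by a media filter.
store = CountingAssetStore({
    "bitstream-1": "full text extracted from a PDF",
    "bitstream-2": "text extracted from another document",
})
index = build_index(store)          # one read per bitstream happens here
reads_after_indexing = store.reads
hits = lookup(index, "extracted")   # no further reads against the store
```

The read counter only moves again if the index is rebuilt, which is why a slow back-end matters for an 'index-all' run but not for ordinary searches.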
If there is a read performance problem with a given store back-end, that's surely a concern, but Lucene doesn't add any especially onerous overhead to it. Having said all this, it is true that 'index-alls' are run fairly cavalierly, and it is worth noting this dependency.

Richard

Quoting Mark Diggory <[EMAIL PROTECTED]>:

>>
>> On 5/4/07, Cory Snavely <[EMAIL PROTECTED]> wrote:
>> Well, I'm just wondering, in specific terms, if we use an object-based
>> storage system as an assetstore rather than a filesystem, where the
>> files that Lucene indexes actually sit.
>
> It's tricky, this is what FilterMedia is for, it actually extracts the
> text and places it as a bitstream in the assetstore. Lucene full
> text indexing is done against the assetstore bitstreams in all cases
> (well, except for the metadata table in the database). So ultimately
> you're pushing the text bitstreams into the assetstore (s3) in
> FilterMedia and pulling it back out on Lucene indexing, a
> double-whammy.
>
> Cheers,
> Mark
>
>>
>> It's my understanding that in a filesystem-based assetstore, for
>> example, text is extracted from PDFs and stored in a separate file
>> *within the assetstore directory* that Lucene crawls. I just don't know
>> how that sort of thing is handled when using object-based storage.
>>
>> On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote:
>> > Hi Cory:
>> >
>> > Not sure about the limits of Lucene, but I think the larger point is
>> > that the back-ends are expected only to hold the real content or assets.
>> > Everything else (full-text indices and the like) are *artifacts* (can be
>> > recreated from the assets) that we don't need to manage in the same way.
>> > If for performance reasons we want to put them where the assets are we
>> > can, but there is really no connection between the two that the system
>> > imposes.
>> >
>> > Does this get at your question, or did I miss the point?
>> >
>> > Thanks,
>> >
>> > Richard R
>> >
>> > On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:
>> > > (Apologies if this has been discussed to resolution; after a few
>> > > attempts to search the archives, I concluded they are really
>> > > broken. 500 errors, bad links, etc.)
>> > >
>> > > For those using, interested in, or knowledgeable about using API-based
>> > > storage (SRB, S3) as a backend for DSpace: how does doing so affect
>> > > full-text indexing? Can anyone describe how, in such a setup, full text
>> > > is stored and indexed?
>> > >
>> > > My uneducated impression is that Lucene would want to work only against
>> > > a filesystem.
>> > >
>> > > Thanks,
>> > > Cory Snavely
>> > > University of Michigan Library IT Core Services
>> > >
>> > > -------------------------------------------------------------------------
>> > > This SF.net email is sponsored by DB2 Express
>> > > Download DB2 Express C - the FREE version of DB2 express and take
>> > > control of your XML. No limits. Just data. Click to get it now.
>> > > http://sourceforge.net/powerbar/db2/
>> > > _______________________________________________
>> > > DSpace-tech mailing list
>> > > [email protected]
>> > > https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
> ~~~~~~~~~~~~~
> Mark R. Diggory - DSpace Systems Manager
> MIT Libraries, Systems and Technology Services
> Massachusetts Institute of Technology

