Well, if by whammy you mean a read access, yes. But my point was that the
Lucene indexing is done (absent corruption) only once - the exploded text
asset file is not needed for a Lucene lookup - it consults its own
constructed index file. So the performance - i.e. routine use of the index
for look-ups - is completely independent of the asset store.
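
To make that concrete, here is a toy sketch in Python. It is not DSpace or
Lucene code: CountingAssetStore, build_index and search are hypothetical
names invented for illustration, and a plain dict stands in for the Lucene
index. It only shows the access pattern: the store is read once per
extracted-text bitstream while the index is built, and not at all during
look-ups.

```python
class CountingAssetStore:
    """Hypothetical stand-in for an asset store back-end (filesystem, S3, SRB)."""

    def __init__(self):
        self._bitstreams = {}
        self.reads = 0
        self.writes = 0

    def put(self, bitstream_id, text):
        # Roughly what a FilterMedia-style step does with extracted text.
        self.writes += 1
        self._bitstreams[bitstream_id] = text

    def get(self, bitstream_id):
        self.reads += 1
        return self._bitstreams[bitstream_id]

    def ids(self):
        return list(self._bitstreams)


def build_index(store):
    """Build a toy inverted index: one store read per bitstream."""
    index = {}
    for bitstream_id in store.ids():
        for term in store.get(bitstream_id).lower().split():
            index.setdefault(term, set()).add(bitstream_id)
    return index


def search(index, term):
    """Look-ups consult only the index, never the store."""
    return sorted(index.get(term.lower(), set()))


store = CountingAssetStore()
store.put("item-1.txt", "Lucene builds its own index")
store.put("item-2.txt", "the asset store holds the bitstreams")

index = build_index(store)          # the one-time cost against the store
reads_after_indexing = store.reads

hits = search(index, "index")       # routine use: no store access
```

Re-running build_index (an 'index-all') is the only step that goes back to
the store, which is why the read cost of the back-end matters there and
nowhere else.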

If there is a read performance problem with a given store back-end, that's
surely a concern, but Lucene doesn't add any specially onerous overhead to it.

Having said all this, it is true that 'index-alls' are run fairly cavalierly,
and it is worth noting this dependency.

Richard

Quoting Mark Diggory <[EMAIL PROTECTED]>:

>>
>> On 5/4/07, Cory Snavely < [EMAIL PROTECTED]> wrote:
>> Well, I'm just wondering, in specific terms, if we use an object-based
>> storage system as an assetstore rather than a filesystem, where the
>> files that Lucene indexes actually sit.
>
> It's tricky; this is what FilterMedia is for: it actually extracts the
> text and places it as a bitstream in the assetstore. Lucene full-text
> indexing is done against the assetstore bitstreams in all cases
> (well, except for the metadata table in the database). So ultimately
> you're pushing the text bitstreams into the assetstore (S3) in
> FilterMedia and pulling them back out on Lucene indexing, a
> double-whammy.
>
> Cheers,
> Mark
>
>>
>> It's my understanding that in a filesystem-based assetstore, for
>> example, text is extracted from PDFs and stored in a separate file
>> *within the assetstore directory* that Lucene crawls. I just don't know
>> how that sort of thing is handled when using object-based storage.
>>
>> On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote:
>> > Hi Cory:
>> >
>> > Not sure about the limits of Lucene, but I think the larger point is
>> > that the back-ends are expected only to hold the real content or assets.
>> > Everything else (full-text indices and the like) is an *artifact* (can be
>> > recreated from the assets) that we don't need to manage in the same way.
>> > If for performance reasons we want to put them where the assets are we
>> > can, but there is really no connection between the two that the system
>> > imposes.
>> >
>> > Does this get at your question, or did I miss the point?
>> >
>> > Thanks,
>> >
>> > Richard R
>> >
>> > On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:
>> > > (Apologies if this has been discussed to resolution; after a few
>> > > attempts to search the archives, I concluded they are really broken.
>> > > 500 errors, bad links, etc.)
>> > >
>> > > For those using, interested in, or knowledgeable about using API-based
>> > > storage (SRB, S3) as a backend for DSpace: how does doing so affect
>> > > full-text indexing? Can anyone describe how, in such a setup, full text
>> > > is stored and indexed?
>> > >
>> > > My uneducated impression is that Lucene would want to work only against
>> > > a filesystem.
>> > >
>> > > Thanks,
>> > > Cory Snavely
>> > > University of Michigan Library IT Core Services
>> > >
>> > >
>> > >
>> > > -------------------------------------------------------------------------
>> > > This SF.net email is sponsored by DB2 Express
>> > > Download DB2 Express C - the FREE version of DB2 express and take
>> > > control of your XML. No limits. Just data. Click to get it now.
>> > > http://sourceforge.net/powerbar/db2/
>> > > _______________________________________________
>> > > DSpace-tech mailing list
>> > > [email protected]
>> > > https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> >
>>
>
> ~~~~~~~~~~~~~
> Mark R. Diggory - DSpace Systems Manager
> MIT Libraries, Systems and Technology Services
> Massachusetts Institute of Technology
>
>
>


