We ran into the same problem. The issue is that GSearch checks whether 
the datastream's mime-type can be handled only *after* calling the 
"getDatastreamText" method. This method uses the 
getDatastreamDissemination method from API-A to get the mime-type. 
Unfortunately, this also pulls down the entire datastream, whether or not 
text can be extracted from that particular stream. So, yes, all 
datastreams are being transferred, not only the ones that can be handled. 
If you do not want to index the full text of any of your managed 
datastreams, you can safely comment out that section in the XSLT. 
Otherwise, you can check the mime-type of the datastream in the XSLT 
before calling "getDatastreamText". The mime-types it can handle are: 
text/plain, text/xml, text/html, application/pdf, application/ps, and 
application/msword.
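
A rough sketch of that check, in case it helps. It is not taken from the 
shipped stylesheet: the foxml prefix binding, the CONTROL_GROUP filter, 
the "dsm." field name and the use of the last datastreamVersion as the 
current one are all assumptions, so adapt them to your own stanza. The 
"getDatastreamText" call is left as a placeholder comment; keep your 
existing call exactly as it is and just wrap it in the xsl:if.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    exclude-result-prefixes="foxml">

  <xsl:template match="/">
    <IndexDocument>
      <!-- Assumption: only managed (M) and external (E) datastreams are of interest -->
      <xsl:for-each select="foxml:digitalObject/foxml:datastream[@CONTROL_GROUP='M' or @CONTROL_GROUP='E']">
        <!-- Assumption: the last datastreamVersion element is the current version -->
        <xsl:variable name="mime" select="foxml:datastreamVersion[last()]/@MIMETYPE"/>
        <!-- Only fetch the datastream when its mime-type is one that can be handled -->
        <xsl:if test="$mime='text/plain' or $mime='text/xml' or $mime='text/html'
                      or $mime='application/pdf' or $mime='application/ps'
                      or $mime='application/msword'">
          <!-- Hypothetical field name; use whatever your stanza already uses -->
          <IndexField IFname="dsm.{@ID}" index="TOKENIZED" store="YES" termVector="NO">
            <!-- your existing getDatastreamText call goes here, unchanged -->
          </IndexField>
        </xsl:if>
      </xsl:for-each>
    </IndexDocument>
  </xsl:template>

</xsl:stylesheet>
```

With a test like this, an image/tiff datastream never reaches 
"getDatastreamText", so it should not be pulled down at all.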

Matt

--
Matt Cordial
Digital Library Software Engineer
Arizona State University Library

arne anka wrote:
> the FoxmlToLucene stylesheet contains several stanzas for datastreams, all  
> with a line like that before
>
> <!-- an [...] datastream is fetched, if its mimetype can be handled, the  
> text becomes the value of the field. -->
>
> my question: what determines, if the "mimetype can be handled"?
>
> i try to reindex my recently migrated repository and obviously gsearch  
> tries to index even external datastreams of the type " image/tiff" which  
> is unwanted and takes ages.
> "obviously" because in a first run the external data wasn't available and  
> the indexer was ready after a few minutes -- now i made the data available  
> and the indexing goes on for over an hour already.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>   

