We ran into the same problem. The issue is that the datastream's mime-type is checked, to see whether it can be handled, only *after* the "getDatastreamText" method has been called. That method uses the getDatastreamDissemination method from API-A to get the mime-type. Unfortunately, that call also pulls down the entire datastream, whether or not text can be extracted from that particular stream. So, yes, all datastreams are being transferred, not only the ones that can be handled. If you do not want to index the full text of any of your managed datastreams, you can safely comment out that section in the XSLT. Otherwise, you can check the mime-type of the datastream in the XSLT before calling "getDatastreamText". The mime-types it can handle are: text/plain, text/xml, text/html, application/pdf, application/ps, and application/msword.
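A rough sketch of what such a mime-type check could look like is below. It is a standalone XSLT 1.0 stylesheet for illustration, not the stylesheet shipped with GSearch. It assumes that the last foxml:datastreamVersion element of each datastream is the current one, and the "dsm." field name is only a placeholder. The "getDatastreamText" call itself is left as a comment, since its arguments depend on your installed stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    exclude-result-prefixes="foxml">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <fields>
      <xsl:for-each select="foxml:digitalObject/foxml:datastream">
        <!-- Assumption: the last datastreamVersion is the current one. -->
        <xsl:variable name="mime"
            select="foxml:datastreamVersion[last()]/@MIMETYPE"/>

        <!-- Only continue for the mime-types listed above as handled. -->
        <xsl:if test="$mime = 'text/plain'
                   or $mime = 'text/xml'
                   or $mime = 'text/html'
                   or $mime = 'application/pdf'
                   or $mime = 'application/ps'
                   or $mime = 'application/msword'">
          <!-- Hypothetical field name, for illustration only. -->
          <field name="{concat('dsm.', @ID)}">
            <!-- Put your existing getDatastreamText call here, unchanged.
                 It is now reached only for datastreams that passed the
                 check, so e.g. image/tiff streams are never fetched. -->
          </field>
        </xsl:if>
      </xsl:for-each>
    </fields>
  </xsl:template>

</xsl:stylesheet>
```

In an existing stylesheet, the same xsl:if could be wrapped around each datastream stanza instead of being used as a separate stylesheet.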
Matt

--
Matt Cordial
Digital Library Software Engineer
Arizona State University Library

arne anka wrote:
> the FoxmlToLucene stylesheet contains severa stanzas for datastreams, all
> with a line like that before
>
> <!-- an [...] datastream is fetched, if its mimetype can be handled, the
> text becomes the value of the field. -->
>
> my question: what determines, if the "mimetype can be handled"?
>
> i try to reindex my recently migrated repository and obviously gsearch
> tries to index even external datastreams of the type " image/tiff" which
> is unwanted and takes ages.
> "obviously" because in a first run the external data wasn't available and
> the indexer was ready after a few minutes -- now i made the data available
> and the indexing goes on for over an hour already.

------------------------------------------------------------------------------
_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
