On Thu, Sep 8, 2016 at 1:22 AM Dr. Hirn <[email protected]> wrote:

> Hi Chad,
>
> > So Cirrus will index file contents for which we have a media handler
> > defined. Right now, Pdf and Djvu files have specific media handlers
> > that can extract their text contents.
>
> Do I have to configure something more? My uploaded pdf don't get indexed.
>
> The relevant lines in my LocalSettings.php:
>
> wfLoadExtension( 'Elastica' );
> require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
> $wgCirrusSearchServers = array('xxx.xxx.xxx.xxx');
> $wgSearchType = 'CirrusSearch';
>

Do you have the PdfHandler extension installed as well? If that's installed
then this should Just Work without any additional configuration. Unless
something has changed recently....
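
For reference, here is a rough sketch of what the PdfHandler part of
LocalSettings.php might look like. This is an assumption about a typical
setup, not something taken from the configuration quoted above: the tool
names are the usual binary names and may need full paths on your system,
and on older installs the extension is loaded with require_once instead of
wfLoadExtension.

```php
// LocalSettings.php fragment (sketch only, adjust to your install).

// Load PdfHandler. If your copy of the extension predates extension
// registration, use this instead:
//   require_once "$IP/extensions/PdfHandler/PdfHandler.php";
wfLoadExtension( 'PdfHandler' );

// External tools PdfHandler calls. The values below are assumed binary
// names; point them at full paths if they are not on the web server's PATH.
$wgPdfProcessor = 'gs';        // Ghostscript, renders page images
$wgPdfInfo = 'pdfinfo';        // reads page count and dimensions
$wgPdftoText = 'pdftotext';    // extracts the text that search can index

// Make sure PDF uploads are allowed at all.
$wgFileExtensions[] = 'pdf';
```

If pdftotext is missing or not executable by the web server user, uploads
may still succeed while no text gets extracted, so that is worth checking
first.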
> > If you have an additional media type you want to extract text from,
> > that's what would need implementing.
>
> Any hints on that?
>

Sure. We've got a class in MediaWiki called ImageHandler. Media types that
require special handling have a subclass of that. Here's the ones for PDF and
DjVu for example:

https://phabricator.wikimedia.org/diffusion/EPHD/browse/master/PdfHandler_body.php
https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/media/DjVu.php

If you wanted to index, say, Word documents, you'd need a similar class in an
extension to provide that support (there might be an extension for word docs
already, I dunno).

-Chad
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
