Gert,
Yes, my settings are alright: fedoragsearch.xsltProcessor = xalan
I asked the Islandora community about the problem with their function; hopefully they will update the class. I included that inquiry here just in case.

Thanks,
Serhiy

On Mon, Nov 28, 2011 at 5:10 AM, Gert Schmeltz Pedersen <g...@dtic.dtu.dk> wrote:
> One thing in order to avoid mixing xalan and saxon is to check the exts
> definition in your indexing stylesheet (whether lucene or solr), from
> fedoragsearch.properties:
>
> # xsltProcessor, xalan or saxon
> # this choice must be accompanied by the right namespace in your foxmlToLucene.xslt:
> # xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl" for xalan
> # xmlns:exts="java://dk.defxws.fedoragsearch.server.GenericOperationsImpl" for saxon
> fedoragsearch.xsltProcessor = xalan
>
> Please address the Islandora community also. I appreciate very much their
> use of GSearch, but I cannot answer questions about it.
>
> -Gert
>
> On 28/11/2011, at 11.47, Serhiy Polyakov wrote:
>
> Gert,
>
> I fixed the problem of indexing managed datastreams by downloading the
> newer GSearch 2.3 a few days ago. I was trying to fix the problem with
> the version I got from https://github.com/fcrepo/gsearch on October
> 31. I think it was a beta.
>
> So this class
> dk.defxws.fedoragsearch.server.GenericOperationsImpl
> is working now.
>
> I am still getting the error with another class that comes from
> Islandora and is supposed to parse the MODS inline datastream.
> Command line processing with Xalan gives this error, which for some
> reason refers to Saxon:
>
> file:///home/user1/4_XML_Tr/demoFoxmlToSolr.xslt; Line #298; Column #-1;
> XSLT Error (net.sf.saxon.trans.XPathException): Cannot find a
> matching 8-argument function named
> {xalan://ca.upei.roblib.DataStreamForXSLT}getDatastreamTextRaw()
> Exception in thread "main" java.lang.RuntimeException: Cannot find a
> matching 8-argument function named
> {xalan://ca.upei.roblib.DataStreamForXSLT}getDatastreamTextRaw()
>     at org.apache.xalan.xslt.Process.doExit(Process.java:1153)
>     at org.apache.xalan.xslt.Process.main(Process.java:1126)
>
> Interestingly, the parser worked from the command line when I removed
> tomcat/webapps/fedoragsearch/WEB-INF/lib/saxon9he.jar
> and also copied fedora-client-3.1.jar there (taken from GSearch 2.2).
> However, in this case http://myhost:8080/fedoragsearch/rest says Saxon
> is missing.
>
> Thanks,
> Serhiy
>
> On Wed, Nov 23, 2011 at 3:23 AM, Gert Schmeltz Pedersen
> <g...@dtic.dtu.dk> wrote:
>
> I can confirm that the pdf document in datastream DS2 of the demo object
> demo:18 is indexed in my test installation.
>
> If I understand you correctly, you _do_ get the pdf indexed as part of
> foxml.all.text, right? So that must mean that the error is produced
> somewhere else in your indexing stylesheet, maybe in line #86 as indicated
> in the error message below. Also, it is strange that the error message
> refers to saxon; saxon cannot work when your exts refers to xalan. Look
> into fedoragsearch.log and catalina.out, there must be something.
>
> -Gert
>
> On 23/11/2011, at 09.26, Serhiy Polyakov wrote:
> > Hello,
> > I am trying to get the OBJ datastream (application/pdf) processed and
> > indexed into Solr 3.4 with GSearch 2.3. I excluded all MODS streams to
> > isolate the problem, so I have DC and OBJ (pdf).
> > Note: Pdf indexing was working for me in last spring's installation with
> > GSearch 2.2 on Lucene.
> > The summer installation with GSearch 2.2 beta on
> > Solr 1.4 is not indexing pdf either.
> > For debugging I tried the command line XSLT processor xalan 2.7.0 that
> > comes with GSearch. I include all classpath vars as I mentioned in
> > previous messages.
> > It gives this error:
> > file:///home/fedora/3_XML_Pro/foxmlToSolr.xslt; Line #86; Column #-1;
> > XSLT Error (net.sf.saxon.trans.XPathException): Cannot find a matching
> > 8-argument function named
> > {xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl}getDatastreamText()
> > Exception in thread "main" java.lang.RuntimeException: Cannot find a
> > matching 8-argument function named
> > {xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl}getDatastreamText()
> >     at org.apache.xalan.xslt.Process.doExit(Process.java:1153)
> >     at org.apache.xalan.xslt.Process.main(Process.java:1126)
> > Only when I downloaded Xalan 2.7.1 into a separate directory and added
> > it to the classpath on the command line could I process the file and get
> > output with all the fields, including the OBJ fulltext extracted from the
> > pdf. I tried to overwrite the Xalan jars that came with GSearch with the
> > new ones, but it still gives the same error. Only when I run Xalan 2.7.1
> > directly from the separate directory does it process the input file.
> > ====================
> > Here is an excerpt from the input object's FOXML I am using:
> > <foxml:datastream ID="OBJ" FEDORA_URI="info:fedora/islandora:6/OBJ"
> >     STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
> > <foxml:datastreamVersion ID="OBJ.0" LABEL="Title_2.pdf"
> >     CREATED="2011-10-19T09:07:40.379Z" MIMETYPE="application/pdf"
> >     SIZE="56276">
> > <foxml:contentLocation TYPE="INTERNAL_ID"
> >     REF="http://myhost:8080/fedora/get/islandora:6/OBJ/2011-10-19T09:07:40.379Z"/>
> > ====================
> > I am using stylesheet foxmlToSolr.xslt that came with GSearch.
> > It has
> > the following lines in the header:
> > xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
> > exclude-result-prefixes="exts"
> > ---------------------------
> > And the following in the body:
> > <xsl:for-each select="foxml:datastream[@CONTROL_GROUP='M' or
> >     @CONTROL_GROUP='E' or @CONTROL_GROUP='R']">
> >   <field>
> >     <xsl:attribute name="name">
> >       <xsl:value-of select="concat('dsm.', @ID)"/>
> >     </xsl:attribute>
> >     <xsl:value-of select="exts:getDatastreamText($PID,
> >         $REPOSITORYNAME, @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS,
> >         $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
> >   </field>
> > </xsl:for-each>
> > ====================
> > When objects are submitted into Fedora, all inline datastreams
> > get into the index OK. All non-inline (Managed) datastreams that do
> > not require external processing (like OCR text) are processed into the
> > index OK. The non-inline datastream OBJ, containing a pdf that requires
> > external processing, is not getting into the index.
> > I have this package
> > dk.defxws.fedoragsearch.server.GenericOperationsImpl
> > under
> > ..tomcat/webapps/fedoragsearch/WEB-INF/classes
> > and it is used by GSearch for extraction of foxml.all.text. It means
> > it is visible to GSearch. It sounds like only when GSearch passes
> > the pdf content of the OBJ datastream for extraction, it is not getting
> > it back.
> > Could somebody confirm that objects with pdf content are fulltext
> > indexed OK with GSearch on Solr?
> > Thanks,
> > Serhiy
> > ------------------------------------------------------------------------------
> > All the data continuously generated in your IT infrastructure
> > contains a definitive record of customers, application performance,
> > security threats, fraudulent activity, and more. Splunk takes this
> > data and makes sense of it. IT sense. And common sense.
> > http://p.sf.net/sfu/splunk-novd2d
> > _______________________________________________
> > Fedora-commons-users mailing list
> > Fedora-commons-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
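
A possible explanation for the Saxon exception appearing in a Xalan command-line run, offered as an assumption rather than a confirmed diagnosis: org.apache.xalan.xslt.Process obtains its processor through the JAXP TransformerFactory lookup, and a saxon9he.jar on the classpath may register Saxon as that factory. That would also be consistent with the run succeeding once saxon9he.jar was removed. The sketch below only reports which factory JAXP resolves for a given classpath; the class name WhichXsltProcessor is made up for illustration and is not part of GSearch, Xalan, or Saxon.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic helper (not part of GSearch): reports which XSLT
// processor the JAXP lookup selects on the current classpath.
public class WhichXsltProcessor {

    /** Name of the TransformerFactory implementation that JAXP resolves. */
    static String resolvedFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Jar or directory the factory class was loaded from, if one is reported. */
    static String resolvedFactoryLocation() {
        CodeSource source = TransformerFactory.newInstance()
                .getClass().getProtectionDomain().getCodeSource();
        // Classes shipped with the JDK may not report a code source at all.
        return (source == null || source.getLocation() == null)
                ? "(no code source reported; probably JDK built-in)"
                : source.getLocation().toString();
    }

    public static void main(String[] args) {
        // An explicit system property takes precedence over jar service files.
        System.out.println("override property : "
                + System.getProperty("javax.xml.transform.TransformerFactory"));
        System.out.println("resolved factory  : " + resolvedFactoryClass());
        System.out.println("loaded from       : " + resolvedFactoryLocation());
    }
}
```

Run it with the same classpath used for the Xalan command line. If the resolved factory turns out to be a net.sf.saxon class, one thing to try (untested against GSearch) is the standard JAXP override -Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl on the java command, which leaves saxon9he.jar in place for the web application.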