Gert,

Yes, my settings are correct:
fedoragsearch.xsltProcessor = xalan

I asked the Islandora community about the problem with their function;
hopefully they will update the class. I have included that inquiry here
just in case.
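
A possible explanation for the Saxon exception showing up in a Xalan
command-line run: as far as I understand, org.apache.xalan.xslt.Process
gets its processor through the standard JAXP lookup
TransformerFactory.newInstance(), and saxon9he.jar registers its own
factory for that lookup, so whichever implementation wins on the
classpath does the work. Below is a minimal sketch for checking this,
to be compiled and run with the same classpath as the failing command.
The class name WhichXslt is only a placeholder of mine, not part of
GSearch or Islandora.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper (not part of GSearch): reports which XSLT
// implementation the JAXP lookup selects for the current classpath.
class WhichXslt {

    /** Returns the class name of the factory chosen by JAXP discovery. */
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // If this system property is set, it takes precedence over
        // factories registered by jars on the classpath.
        System.out.println("javax.xml.transform.TransformerFactory property: "
                + System.getProperty("javax.xml.transform.TransformerFactory"));
        System.out.println("Factory actually selected: " + factoryClassName());
    }
}
```

If it reports net.sf.saxon.TransformerFactoryImpl, then adding
-Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl
to the java command line may be worth a try, instead of removing
saxon9he.jar from WEB-INF/lib.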

Thanks,
Serhiy


On Mon, Nov 28, 2011 at 5:10 AM, Gert Schmeltz Pedersen
<g...@dtic.dtu.dk> wrote:
> One way to avoid mixing xalan and saxon is to check the exts
> definition in your indexing stylesheet (whether lucene or solr). From
> fedoragsearch.properties:
>
> # xsltProcessor, xalan or saxon
> # this choice must be accompanied by the right namespace in your foxmlToLucene.xslt:
> # xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl" for xalan
> # xmlns:exts="java://dk.defxws.fedoragsearch.server.GenericOperationsImpl" for saxon
> fedoragsearch.xsltProcessor = xalan
>
> Please address the Islandora community also. I appreciate very much their
> use of GSearch, but I cannot answer questions about it.
> -Gert
> On 28/11/2011, at 11.47, Serhiy Polyakov wrote:
>
> Gert,
>
> I fixed the problem of indexing managed datastreams by downloading the
> newer GSearch 2.3 a few days ago. I had been trying to fix the problem
> with the version I got from https://github.com/fcrepo/gsearch on
> October 31. I think it was a beta.
>
> So this class
> dk.defxws.fedoragsearch.server.GenericOperationsImpl
> is working now.
>
> I am still getting the error with another class, which comes from
> Islandora and is supposed to parse the MODS inline datastream.
> Command-line processing with Xalan gives this error, which for some
> reason refers to Saxon:
>
> file:///home/user1/4_XML_Tr/demoFoxmlToSolr.xslt; Line #298; Column
> #-1; XSLT Error (net.sf.saxon.trans.XPathException): Cannot find a
> matching 8-argument function named
> {xalan://ca.upei.roblib.DataStreamForXSLT}getDatastreamTextRaw()
> Exception in thread "main" java.lang.RuntimeException: Cannot find a
> matching 8-argument function named
> {xalan://ca.upei.roblib.DataStreamForXSLT}getDatastreamTextRaw()
>       at org.apache.xalan.xslt.Process.doExit(Process.java:1153)
>       at org.apache.xalan.xslt.Process.main(Process.java:1126)
>
> Interestingly, the parser worked from the command line when I removed
> tomcat/webapps/fedoragsearch/WEB-INF/lib/saxon9he.jar
> and also copied fedora-client-3.1.jar there (taken from GSearch 2.2).
> However, in this case http://myhost:8080/fedoragsearch/rest says Saxon
> is missing.
>
>
> Thanks,
> Serhiy
>
>
>
> On Wed, Nov 23, 2011 at 3:23 AM, Gert Schmeltz Pedersen
> <g...@dtic.dtu.dk> wrote:
>
> I can confirm that the pdf document in datastream DS2 of the demo object
> demo:18 is indexed in my test installation.
>
> If I understand you correctly, you _do_ get the pdf indexed as part of
> foxml.all.text, right? That must mean that the error is produced
> somewhere else in your indexing stylesheet, maybe in line #86 as indicated
> in the error message below. It is also strange that the error message
> refers to saxon; saxon cannot work when your exts refers to xalan. Look
> into fedoragsearch.log and catalina.out; there must be something.
>
> -Gert
>
>
> On 23/11/2011, at 09.26, Serhiy Polyakov wrote:
>
> Hello,
>
> I am trying to get the OBJ datastream (application/pdf) processed and
> indexed into Solr 3.4 with GSearch 2.3. I excluded all MODS streams to
> isolate the problem, so I have only DC and OBJ (pdf).
>
> Note: pdf indexing was working for me in last spring's installation with
> GSearch 2.2 on Lucene. The summer system with GSearch 2.2 beta on
> Solr 1.4 is not indexing pdf either.
>
> For debugging I tried the command-line XSLT processor Xalan 2.7.0 that
> comes with GSearch. I included all classpath variables as I mentioned in
> previous messages.
>
> It gives this error:
>
> file:///home/fedora/3_XML_Pro/foxmlToSolr.xslt; Line #86; Column #-1;
> XSLT Error (net.sf.saxon.trans.XPathException): Cannot find a matching
> 8-argument function named
> {xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl}getDatastreamText()
> Exception in thread "main" java.lang.RuntimeException: Cannot find a
> matching 8-argument function named
> {xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl}getDatastreamText()
>       at org.apache.xalan.xslt.Process.doExit(Process.java:1153)
>       at org.apache.xalan.xslt.Process.main(Process.java:1126)
>
> Only when I downloaded Xalan 2.7.1 into a separate directory and added
> it to the classpath on the command line could I process the file and get
> an output file with all the fields, including the OBJ full text extracted
> from the pdf. I tried to overwrite the Xalan jars that came with GSearch
> with the new ones, but it still gives the same error. Only when I run
> Xalan 2.7.1 directly from the separate directory does it process the
> input file.
>
> ====================
>
> Here is an excerpt from the input object's FOXML that I am processing:
>
> <foxml:datastream ID="OBJ" FEDORA_URI="info:fedora/islandora:6/OBJ"
>     STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
>   <foxml:datastreamVersion ID="OBJ.0" LABEL="Title_2.pdf"
>       CREATED="2011-10-19T09:07:40.379Z" MIMETYPE="application/pdf"
>       SIZE="56276">
>     <foxml:contentLocation TYPE="INTERNAL_ID"
>         REF="http://myhost:8080/fedora/get/islandora:6/OBJ/2011-10-19T09:07:40.379Z"/>
>
> ====================
>
> I am using the stylesheet foxmlToSolr.xslt that came with GSearch. It has
> the following lines in the header:
>
> xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
> exclude-result-prefixes="exts"
>
> ---------------------------
>
> And the following in the body:
>
> <xsl:for-each select="foxml:datastream[@CONTROL_GROUP='M' or
>     @CONTROL_GROUP='E' or @CONTROL_GROUP='R']">
>   <field>
>       <xsl:attribute name="name">
>           <xsl:value-of select="concat('dsm.', @ID)"/>
>       </xsl:attribute>
>       <xsl:value-of select="exts:getDatastreamText($PID,
>           $REPOSITORYNAME, @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS,
>           $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
>   </field>
> </xsl:for-each>
>
> ====================
>
> When objects are submitted into Fedora, all inline datastreams get
> into the index without problems. All non-inline (managed) datastreams
> that do not require external processing (like OCR text) are also indexed
> without problems. The non-inline OBJ datastream containing a pdf, which
> requires external processing, does not get into the index.
>
> I have this class
> dk.defxws.fedoragsearch.server.GenericOperationsImpl
> under
> ..tomcat/webapps/fedoragsearch/WEB-INF/classes
>
> It is used by GSearch for the extraction of foxml.all.text, which means
> it is visible to GSearch. It sounds like only when GSearch passes the
> pdf content of the OBJ datastream for extraction does it fail to get the
> text back.
>
> Could somebody confirm that objects with pdf content are full-text
> indexed correctly with GSearch on Solr?
>
> Thanks,
> Serhiy

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
