Thanks for the info. I think the MODS problem depends only on schema.xml; you
have to experiment further with it. Have you taken out my <dynamicField
name="*" ...?
-Gert
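Gert's question can be checked mechanically. The sketch below is a hypothetical helper, not part of Fedora GSearch or Solr. It parses schema.xml with the JDK's DOM parser and lists every `<dynamicField>` whose name is exactly `*`. It assumes the schema path is passed as the first argument, and it only reports the declarations. It does not model how Solr resolves them against explicit fields.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reports catch-all <dynamicField name="*"> entries in schema.xml.
class CatchAllFieldCheck {

    // Returns one line per <dynamicField> whose name attribute is exactly "*".
    static List<String> findCatchAll(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaXml)));
        NodeList dynamicFields = doc.getElementsByTagName("dynamicField");
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < dynamicFields.getLength(); i++) {
            Element e = (Element) dynamicFields.item(i);
            if ("*".equals(e.getAttribute("name"))) {
                // getAttribute returns "" for attributes that are not present.
                hits.add("type=" + e.getAttribute("type")
                        + " indexed=" + e.getAttribute("indexed")
                        + " stored=" + e.getAttribute("stored"));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java CatchAllFieldCheck [path/to]/schema.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> hits = findCatchAll(xml);
        if (hits.isEmpty()) {
            System.out.println("no catch-all dynamicField found");
        }
        for (String h : hits) {
            System.out.println("catch-all dynamicField: " + h);
        }
    }
}
```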
On 22/11/2011, at 10.07, Serhiy Polyakov wrote:
Gert,
I was able to generate output from the command line by using the downloaded
Xalan and adding classpath entries. But I have another question below.
My command line looks like this:
java -Xms512m -Xmx1024m -cp \
[FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes:\
[FedoraHome]/DISTR_XALAN/xalan/*:\
[FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/lib/*:\
[FedoraHome]/fedora/solr_dir/contrib/extraction/lib/*: \
org.apache.xalan.xslt.Process \
-PARAM FEDORASOAP 'http://localhost:8080/fedora/services' \
-PARAM REPOSITORYNAME 'SomeName' \
-PARAM FEDORAUSER 'fedoraAdmin' \
-PARAM FEDORAPASS 'SomePassword' \
-PARAM TRUSTSTOREPATH '[FedoraHome]/fedora/server/truststore' \
-PARAM TRUSTSTOREPASS 'SomePassword' \
-in [FileIn.xml] \
-xsl foxmlToSolr.xslt \
-out [FileOut.xml]
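The same transformation can also be driven from Java through the standard JAXP API. Each `-PARAM NAME 'value'` on the Xalan command line corresponds to a `Transformer.setParameter` call. This is a minimal sketch. It assumes Xalan-J and the GSearch classes are on the classpath, so that `TransformerFactory.newInstance()` should pick up Xalan and the `exts:` calls in foxmlToSolr.xslt can resolve. The class name and argument order are hypothetical, and the password and truststore parameters are omitted for brevity.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical JAXP counterpart of: java org.apache.xalan.xslt.Process -PARAM ... -in ... -xsl ...
class TransformWithParams {

    // Applies the stylesheet to the input, setting each map entry as a
    // top-level <xsl:param> value, like the -PARAM flag of the Xalan command line.
    static String transform(StreamSource xsl, StreamSource in, Map<String, String> params)
            throws Exception {
        // Assumption: with Xalan-J on the classpath, this lookup returns Xalan's factory.
        Transformer t = TransformerFactory.newInstance().newTransformer(xsl);
        for (Map.Entry<String, String> p : params.entrySet()) {
            t.setParameter(p.getKey(), p.getValue());
        }
        StringWriter out = new StringWriter();
        t.transform(in, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java -cp <same classpath as above> TransformWithParams foxmlToSolr.xslt FileIn.xml
        Map<String, String> params = new LinkedHashMap<>();
        params.put("FEDORASOAP", "http://localhost:8080/fedora/services");
        params.put("REPOSITORYNAME", "SomeName");
        params.put("FEDORAUSER", "fedoraAdmin");
        // FEDORAPASS, TRUSTSTOREPATH and TRUSTSTOREPASS would be added the same way.
        String result = transform(new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])), params);
        System.out.println(result);
    }
}
```

Passing `File`-based sources lets the processor resolve any relative `xsl:include` or `xsl:import` against the stylesheet's location.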
All managed content, including the PDF as text, now gets into FileOut.xml.
Here is an excerpt:
<field name="dc.title">Pdf docum</field>
<field name="mods.title">Pdf docum</field>
<field name="dsm.OBJ">extracted content</field>
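Before looking at Solr, it can help to confirm in bulk which field groups the stylesheet actually emits. The sketch below is a hypothetical helper. It counts the `<field name="...">` elements in FileOut.xml by the prefix before the first dot (`dc`, `mods`, `dsm`, ...). It assumes the output uses `<field>` elements as in the excerpt above.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts <field> elements in the XSLT output by name prefix.
class FieldPrefixCount {

    // "mods.title" counts toward "mods"; a name without a dot counts as itself.
    static Map<String, Integer> countByPrefix(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList fields = doc.getElementsByTagName("field");
        Map<String, Integer> counts = new TreeMap<>(); // sorted by prefix
        for (int i = 0; i < fields.getLength(); i++) {
            String name = ((Element) fields.item(i)).getAttribute("name");
            int dot = name.indexOf('.');
            String prefix = dot < 0 ? name : name.substring(0, dot);
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java FieldPrefixCount FileOut.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countByPrefix(xml));
    }
}
```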
Another question: I am now trying to get the fields into the Solr index.
All fields except mods.* get there. My steps:
(1) Edit foxmlToSolr.xslt so that I get all the metadata fields I
need in the output (confirmed using the command-line method above).
(2) Edit Solr's schema.xml, adding statements like these:
<copyField source="mods.title" dest="mods.title_s" />
<field name="mods.title" type="text" indexed="true" stored="false"
multiValued="true"/>
<field name="mods.title_s" type="string" maxChars="300" indexed="true"
stored="true"/>
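One thing worth ruling out in step (2) is a `copyField` whose source or dest has no matching `<field>` declaration. This is a minimal sketch of a hypothetical helper. It reports copyField names that are not declared as explicit `<field>` elements. Such names may still be legitimately covered by a `<dynamicField>` pattern, which this sketch does not evaluate.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds copyField source/dest names with no explicit <field>.
class CopyFieldCheck {

    static List<String> undeclared(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaXml)));
        // getElementsByTagName matches exact tag names, so <dynamicField> is not collected here.
        Set<String> declared = new HashSet<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            declared.add(((Element) fields.item(i)).getAttribute("name"));
        }
        List<String> missing = new ArrayList<>();
        NodeList copies = doc.getElementsByTagName("copyField");
        for (int i = 0; i < copies.getLength(); i++) {
            Element copy = (Element) copies.item(i);
            for (String attr : new String[] {"source", "dest"}) {
                String name = copy.getAttribute(attr);
                if (!declared.contains(name)) {
                    missing.add(attr + "=" + name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java CopyFieldCheck [path/to]/schema.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(undeclared(xml));
    }
}
```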
After this, I stopped Tomcat, deleted the index, started Tomcat, and updated
the index using the Fedora GSearch web admin.
There are no MODS fields in the created index (I checked with Luke), although
all other fields, like dc.*, dsm.OCR and others, are created OK.
Do I need to edit any files besides the two above? Any suggestions would help.
Thanks,
Serhiy
On Mon, Nov 21, 2011 at 3:48 AM, Gert Schmeltz Pedersen
<g...@dtic.dtu.dk> wrote:
Hi Serhiy,
I think that you are missing
dk.defxws.fedoragsearch.server.GenericOperationsImpl
and related classes from the classpath when you run from the command line. Let me
know how it goes.
-Gert
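Gert's diagnosis can be verified before running the stylesheet. A directory on the classpath must be the root of the package tree (for example WEB-INF/classes, as in the later working command), not the package directory dk/defxws/fedoragsearch/server itself. The hypothetical helper below tries to load each named class with `Class.forName` and prints whether it was found. It is meant to be run with the same `-cp` value as the Xalan command.

```java
// Hypothetical helper: checks whether classes needed by the Xalan run are loadable.
class ClasspathCheck {

    // True if the class can be found on the current classpath; initialize=false
    // avoids running the class's static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage (assumed): java -cp <same classpath as the Xalan command> ClasspathCheck
        String[] names = {
            "org.apache.xalan.xslt.Process",
            "dk.defxws.fedoragsearch.server.GenericOperationsImpl"
        };
        for (String name : names) {
            System.out.println((isLoadable(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```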
On 21/11/2011, at 10.04, Serhiy Polyakov wrote:
At first I did not pass parameters to exts:getDatastreamText.
I have done so now, but there is still no OCR text content in the OUT.txt fields.
Serhiy
On Mon, Nov 21, 2011 at 2:27 AM, Serhiy Polyakov
<sp0...@gmail.com> wrote:
Hello,
I want to use the command line to process an exported Fedora object using the
foxmlToSolr.xslt stylesheet. I need to see the resulting document that
solr/conf/schema.xml will use to create the index.
The object's FOXML includes an inline DC datastream and a managed (external)
OCR datastream that contains text/plain. The FOXML includes a reference to the
OCR datastream on the local server, like
http://localhost:8080/fedora/get/... When I point a browser to the OCR
datastream reference, I see the text there. My FedoraGSearch
indexed DC and OCR correctly as part of the regular workflow, so
foxmlToSolr.xslt must be correct.
However, I need to run the transformation from the command line for the
analysts. I downloaded Xalan and ran:
java -cp dk/defxws/fedoragsearch/server:path/to/xalan/*:
org.apache.xalan.xslt.Process -in <SOURCE.xml> -xsl foxmlToSolr.xslt
-out <OUT.txt>
Here is an excerpt from OUT.txt:
<field name="dc.title">My Title</field>
<field name="dsm.OCR"/>
So it is not grabbing managed content (OCR in my case).
foxmlToSolr.xslt includes an external function definition, which I believe it
uses for managed content:
======
…
xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
…
<xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME,
@ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
$TRUSTSTOREPASS)"/>
…
=====
Could somebody tell me whether it is at all possible to get managed
content into the output when doing command-line processing?
Again, managed content gets into the index as part of the regular
FedoraGSearch workflow with the same foxmlToSolr.xslt.
Thanks,
Serhiy
------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure
contains a definitive record of customers, application performance,
security threats, fraudulent activity, and more. Splunk takes this
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users