I forgot to mention that I am using Solr 3.4 and Fedora GSearch 2.3. I think I was using the wrong field type, “text”: I do not see it defined in schema.xml. However, I tried other types and still got no result. I added just one MODS field, like this:
<field name="mods.title" type="string" indexed="true" stored="true" multiValued="true"/>

It still does not get into the index, even though the output of foxmlToSolr.xslt contains
<field name="mods.title">Title 1</field>

Serhiy

On Tue, Nov 22, 2011 at 3:07 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
> Gert,
>
> I was able to generate output from command line by using downloaded
> Xalan and adding class paths. But I have another question below.
>
> So my command line is like here
> java -Xms512m -Xmx1024m -cp \
> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes:\
> [FedoraHome]/DISTR_XALAN/xalan/*:\
> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/lib/*:\
> [FedoraHome]/fedora/solr_dir/contrib/extraction/lib/*: \
> org.apache.xalan.xslt.Process \
> -PARAM FEDORASOAP 'http://localhost:8080/fedora/services' \
> -PARAM REPOSITORYNAME 'SomeName' \
> -PARAM FEDORAUSER 'fedoraAdmin' \
> -PARAM FEDORAPASS 'SomePassword' \
> -PARAM TRUSTSTOREPATH '[FedoraHome]/fedora/server/truststore' \
> -PARAM TRUSTSTOREPASS 'SomePassword' \
> -in [FileIn.xml] \
> -xsl foxmlToSolr.xslt \
> -out [FileOut.xml]
>
> All managed content is getting into the FileOut.xml including PDF as a
> text. Here is excerpt:
> <field name="dc.title">Pdf docum</field>
> <field name="mods.title">Pdf docum</field>
> <field name="dsm.OBJ">extracted content</field>
>
>
> Another question. Now I am trying to get the fields into Solr Index.
> All fields except mods.* are going there. My steps:
>
> (1) Edit foxmlToSolr.xslt so that I am getting all metadata fields I
> need in the output (confirmed using command line method above).
>
> (2) Edit schema.xml for Solr adding statements like here:
> <copyField source="mods.title" dest="mods.title_s" />
> <field name="mods.title" type="text" indexed="true" stored="false" multiValued="true"/>
> <field name="mods.title_s" type="string" maxChars="300" indexed="true" stored="true"/>
>
> After this I stopped Tomcat, deleted index, started Tomcat, updated
> index using Fedora GSearch web admin.
>
> No MODS fields in the created index (I looked up with Luke)? I have
> all other fields created OK, like dc.*, dsm.OCR and others.
>
> Do I need to edit other files except two above? Any suggestions would help.
>
> Thanks,
> Serhiy
>
>
>
> On Mon, Nov 21, 2011 at 3:48 AM, Gert Schmeltz Pedersen
> <g...@dtic.dtu.dk> wrote:
>> Hi Serhiy,
>>
>> I think that you are missing
>> dk.defxws.fedoragsearch.server.GenericOperationsImpl
>> and related classes from the classpath, when you run from command line. Let
>> me know how it goes.
>>
>> -Gert
>>
>>
>> On 21/11/2011, at 10.04, Serhiy Polyakov wrote:
>>
>>> At first I did not pass parameters to the exts:getDatastreamText
>>> I did it now. Still no OCR text content in OUT.txt fields.
>>>
>>> Serhiy
>>>
>>>
>>> On Mon, Nov 21, 2011 at 2:27 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I want to use command line to process exported Fedora object using
>>>> foxmlToSolr.xslt stylesheet. I need to see the resulting document that
>>>> will be used by solr/conf/schema.xml to create index.
>>>>
>>>> Object's Foxml includes inline DC datastream and managed (external)
>>>> OCR datastream that contains text/plain. Foxml includes reference to
>>>> OCR datastream on the local server like
>>>> http://localhost:8080/fedora/get/... I pointed browser to the OCR
>>>> datastream reference and I see the text there. My FedoraGSearch
>>>> indexed DC and OCR alright as a part of regular workflow so
>>>> foxmlToSolr.xslt must be correct.
>>>>
>>>> However I need to do transformation from command line for the
>>>> analysts. I downloaded Xalan and run:
>>>>
>>>> java -cp dk/defxws/fedoragsearch/server:path/to/xalan/*:
>>>> org.apache.xalan.xslt.Process -in <SOURCE.xml> -xsl foxmlToSolr.xslt
>>>> -out <OUT.txt>
>>>>
>>>> Here is excerpt from OUT.txt
>>>> <field name="dc.title">My Title</field>
>>>> <field name="dsm.OCR"/>
>>>>
>>>> So it is not grabbing managed content (OCR in my case).
>>>>
>>>> foxmlToSolr.xslt includes external function definition and I believe
>>>> is using it for managed content:
>>>> ======
>>>> …
>>>> xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
>>>> …
>>>> <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME,
>>>> @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
>>>> $TRUSTSTOREPASS)"/>
>>>> …
>>>> =====
>>>>
>>>> Could somebody suggest me if this is at all possible to get managed
>>>> content into the output when I am doing command line processing.
>>>> Again, managed content is getting to the index as part of regular
>>>> FedoraGSearch workflow with the same foxmlToSolr.xslt.
>>>>
>>>> Thanks,
>>>> Serhiy
>>>>
>>>
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure
>>> contains a definitive record of customers, application performance,
>>> security threats, fraudulent activity, and more. Splunk takes this
>>> data and makes sense of it. IT sense. And common sense.
>>> http://p.sf.net/sfu/splunk-novd2d
>>> _______________________________________________
>>> Fedora-commons-users mailing list
>>> Fedora-commons-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
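
For anyone following this thread with the same symptom, here is a rough sketch of how the two files discussed above could be cross-checked. It is plain Python and not part of GSearch or Solr; the function name `check_fields` and the `SAMPLE_*` strings are made up for illustration, with the sample values taken from the excerpts quoted in this thread. It only compares names: it lists fields that appear in the foxmlToSolr.xslt output but are not declared in schema.xml, and declared fields whose `type` has no matching `<fieldType>`. It does not look at `<dynamicField>` patterns and it does not explain why a field is missing from the index.

```python
import xml.etree.ElementTree as ET


def check_fields(schema_xml: str, add_doc_xml: str) -> dict:
    """Cross-check field names in a transform output against a Solr schema.

    Hypothetical helper: it compares names only and ignores <dynamicField>.
    """
    schema = ET.fromstring(schema_xml)
    # Declared type names; accept both <fieldType> and the older <fieldtype>.
    types = {el.get("name") for el in schema.iter()
             if el.tag.lower() == "fieldtype"}
    # Explicitly declared fields, mapped to the type each one references.
    fields = {el.get("name"): el.get("type") for el in schema.iter("field")}
    # Field names actually produced by the stylesheet.
    doc = ET.fromstring(add_doc_xml)
    produced = sorted({el.get("name") for el in doc.iter("field")})
    return {
        "undeclared": [name for name in produced if name not in fields],
        "unknown_type": sorted(name for name, type_name in fields.items()
                               if type_name not in types),
    }


# Made-up miniature schema; only the mods.title line mirrors the thread.
SAMPLE_SCHEMA = """
<schema name="demo" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>
  <fields>
    <field name="dc.title" type="string" indexed="true" stored="true"/>
    <field name="mods.title" type="text" indexed="true" stored="false"
           multiValued="true"/>
  </fields>
</schema>
"""

# Shaped like the FileOut.xml excerpt quoted above.
SAMPLE_DOC = """
<add><doc>
  <field name="dc.title">Pdf docum</field>
  <field name="mods.title">Pdf docum</field>
  <field name="dsm.OBJ">extracted content</field>
</doc></add>
"""

result = check_fields(SAMPLE_SCHEMA, SAMPLE_DOC)
print(result)
```

With real files, the two strings would be read from solr/conf/schema.xml and from the file written by the Xalan command line. A non-empty `unknown_type` list would point at the kind of mismatch described at the top of this message, where a field refers to a type that schema.xml does not define.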