Gert,

When I took out <dynamicField name="*"… it did not work at all.

I am observing two things:

(1) I am not getting fields that are extracted from datastreams using
external functions (MODS) or that need processing by Solr tools
(OBJ, application/pdf).

For MODS, my foxmlToSolr.xslt uses:

islandora-exts:getXMLDatastreamASNodeList($PID, $REPOSITORYNAME, 'MODS',
$FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)

Its class (ca/upei/roblib/DataStreamForXSLT.class) sits under this entry point:
[FedoraHome]/tomcat/webapps/fedoragsearch/WEB-INF/classes

Should I let GSearch know about it somehow?

(2) Solr 3.4 must have some way to define fields other than the Solr 1.4
way. If you look at the schema.xml from Solr 3.4 and the schema.xml from
GSearch 2.3, they do not include any DC fields, for example. Yet I am
getting all of them in my index with those schema.xml files.

Serhiy

On Tue, Nov 22, 2011 at 4:34 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
> I forgot to mention that I am using Solr 3.4 and Fedora GSearch 2.3. I
> think I was using the wrong field type, “text”. I do not see it defined
> in schema.xml. However, I tried other types and still got no result. I
> added just one mods field like this:
>
> <field name="mods.title" type="string" indexed="true" stored="true"
> multiValued="true"/>
>
> Still it is not going to the index, even though the output of
> foxmlToSolr.xslt gives <field name="mods.title">Title 1</field>
>
>
> Serhiy
>
>
> On Tue, Nov 22, 2011 at 3:07 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>> Gert,
>>
>> I was able to generate output from the command line by using a downloaded
>> Xalan and adding class paths. But I have another question below.
>>
>> So my command line looks like this:
>> java -Xms512m -Xmx1024m -cp \
>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes:\
>> [FedoraHome]/DISTR_XALAN/xalan/*:\
>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/lib/*:\
>> [FedoraHome]/fedora/solr_dir/contrib/extraction/lib/*: \
>> org.apache.xalan.xslt.Process \
>> -PARAM FEDORASOAP 'http://localhost:8080/fedora/services' \
>> -PARAM REPOSITORYNAME 'SomeName' \
>> -PARAM FEDORAUSER 'fedoraAdmin' \
>> -PARAM FEDORAPASS 'SomePassword' \
>> -PARAM TRUSTSTOREPATH '[FedoraHome]/fedora/server/truststore' \
>> -PARAM TRUSTSTOREPASS 'SomePassword' \
>> -in [FileIn.xml] \
>> -xsl foxmlToSolr.xslt \
>> -out [FileOut.xml]
>>
>> All managed content is getting into FileOut.xml, including the PDF as
>> text. Here is an excerpt:
>> <field name="dc.title">Pdf docum</field>
>> <field name="mods.title">Pdf docum</field>
>> <field name="dsm.OBJ">extracted content</field>
>>
>>
>> Another question. Now I am trying to get the fields into the Solr index.
>> All fields except mods.* are going there. My steps:
>>
>> (1) Edit foxmlToSolr.xslt so that I am getting all metadata fields I
>> need in the output (confirmed using the command line method above).
>>
>> (2) Edit schema.xml for Solr, adding statements like these:
>> <copyField source="mods.title" dest="mods.title_s" />
>> <field name="mods.title" type="text" indexed="true" stored="false"
>> multiValued="true"/>
>> <field name="mods.title_s" type="string" maxChars="300" indexed="true"
>> stored="true"/>
>>
>> After this I stopped Tomcat, deleted the index, started Tomcat, and
>> updated the index using the Fedora GSearch web admin.
>>
>> There are no MODS fields in the created index (I looked with Luke). I have
>> all other fields created OK, like dc.*, dsm.OCR and others.
>>
>> Do I need to edit other files besides the two above? Any suggestions would help.
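
A quick way to narrow this down is to compare the field names the stylesheet emits with what schema.xml actually declares. The Python sketch below is only an illustration under stated assumptions: the helper names (declared_fields, undeclared_fields) and the two inline XML samples are made up for this example, and it assumes dynamicField names are simple leading or trailing "*" globs. In practice you would read the real schema.xml and the Xalan output (FileOut.xml) from disk instead of using the samples.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-ins for schema.xml and the XSLT output.
SAMPLE_SCHEMA = """
<schema name="sample" version="1.4">
  <fields>
    <field name="dc.title" type="string" indexed="true" stored="true"/>
    <dynamicField name="dsm.*" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""

SAMPLE_OUTPUT = """
<add><doc>
  <field name="dc.title">Pdf docum</field>
  <field name="mods.title">Pdf docum</field>
  <field name="dsm.OBJ">extracted content</field>
</doc></add>
"""


def declared_fields(schema_xml):
    """Return (explicit field names, dynamicField glob patterns)."""
    root = ET.fromstring(schema_xml)
    names = {f.get("name") for f in root.iter("field")}
    patterns = [d.get("name") for d in root.iter("dynamicField")]
    return names, patterns


def undeclared_fields(schema_xml, add_doc_xml):
    """List emitted field names that no <field> or <dynamicField> covers."""
    names, patterns = declared_fields(schema_xml)
    missing = []
    for field in ET.fromstring(add_doc_xml).iter("field"):
        name = field.get("name")
        covered = name in names or any(
            fnmatch.fnmatchcase(name, pattern) for pattern in patterns
        )
        if not covered and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(undeclared_fields(SAMPLE_SCHEMA, SAMPLE_OUTPUT))  # ['mods.title']
```

With a catch-all pattern such as name="*" in the schema, the list comes back empty, because that pattern covers every field name. If the list is also empty for the real files, the schema is probably not what is dropping the field, and the stylesheet copy and extension classes that the GSearch webapp itself loads are the next thing to check.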
>>
>> Thanks,
>> Serhiy
>>
>>
>>
>> On Mon, Nov 21, 2011 at 3:48 AM, Gert Schmeltz Pedersen
>> <g...@dtic.dtu.dk> wrote:
>>> Hi Serhiy,
>>>
>>> I think that you are missing
>>> dk.defxws.fedoragsearch.server.GenericOperationsImpl
>>> and related classes from the classpath when you run from the command
>>> line. Let me know how it goes.
>>>
>>> -Gert
>>>
>>>
>>> On 21/11/2011, at 10.04, Serhiy Polyakov wrote:
>>>
>>>> At first I did not pass parameters to exts:getDatastreamText.
>>>> I do now. Still no OCR text content in the OUT.txt fields.
>>>>
>>>> Serhiy
>>>>
>>>>
>>>> On Mon, Nov 21, 2011 at 2:27 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I want to use the command line to process an exported Fedora object
>>>>> using the foxmlToSolr.xslt stylesheet. I need to see the resulting
>>>>> document that will be used by solr/conf/schema.xml to create the index.
>>>>>
>>>>> The object's FOXML includes an inline DC datastream and a managed
>>>>> (external) OCR datastream that contains text/plain. The FOXML includes
>>>>> a reference to the OCR datastream on the local server like
>>>>> http://localhost:8080/fedora/get/... I pointed a browser to the OCR
>>>>> datastream reference and I see the text there. My FedoraGSearch
>>>>> indexed DC and OCR all right as part of the regular workflow, so
>>>>> foxmlToSolr.xslt must be correct.
>>>>>
>>>>> However, I need to do the transformation from the command line for the
>>>>> analysts. I downloaded Xalan and ran:
>>>>>
>>>>> java -cp dk/defxws/fedoragsearch/server:path/to/xalan/*:
>>>>> org.apache.xalan.xslt.Process -in <SOURCE.xml> -xsl foxmlToSolr.xslt
>>>>> -out <OUT.txt>
>>>>>
>>>>> Here is an excerpt from OUT.txt:
>>>>> <field name="dc.title">My Title</field>
>>>>> <field name="dsm.OCR"/>
>>>>>
>>>>> So it is not grabbing managed content (OCR in my case).
>>>>>
>>>>> foxmlToSolr.xslt includes an external function definition, and I
>>>>> believe it is using it for managed content:
>>>>> ======
>>>>> …
>>>>> xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
>>>>> …
>>>>> <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME,
>>>>> @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
>>>>> $TRUSTSTOREPASS)"/>
>>>>> …
>>>>> =====
>>>>>
>>>>> Could somebody tell me whether it is at all possible to get managed
>>>>> content into the output when I am doing command line processing?
>>>>> Again, managed content is getting to the index as part of the regular
>>>>> FedoraGSearch workflow with the same foxmlToSolr.xslt.
>>>>>
>>>>> Thanks,
>>>>> Serhiy
>>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure
>>>> contains a definitive record of customers, application performance,
>>>> security threats, fraudulent activity, and more. Splunk takes this
>>>> data and makes sense of it. IT sense. And common sense.
>>>> http://p.sf.net/sfu/splunk-novd2d
>>>> _______________________________________________
>>>> Fedora-commons-users mailing list
>>>> Fedora-commons-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
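
On the classpath point from the earliest message in this thread: Java resolves a class such as dk.defxws.fedoragsearch.server.GenericOperationsImpl relative to each classpath entry, so an entry has to be the directory that contains dk/ (or a jar that does), not the package directory itself. Here is a small Python sketch of that lookup, for illustration only. find_class and class_to_path are hypothetical helpers written for this note, not part of GSearch or Xalan, and the sketch only mimics the "dir/*" jar wildcard, not every detail of the Java launcher.

```python
import glob
import os
import sys
import zipfile


def class_to_path(class_name):
    """Turn a.b.C into the relative path a/b/C.class that Java looks for."""
    return class_name.replace(".", "/") + ".class"


def find_class(class_name, classpath):
    """Return the classpath entries (directories or jars) holding class_name."""
    wanted = class_to_path(class_name)
    hits = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if entry.endswith(("/*", "\\*")):
            # A trailing "/*" stands for every jar in that directory.
            candidates = sorted(glob.glob(os.path.join(entry[:-2], "*.jar")))
        else:
            candidates = [entry]
        for candidate in candidates:
            if os.path.isdir(candidate):
                # Directory entry: the package path must exist below it.
                if os.path.isfile(os.path.join(candidate, *wanted.split("/"))):
                    hits.append(candidate)
            elif zipfile.is_zipfile(candidate):
                # Jar entry: the class must be a member of the archive.
                with zipfile.ZipFile(candidate) as jar:
                    if wanted in jar.namelist():
                        hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Usage: python find_class.py <fully.qualified.ClassName> <classpath>
    for hit in find_class(sys.argv[1], sys.argv[2]):
        print(hit)
```

Run against a classpath whose only entry is dk/defxws/fedoragsearch/server, this finds nothing, because the lookup then expects dk/defxws/... below that directory. Pointing the entry at the WEB-INF/classes directory, as in the later command line in this thread, is the layout it can resolve, provided the class file is actually there.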