No, I did not mean that you should take the line with <dynamicField name="*"… out; I just wanted to know whether you had.

That is a line that I added to schema.xml, and it is because of that line that the dc fields, and potentially all other fields in a doc, get included in the index. It should also take care of mods fields, even without explicitly named mods fields in schema.xml, so try removing such lines.

I do not know about islandora-exts; it is not part of GSearch, so I cannot answer your point (1). Maybe the Islandora people can. But as long as the mods fields with values are in the doc generated by your indexing stylesheet, GSearch has done its part, and it is schema.xml and the Solr server that determine what gets into the index.

As to your point (2), the only differences between schema.xml in Solr 3.4 and in GSearch 2.3 are marked with comments in the one in GSearch 2.3, and that is essentially the dynamicField line.

-Gert

On 22/11/2011, at 14.21, Serhiy Polyakov wrote:

> Gert,
>
> When I took out <dynamicField name="*"… it did not work at all.
>
> I am observing two things:
>
> (1)
> I am not getting fields that are extracted from datastreams using
> external functions (mods) or that need processing by Solr tools (OBJ
> (application/pdf))
>
> For MODS, my foxmlToSolr.xslt uses:
>
> islandora-exts:getXMLDatastreamASNodeList($PID, $REPOSITORYNAME,
> 'MODS', $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
> $TRUSTSTOREPASS)
>
> Its class (ca/upei/roblib/DataStreamForXSLT.class) sits under
> [FedoraHome]/tomcat/webapps/fedoragsearch/WEB-INF/classes
>
> Should I let GSearch know about it somehow?
>
> (2)
> Solr 3.4 must have some way to define fields other than the Solr 1.4 way.
> If you look at schema.xml from Solr 3.4 ~ schema.xml from GSearch 2.3,
> they do not include any DC fields, for example, yet I am getting all of
> them in my index with those schema.xml files.
>
>
> Serhiy
>
>
> On Tue, Nov 22, 2011 at 4:34 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>> I forgot to mention that I am using Solr 3.4 and Fedora GSearch 2.3.
>> I think I was using the wrong type of field, “text”. I do not see it defined
>> in schema.xml. However, I tried other types and still got no result. I
>> added just one mods field like this:
>>
>> <field name="mods.title" type="string" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> Still it is not going to the index, even though the output of
>> foxmlToSolr.xslt gives <field name="mods.title">Title 1</field>
>>
>>
>> Serhiy
>>
>>
>> On Tue, Nov 22, 2011 at 3:07 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>>> Gert,
>>>
>>> I was able to generate output from the command line by using a downloaded
>>> Xalan and adding class paths. But I have another question below.
>>>
>>> So my command line looks like this:
>>> java -Xms512m -Xmx1024m -cp \
>>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes:\
>>> [FedoraHome]/DISTR_XALAN/xalan/*:\
>>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/lib/*:\
>>> [FedoraHome]/fedora/solr_dir/contrib/extraction/lib/*: \
>>> org.apache.xalan.xslt.Process \
>>> -PARAM FEDORASOAP 'http://localhost:8080/fedora/services' \
>>> -PARAM REPOSITORYNAME 'SomeName' \
>>> -PARAM FEDORAUSER 'fedoraAdmin' \
>>> -PARAM FEDORAPASS 'SomePassword' \
>>> -PARAM TRUSTSTOREPATH '[FedoraHome]/fedora/server/truststore' \
>>> -PARAM TRUSTSTOREPASS 'SomePassword' \
>>> -in [FileIn.xml] \
>>> -xsl foxmlToSolr.xslt \
>>> -out [FileOut.xml]
>>>
>>> All managed content is getting into FileOut.xml, including the PDF as
>>> text. Here is an excerpt:
>>> <field name="dc.title">Pdf docum</field>
>>> <field name="mods.title">Pdf docum</field>
>>> <field name="dsm.OBJ">extracted content</field>
>>>
>>>
>>> Another question. Now I am trying to get the fields into the Solr index.
>>> All fields except mods.* are going there. My steps:
>>>
>>> (1) Edit foxmlToSolr.xslt so that I am getting all metadata fields I
>>> need in the output (confirmed using the command line method above).
>>>
>>> (2) Edit schema.xml for Solr, adding statements like these:
>>> <copyField source="mods.title" dest="mods.title_s" />
>>> <field name="mods.title" type="text" indexed="true" stored="false"
>>> multiValued="true"/>
>>> <field name="mods.title_s" type="string" maxChars="300" indexed="true"
>>> stored="true"/>
>>>
>>> After this I stopped Tomcat, deleted the index, started Tomcat, and
>>> updated the index using the Fedora GSearch web admin.
>>>
>>> There are no MODS fields in the created index (I looked it up with Luke).
>>> I have all other fields created OK, like dc.*, dsm.OCR and others.
>>>
>>> Do I need to edit other files besides the two above? Any suggestions
>>> would help.
>>>
>>> Thanks,
>>> Serhiy
>>>
>>>
>>>
>>> On Mon, Nov 21, 2011 at 3:48 AM, Gert Schmeltz Pedersen
>>> <g...@dtic.dtu.dk> wrote:
>>>> Hi Serhiy,
>>>>
>>>> I think that you are missing
>>>> dk.defxws.fedoragsearch.server.GenericOperationsImpl
>>>> and related classes from the classpath when you run from the command line.
>>>> Let me know how it goes.
>>>>
>>>> -Gert
>>>>
>>>>
>>>> On 21/11/2011, at 10.04, Serhiy Polyakov wrote:
>>>>
>>>>> At first I did not pass parameters to exts:getDatastreamText.
>>>>> I did it now. Still no OCR text content in the OUT.txt fields.
>>>>>
>>>>> Serhiy
>>>>>
>>>>>
>>>>> On Mon, Nov 21, 2011 at 2:27 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I want to use the command line to process an exported Fedora object
>>>>>> using the foxmlToSolr.xslt stylesheet. I need to see the resulting
>>>>>> document that will be used by solr/conf/schema.xml to create the index.
>>>>>>
>>>>>> The object's FOXML includes an inline DC datastream and a managed
>>>>>> (external) OCR datastream that contains text/plain. The FOXML includes
>>>>>> a reference to the OCR datastream on the local server like
>>>>>> http://localhost:8080/fedora/get/... I pointed a browser to the OCR
>>>>>> datastream reference and I see the text there.
>>>>>> My FedoraGSearch indexed DC and OCR all right as part of the regular
>>>>>> workflow, so foxmlToSolr.xslt must be correct.
>>>>>>
>>>>>> However, I need to do the transformation from the command line for the
>>>>>> analysts. I downloaded Xalan and ran:
>>>>>>
>>>>>> java -cp dk/defxws/fedoragsearch/server:path/to/xalan/*:
>>>>>> org.apache.xalan.xslt.Process -in <SOURCE.xml> -xsl foxmlToSolr.xslt
>>>>>> -out <OUT.txt>
>>>>>>
>>>>>> Here is an excerpt from OUT.txt:
>>>>>> <field name="dc.title">My Title</field>
>>>>>> <field name="dsm.OCR"/>
>>>>>>
>>>>>> So it is not grabbing managed content (OCR in my case).
>>>>>>
>>>>>> foxmlToSolr.xslt includes an external function definition, and I
>>>>>> believe it is using it for managed content:
>>>>>> ======
>>>>>> …
>>>>>> xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
>>>>>> …
>>>>>> <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME,
>>>>>> @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
>>>>>> $TRUSTSTOREPASS)"/>
>>>>>> …
>>>>>> =====
>>>>>>
>>>>>> Could somebody tell me whether it is at all possible to get managed
>>>>>> content into the output when I am doing command line processing?
>>>>>> Again, managed content is getting to the index as part of the regular
>>>>>> FedoraGSearch workflow with the same foxmlToSolr.xslt.
>>>>>>
>>>>>> Thanks,
>>>>>> Serhiy
>>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> All the data continuously generated in your IT infrastructure
>>>>> contains a definitive record of customers, application performance,
>>>>> security threats, fraudulent activity, and more. Splunk takes this
>>>>> data and makes sense of it. IT sense. And common sense.
>>>>> http://p.sf.net/sfu/splunk-novd2d
>>>>> _______________________________________________
>>>>> Fedora-commons-users mailing list
>>>>> Fedora-commons-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
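
For anyone following the thread, here is a minimal sketch in Python of the point about the catch-all dynamicField line: fields named explicitly in schema.xml are matched by name, and everything else (dc.*, mods.*, dsm.*) can only get into the index through a dynamicField pattern. This is a simplified model and not Solr's actual code. The schema fragment, the PID field, the <add><doc> wrapper and the helper names (load_schema, resolve, report) are all assumptions made for illustration; only the dc/mods/dsm field names come from the output quoted in the thread. Trying longer patterns first is my understanding of Solr's behaviour, not something confirmed in this thread.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down schema.xml: one explicit field plus a
# catch-all dynamicField. Attribute values are assumptions, not copied
# from the GSearch 2.3 file.
SCHEMA_XML = """
<schema name="sketch">
  <fields>
    <field name="PID" type="string" indexed="true" stored="true"/>
    <dynamicField name="*" type="string" indexed="true" stored="true"
                  multiValued="true"/>
  </fields>
</schema>
"""

# Field names taken from the stylesheet output quoted in the thread; the
# <add><doc> wrapper and the PID field are assumed for illustration.
DOC_XML = """
<add><doc>
  <field name="PID">demo:1</field>
  <field name="dc.title">Pdf docum</field>
  <field name="mods.title">Pdf docum</field>
  <field name="dsm.OBJ">extracted content</field>
</doc></add>
"""


def load_schema(schema_xml):
    """Return (explicit field names, dynamicField patterns) from a schema."""
    root = ET.fromstring(schema_xml)
    explicit = {f.get("name") for f in root.iter("field")}
    dynamic = [f.get("name") for f in root.iter("dynamicField")]
    return explicit, dynamic


def resolve(field_name, explicit, dynamic):
    """Say which schema entry would accept field_name, or None if none does."""
    if field_name in explicit:
        return "field:" + field_name
    # Assumption: longer (more specific) patterns are tried before shorter ones.
    for pattern in sorted(dynamic, key=len, reverse=True):
        if fnmatch.fnmatchcase(field_name, pattern):
            return "dynamicField:" + pattern
    return None


def report(doc_xml, schema_xml):
    """Map every field name in the indexing document to its schema match."""
    explicit, dynamic = load_schema(schema_xml)
    doc = ET.fromstring(doc_xml)
    return {
        f.get("name"): resolve(f.get("name"), explicit, dynamic)
        for f in doc.iter("field")
    }


if __name__ == "__main__":
    for name, match in report(DOC_XML, SCHEMA_XML).items():
        print(name, "->", match or "no match")
```

In this model a result of None means that no schema entry accepts the field, which is the case where Solr would normally refuse the document with an unknown-field error. Running report() against your own schema.xml and the output of the command line transformation should show quickly which mods.* names rely on the catch-all line and which are matched by name.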