Gert,

When I took out <dynamicField name="*"… it did not work at all.

I am observing two things:

(1)
I am not getting the fields that are extracted from datastreams using
external functions (MODS) or that need processing by Solr tools (OBJ,
application/pdf).

For MODS, my foxmlToSolr.xslt uses:

islandora-exts:getXMLDatastreamASNodeList($PID, $REPOSITORYNAME,
'MODS', $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
$TRUSTSTOREPASS)

Its class (ca/upei/roblib/DataStreamForXSLT.class) is located under:
[FedoraHome]/tomcat/webapps/fedoragsearch/WEB-INF/classes

Should I let GSearch know about it somehow?
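
As a first sanity check on the class location described above, here is a minimal sketch that only tests whether the compiled class file exists under a given classes directory. The function name class_present is hypothetical, not part of GSearch. A "found" result shows only that the file is there; it does not show that the webapp actually loads it.

```shell
#!/bin/sh
# Hypothetical helper (not part of GSearch): report whether a compiled class
# file exists under a given classes directory.
#   $1 = classes directory, e.g. .../webapps/fedoragsearch/WEB-INF/classes
#   $2 = class file path relative to that directory
class_present() {
  if [ -f "$1/$2" ]; then
    echo "found"
  else
    echo "missing"
  fi
}
```

It could be called with the WEB-INF/classes directory as the first argument and ca/upei/roblib/DataStreamForXSLT.class as the second.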

(2)
Solr 3.4 must have some way to define fields other than the Solr 1.4 way.
If you look at the schema.xml from Solr 3.4 (about the same as the
schema.xml from GSearch 2.3), it does not include any DC fields, for
example. Yet I am getting all of them in my index with that schema.xml.
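
One way to compare the two schema files is to list every field-defining element in each. In Solr, a field name with no explicit <field> entry can still be accepted if it matches a <dynamicField> pattern, and a name="*" pattern matches any name. Whether that explains the DC fields here is an assumption. The sketch below uses a hypothetical helper name, list_field_defs, and prints only the first line of an element that wraps across several lines.

```shell
#!/bin/sh
# Hypothetical helper: print every field-defining element in a Solr schema
# file, with line numbers, so explicit <field> entries can be compared against
# <dynamicField> patterns and <copyField> rules.
#   $1 = path to schema.xml
list_field_defs() {
  # Requiring whitespace after the element name keeps <fieldType ...> out.
  grep -n -E '<(field|dynamicField|copyField)[[:space:]]' "$1"
}
```

Running it on both schema.xml files and comparing the <dynamicField> lines would show whether a catch-all pattern is present in either.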


Serhiy


On Tue, Nov 22, 2011 at 4:34 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
> I forgot to mention that I am using Solr 3.4 and Fedora GSearch 2.3. I
> think I was using the wrong field type, "text". I do not see it defined
> in schema.xml. However, I tried other types and still got no result. I
> added just one MODS field, like this:
>
> <field name="mods.title" type="string" indexed="true" stored="true"
> multiValued="true"/>
>
> Still, it is not going into the index, even though the output of
> foxmlToSolr.xslt gives <field name="mods.title">Title 1</field>
>
>
> Serhiy
>
>
> On Tue, Nov 22, 2011 at 3:07 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>> Gert,
>>
>> I was able to generate output from command line by using downloaded
>> Xalan and adding class paths. But I have another question below.
>>
>> So my command line looks like this:
>> java -Xms512m -Xmx1024m -cp \
>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes:\
>> [FedoraHome]/DISTR_XALAN/xalan/*:\
>> [FedoraHome]/fedora/tomcat/webapps/fedoragsearch/WEB-INF/lib/*:\
>> [FedoraHome]/fedora/solr_dir/contrib/extraction/lib/*: \
>> org.apache.xalan.xslt.Process \
>> -PARAM FEDORASOAP 'http://localhost:8080/fedora/services' \
>> -PARAM REPOSITORYNAME 'SomeName' \
>> -PARAM FEDORAUSER 'fedoraAdmin' \
>> -PARAM FEDORAPASS 'SomePassword' \
>> -PARAM TRUSTSTOREPATH '[FedoraHome]/fedora/server/truststore' \
>> -PARAM TRUSTSTOREPASS 'SomePassword' \
>> -in [FileIn.xml] \
>> -xsl foxmlToSolr.xslt \
>> -out [FileOut.xml]
>>
>> All managed content is getting into FileOut.xml, including the PDF as
>> text. Here is an excerpt:
>> <field name="dc.title">Pdf docum</field>
>> <field name="mods.title">Pdf docum</field>
>> <field name="dsm.OBJ">extracted content</field>
>>
>>
>> Another question: now I am trying to get the fields into the Solr index.
>> All fields except mods.* are going there. My steps:
>>
>> (1) Edit foxmlToSolr.xslt so that I am getting all metadata fields I
>> need in the output (confirmed using command line method above).
>>
>> (2) Edit schema.xml for Solr, adding statements like these:
>> <copyField source="mods.title" dest="mods.title_s" />
>> <field name="mods.title" type="text" indexed="true" stored="false"
>> multiValued="true"/>
>> <field name="mods.title_s" type="string" maxChars="300" indexed="true"
>> stored="true"/>
>>
>> After this I stopped Tomcat, deleted the index, started Tomcat, and updated
>> the index using the Fedora GSearch web admin.
>>
>> There are no MODS fields in the created index (I checked with Luke). All
>> the other fields were created OK, such as dc.*, dsm.OCR and others.
>>
>> Do I need to edit any files other than the two above? Any suggestions would help.
>>
>> Thanks,
>> Serhiy
>>
>>
>>
>> On Mon, Nov 21, 2011 at 3:48 AM, Gert Schmeltz Pedersen
>> <g...@dtic.dtu.dk> wrote:
>>> Hi Serhiy,
>>>
>>> I think that you are missing
>>> dk.defxws.fedoragsearch.server.GenericOperationsImpl
>>> and related classes from the classpath, when you run from command line. Let 
>>> me know how it goes.
>>>
>>> -Gert
>>>
>>>
>>> On 21/11/2011, at 10.04, Serhiy Polyakov wrote:
>>>
>>>> At first I did not pass parameters to exts:getDatastreamText.
>>>> I do now. Still no OCR text content in the OUT.txt fields.
>>>>
>>>> Serhiy
>>>>
>>>>
>>>> On Mon, Nov 21, 2011 at 2:27 AM, Serhiy Polyakov <sp0...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I want to use the command line to process an exported Fedora object using
>>>>> the foxmlToSolr.xslt stylesheet. I need to see the resulting document that
>>>>> will be used by solr/conf/schema.xml to create the index.
>>>>>
>>>>> The object's FOXML includes an inline DC datastream and a managed (external)
>>>>> OCR datastream that contains text/plain. The FOXML includes a reference to
>>>>> the OCR datastream on the local server, like
>>>>> http://localhost:8080/fedora/get/... I pointed a browser at the OCR
>>>>> datastream reference and I see the text there. My FedoraGSearch
>>>>> indexed DC and OCR all right as part of the regular workflow, so
>>>>> foxmlToSolr.xslt must be correct.
>>>>>
>>>>> However, I need to do the transformation from the command line for the
>>>>> analysts. I downloaded Xalan and ran:
>>>>>
>>>>> java -cp dk/defxws/fedoragsearch/server:path/to/xalan/*:
>>>>> org.apache.xalan.xslt.Process -in <SOURCE.xml> -xsl foxmlToSolr.xslt
>>>>> -out <OUT.txt>
>>>>>
>>>>> Here is excerpt from OUT.txt
>>>>> <field name="dc.title">My Title</field>
>>>>> <field name="dsm.OCR"/>
>>>>>
>>>>> So it is not grabbing managed content (OCR in my case).
>>>>>
>>>>> foxmlToSolr.xslt includes an external function definition, and I believe
>>>>> it is using that function for managed content:
>>>>> ======
>>>>> …
>>>>> xmlns:exts="xalan://dk.defxws.fedoragsearch.server.GenericOperationsImpl"
>>>>> …
>>>>> <xsl:value-of select="exts:getDatastreamText($PID, $REPOSITORYNAME,
>>>>> @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
>>>>> $TRUSTSTOREPASS)"/>
>>>>> …
>>>>> =====
>>>>>
>>>>> Could somebody tell me whether it is at all possible to get managed
>>>>> content into the output when I am doing command-line processing?
>>>>> Again, managed content is getting into the index as part of the regular
>>>>> FedoraGSearch workflow with the same foxmlToSolr.xslt.
>>>>>
>>>>> Thanks,
>>>>> Serhiy
>>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure
>>>> contains a definitive record of customers, application performance,
>>>> security threats, fraudulent activity, and more. Splunk takes this
>>>> data and makes sense of it. IT sense. And common sense.
>>>> http://p.sf.net/sfu/splunk-novd2d
>>>> _______________________________________________
>>>> Fedora-commons-users mailing list
>>>> Fedora-commons-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>>
>>
>
