Hi Gert,

This is some great info! Question about one of the steps:

 > - which gets the datastream
 > from org.fcrepo.server.access.FedoraAPIA.getDatastreamDissemination() --
 > a SOAP call

Does that have to be a SOAP call, or can it be done via REST?
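
To make the question concrete, here is a rough Python sketch of the REST request I have in mind. It assumes the Fedora 3 REST API path /objects/{pid}/datastreams/{dsID}/content. The base URL, PID and datastream ID are made-up placeholders, and the helper name is mine. Whether GSearch can be pointed at this instead of SOAP is exactly what I'm asking.

```python
from urllib.parse import quote


def datastream_content_url(base_url, pid, ds_id):
    """Build the REST URL for a datastream's content.

    Assumes the Fedora 3 REST API layout:
    /objects/{pid}/datastreams/{dsID}/content
    """
    # PIDs contain a colon (e.g. "demo:1"); percent-encode path segments.
    return "{}/objects/{}/datastreams/{}/content".format(
        base_url.rstrip("/"), quote(pid, safe=""), quote(ds_id, safe="")
    )


# Hypothetical values, for illustration only.
print(datastream_content_url("http://localhost:8080/fedora", "demo:1", "OBJ"))
```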

-nruest

On 13-10-17 11:21 AM, Gert Schmeltz Pedersen wrote:
> Hi Nick,
>
> The flow is, without details:
>
> - After ingest, Fedora sends modification message to update listeners
>
> - GSearch receives it in
> dk.defxws.fedoragsearch.server.UpdateListener.onMessage()
>
> - calls dk.defxws.fedoragsearch.server.GenericOperationsImpl.updateIndex()
>
> - gets the foxml from org.fcrepo.server.management.FedoraAPIM.export()
> -- a SOAP call
>
> - gets the Solr index document from GTransformer.transform(xslt, foxml)
>
> - wherein the xslt transformer calls
> GenericOperationsImpl.getDatastreamFromTika()
>
> - which gets the datastream
> from org.fcrepo.server.access.FedoraAPIA.getDatastreamDissemination() --
> a SOAP call
>
> - and gets the index field contents from
> TransformerToText.getFromTika(datastream)
>
> If you have used the default foxmlToSolr.xslt, you will also call
> getDatastreamDissemination() on your video streams, a waste of
> processing time, since you get no index text out of a video stream.
>
> Therefore, you should tailor your foxmlToSolr.xslt to avoid the
> datastreams containing video streams, e.g.
>
> <xsl:for-each select="foxml:datastream[@ID !=
> '<your-video-datastream-id>' and (@CONTROL_GROUP='M' or
> @CONTROL_GROUP='E' or @CONTROL_GROUP='R')]">
> <xsl:value-of disable-output-escaping="yes" select="exts:getDatastreamFromTika($PID,
> $REPOSITORYNAME, @ID, 'field', concat('ds.', @ID), concat('dsmd_', @ID,
> '.'), '', $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH,
> $TRUSTSTOREPASS)"/>
> </xsl:for-each>
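
The selection rule in that for-each can be restated outside XSLT, which helps when checking which datastreams would still reach Tika. A minimal Python sketch, assuming each datastream is represented as an (ID, control group) pair. The function name and the sample IDs are hypothetical and not anything from GSearch:

```python
def datastreams_to_extract(datastreams, video_id):
    """Return the IDs the for-each predicate would select.

    Mirrors the XPath test: @ID != video_id and
    @CONTROL_GROUP is one of 'M', 'E' or 'R'.
    """
    tika_groups = {"M", "E", "R"}  # managed, external, redirect
    return [
        ds_id
        for ds_id, control_group in datastreams
        if ds_id != video_id and control_group in tika_groups
    ]


# Hypothetical object: inline XML metadata, a managed PDF, a managed video.
sample = [("DC", "X"), ("PDF", "M"), ("OBJ", "M")]
print(datastreams_to_extract(sample, video_id="OBJ"))  # prints ['PDF']
```

Only the managed PDF is left: the inline XML datastream is outside the three control groups, and the video is excluded by ID.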
>
>
> Cheers,
> Gert
>
>
>
> On 16/10/2013, at 08.26, Nick Ruest wrote:
>
>> Hi Gert (& maybe all),
>>
>> After a bunch of investigating and experimenting, I think I've made
>> some headway. I'm now running Java 7, and the large file processing
>> seems to be running a lot smoother. But, I did notice a couple things
>> tailing the fedoragsearch.daily.log & fedora.log[1]. The datastream in
>> question is a 6.97GB video file. The server this is all running on has 24G,
>> and this (-Xms18432M -Xmx18432M -XX:MaxPermSize=1024M) is my memory
>> allocation setup for firing up the entire stack.
>>
>> I'm not sure if this is a GSearch problem or fcrepo problem, or both. Is
>> GSearch making the SOAP calls? The initial ingest of the file in
>> question took place via REST. So, I'm a little confused here.
>>
>> Any insight/guidance would be very much appreciated!
>>
>> cheers!
>>
>> -nruest
>>
>> [1] https://gist.github.com/ruebot/7003346
>>
>>
>> On 13-10-07 05:20 AM, Gert Schmeltz Pedersen wrote:
>>> Hi Nick,
>>>
>>> You may see the time consumption for Fedora and for GSearch
>>> separately from the fedora.log and fedoragsearch.log, and for GSearch
>>> you may see the time for each datastream.
>>>
>>> Concerning GSearch, you may see from foxmlToSolr.xslt how tika is
>>> called for the video stream. You may index datastreams on the
>>> metadata or on the contents or both, and if your foxmlToSolr.xslt by
>>> default tries to index the video contents, then you should tailor your
>>> foxmlToSolr.xslt, see the GSearch documentation page about how to
>>> call tika on datastreams.
>>>
>>> Gert
>>>
>>>
>>> On 07/10/2013, at 04.03, Nick Ruest wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Late last week I decided to test out ingesting some large files (5GB
>>>> video file) with Plupload[1][2], and while I was able to ingest just
>>>> fine through the Islandora interface, I've noticed fcrepo has become
>>>> basically worthless since. I wanted to try and wait it out and see if
>>>> this is just its thing with large files I noticed a while back -- taking
>>>> forever to decide how to handle it -- but, about 4 days later, we still
>>>> have massive processes rocking[3].
>>>>
>>>> Is this expected behaviour? Is this a faux pas (dude never let fcrepo
>>>> manage a large file!)? Gsearch/Tika chugging away at the file forever?
>>>> Or, something else?
>>>>
>>>> I'm running fcrepo 3.6.2 on an Islandora stack (gsearch + solr), and
>>>> here[4] is my install.properties. Let me know if you need any more
>>>> config info, or anything else.
>>>>
>>>> cheers!
>>>>
>>>> -nruest
>>>>
>>>> [1] https://drupal.org/project/plupload
>>>> [2] https://github.com/discoverygarden/islandora_plupload
>>>> [3] http://i.imgur.com/3ewAeSD.jpga
>>>> [4] https://gist.github.com/ruebot/01fbbec034b7331dcc94
>>>>
>>>> ------------------------------------------------------------------------------
>>>> October Webinars: Code for Performance
>>>> Free Intel webinars can help you accelerate application performance.
>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>> most from
>>>> the latest Intel processors and coprocessors. See abstracts and
>>>> register >
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Fedora-commons-users mailing list
>>>> Fedora-commons-users@lists.sourceforge.net
>>>> <mailto:Fedora-commons-users@lists.sourceforge.net>
>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>>
>>>
>>
>

-- 
-nruest
