I just ran a test and timed the execution:

script  4681  items    -> 26.334u  1.829s 0:35.43 79.4%   0+0k  0+ 36io   0pf+0w
script 64065  items    -> 77.505u 16.817s 6:07.68 25.6%   0+0k  1+365io   0pf+0w
jruby+gem+start dspace -> 12.047u  0.525s 0:06.75 186.0%  0+0k 52+ 38io 393pf+0w
dspace database test   ->  6.616u  0.348s 0:03.44 202.0%  0+0k  2+ 15io   0pf+0w

Comparing a regular database test with a comparable JRuby script that loads 
the dspace gem and connects to the DSpace instance (which involves more or 
less the same actions as testing the database) shows that the JRuby route 
costs an extra 5.4 sec or so of user time and about 0.2 sec of system time. 

The second script run processes almost 14 times as many items as the first 
(64065 versus 4681), but the real elapsed time, about 6 min versus 35 sec, is 
only about 10 times as long. Just starting up the Ruby interpreter, loading 
the gem and starting the DSpace kernel takes almost 7 sec, which explains 
most of that ‘imbalance’.
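
To make that arithmetic explicit, here is a small sketch in plain Ruby that recomputes the ratios from the timings listed above. It assumes that every run pays the same fixed startup cost of about 6.75 sec (the elapsed time of the "jruby+gem+start dspace" line); that is an assumption of mine, not something I measured separately for each run.

```ruby
# Timings copied from the table above, elapsed times converted to seconds.
small   = { items: 4681,  elapsed: 35.43 }          # 0:35.43
large   = { items: 64065, elapsed: 6 * 60 + 7.68 }  # 6:07.68
startup = 6.75                                      # jruby + gem + start dspace

# How many times more items, and how many times more wall-clock time.
item_ratio    = large[:items].to_f / small[:items]
elapsed_ratio = large[:elapsed] / small[:elapsed]

# Assumption: both runs pay the same fixed startup cost, so subtract it first.
adjusted_ratio = (large[:elapsed] - startup) / (small[:elapsed] - startup)

puts format("items:                 %.1fx", item_ratio)
puts format("elapsed:               %.1fx", elapsed_ratio)
puts format("elapsed minus startup: %.1fx", adjusted_ratio)
```

With the startup cost taken out, the elapsed-time ratio moves from roughly 10x to roughly 12.6x, which is much closer to the item ratio of about 13.7x.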

Monika

—
Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Phone: 609-258-4161
Skype: mo-meven



> On Sep 1, 2016, at 12:05 PM, Monika Mevenkamp <mome...@gmail.com> wrote:
> 
> Does speed matter? Is this something you’ll have to do a lot, or is it one 
> of those one-off scripts?
> 
> If you run this on the command line or from cron it may not be so important. 
> With a cron job especially you may not care that much, as long as you can 
> start it at midnight and it gets done by 7am. 
> 
> Calling the JRuby script from the UI, i.e. calling it from Java, is possible, 
> but I have not actually done that yet. 
> 
> I don’t believe that calling Java via JRuby adds much performance overhead.
> 
> A bigger issue, as I see it, is that DSpace.findByMetadataValue returns an 
> array of matching DSpaceObjects. If speed matters, this needs to be changed 
> to return an iterator, which shouldn’t be too hard. 
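
The array-versus-iterator difference can be pictured with a minimal sketch in plain Ruby that does not touch DSpace at all. The names each_matching_id, find_all and find_each are hypothetical and are not part of the dspace-jruby gem; the first one merely stands in for a database cursor.

```ruby
# Hypothetical stand-in for a database cursor: yields one id at a time.
def each_matching_id(total)
  return enum_for(:each_matching_id, total) unless block_given?
  total.times { |i| yield i + 1 }
end

# Eager version: builds the complete array before the caller sees anything.
def find_all(total)
  each_matching_id(total).map { |id| "item-#{id}" }
end

# Lazy version: returns an enumerator; objects are built only when consumed.
def find_each(total)
  each_matching_id(total).lazy.map { |id| "item-#{id}" }
end

# Only three objects are ever constructed here, not 64065.
first_three = find_each(64_065).first(3)
puts first_three.inspect
```

The lazy variant keeps memory use flat and lets output start immediately, at the price that the underlying cursor has to stay open while the caller is still iterating.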
> 
> Why not just try and see? Since the script only accesses data and does not 
> change anything, there is no danger of disturbing your instance. Plus you 
> can run this anywhere, as long as you have access to the database. 
> 
> Monika
> 
>> On Sep 1, 2016, at 11:48 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>> 
>> Thanks! That script would indeed do what I'd need, but I'm a bit concerned 
>> about the scalability, since it will have to do one request per item, and 
>> if I have thousands of items, that might get a bit heavy. Or would it? I 
>> really don't know how long, for instance, 10,000 item/id/metadata 
>> requests would take.
>> 
>> Ilja
>> 
>> ________________________________________
>> From: Monika Mevenkamp <mome...@gmail.com>
>> Sent: Thursday, September 1, 2016 6:30:33 PM
>> To: Ilja Sidoroff
>> Cc: DSpace Tech
>> Subject: Re: [dspace-tech] Querying items by metadata item via SOLR and REST
>> 
>> Hi Ilja
>> 
>> I have a script that, given a metadata field, e.g. pu.workflow.state, 
>> produces a tab-separated list like so:
>> 
>> field   id      handle  value
>> pu.workflow.state       969     99999/fk4w099v32        approved
>> pu.workflow.state       903     null    emailed
>> pu.workflow.state       753     null    emailed
>> pu.workflow.state       752     null    emailed
>> pu.workflow.state       902     null    orphaned
>> 
>> 
>> The script is written in JRuby and based on my dspace-jruby gem; see the 
>> script here <https://github.com/akinom/dspace-cli/blob/master/metadata/list_values.rb>.
>> The gem as well as the script are available from GitHub: the jrdspace 
>> gem <https://github.com/akinom/dspace-jruby> and 
>> cli-dspace <https://github.com/akinom/dspace-cli>, which has a bunch of 
>> other scripts.
>> 
>> The script is quite small; its ‘action’ is in the doit method:
>> 
>> def doit(metadata_field)
>>   # header row of the tab-separated listing
>>   puts ['field', 'id', 'handle', 'value'].join("\t")
>>   dsos = DSpace.findByMetadataValue(metadata_field, nil, DConstants::ITEM)
>>   dsos.each do |dso|
>>     # all values this item holds for the given field
>>     vals = dso.getMetadataByMetadataString(metadata_field).collect { |v| v.value }
>>     # items without a handle are printed with the literal string "null"
>>     handle = dso.getHandle.nil? ? "null" : dso.getHandle
>>     puts [metadata_field, dso.getID, handle, vals].join("\t")
>>   end
>> end
>> 
>> If you want to try this out, there are instructions on GitHub. If you want 
>> to work in Java, look at the implementation of the 
>> DSpace.findByMetadataValue method, which contains the SQL statement; see 
>> here <https://github.com/akinom/dspace-jruby/blob/master/lib/dspace/dspace.rb#L150-L171>.
>> 
>> Monika
>> 
>> On Sep 1, 2016, at 6:43 AM, Ilja Sidoroff <ilja.sidor...@uef.fi> wrote:
>> 
>> Hello,
>> 
>> I am using DSpace 5.5.
>> 
>> Am I correct that SOLR queries return only items that are in
>> *collections* and not in the *workflow*? At least my search attempts
>> indicate that.
>> 
>> In the REST API, it seems that GET /items likewise returns only
>> results that are in the collections. With POST
>> /items/find-by-metadata-field, however, I can get all items in DSpace,
>> both those in the collections and those in the workflow?
>> 
>> What I need is a list of *all items* (both in the workflow and the
>> collections) that have a certain metadata field set, and *the value of
>> that field*. I don't see any other way of doing that except by a direct
>> SQL query to the database. I have one for 5.x, but I'm not happy with it,
>> since I need to update it for 6.x etc. Is there any other way of
>> doing this?
>> 
>> Also, it seems that
>> 
>> dspace import -d -m mapfile ...
>> 
>> does not delete items currently in the workflow? Is this intentional or a 
>> bug?
>> 
>> regards,
>> 
>> Ilja Sidoroff
>> University of Eastern Finland
>> 
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "DSpace Technical Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to dspace-tech+unsubscr...@googlegroups.com.
>> To post to this group, send email to dspace-tech@googlegroups.com.
>> Visit this group at https://groups.google.com/group/dspace-tech.
>> For more options, visit https://groups.google.com/d/optout.
>> 
> 

