[Dspace-tech] Bulk Importer, OAI and dc

2010-02-12 Thread RENTON Scott
This is complicated (but thanks, Ricardo B- my thumbnails now work!), so I will 
try and be concise.

I'm using Dspace for a harvestable collection of musical instruments. The 
information is already catalogued, and I have successfully used the bulk 
importer to get the info into Dspace, in dc format.

The problem I have is that the project's main client will not harvest using dc, 
but a schema called 'lido'. Their requests will have this as the prefix, so I 
need to store data in this schema. They do concede though, that they must also 
make data available in dc for other clients.

Questions are:
1) Can I use bulk importer for a schema other than dc, and if so how? If I can 
do this, I will just import data to a dc field AND a corresponding lido field. 
It's not very clean, but it gives them what they need.
2) The cleaner solution would be to do some kind of mapping in the oai webapp 
to copy the dc fields to corresponding lido fields. Is that possible?!

Just wondering if this is an unusual situation; if anyone has come across it 
before, and can suggest a solution, that would be brilliant.

Thanks
Scott

Scott Renton
MIMS Project Officer
Digital Library Development
University Of Edinburgh
2 Buccleuch Place
Edinburgh
EH8 9LW

0131 651 5219

scott.ren...@ed.ac.uk 
-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




Re: [Dspace-tech] Bulk Importer, OAI and dc

2010-02-12 Thread Peter Dietz
I'm not completely familiar with this, but to me it sounds like you have to
do a metadata crosswalk.
http://wiki.dspace.org/index.php/CrosswalkPlugins

Again, this is out of my area of knowledge, but if you look at
[dspace-src]/dspace-oai/dspace-oai-api/src/main/java/org/dspace/app/oai/DIDLCrosswalk.java

you can see a line of code such as: boolean description =
allDC[i].element.equals("description");
in which the stored [dc] metadata is being read and written out as [DIDL].

Hopefully someone with more knowledge and experience knows more on the
subject, but I think that's a start.
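
For illustration only, here is a rough, hypothetical sketch of what the
dc-to-lido mapping step inside such a crosswalk could look like, modelled on
the loop quoted above from DIDLCrosswalk. The lido element names and namespace
are placeholders, and a real crosswalk would additionally extend the OAI
crosswalk base class and be registered with the OAI webapp; check
DIDLCrosswalk.java for the exact signatures:

    import org.dspace.content.DCValue;
    import org.dspace.content.Item;

    public class LidoCrosswalkSketch
    {
        // Walk the stored DC values of an item and re-emit them under lido
        // element names.  Everything lido-specific here is a placeholder,
        // and a real crosswalk must also XML-escape the values.
        public static String toLido(Item item)
        {
            DCValue[] allDC = item.getDC(Item.ANY, Item.ANY, Item.ANY);

            StringBuffer md = new StringBuffer();
            md.append("<lido:lido xmlns:lido=\"http://www.lido-schema.org\">");

            for (int i = 0; i < allDC.length; i++)
            {
                if ("title".equals(allDC[i].element))
                {
                    // map dc.title to whatever lido element the client expects
                    md.append("<lido:title>").append(allDC[i].value)
                      .append("</lido:title>");
                }
                else if ("description".equals(allDC[i].element))
                {
                    md.append("<lido:description>").append(allDC[i].value)
                      .append("</lido:description>");
                }
                // ... further dc -> lido mappings ...
            }

            md.append("</lido:lido>");
            return md.toString();
        }
    }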


-- 
Peter Dietz
Systems Developer/Engineer
Ohio State University Libraries


Re: [Dspace-tech] Display of item..

2010-02-12 Thread Fabien COMBERNOUS
Bram Luyten wrote:
 Hello Fatima,

 depending on the version of DSpace you are using, there is one simple 
 trick that might help you:

 Look in your dspace.cfg file for the parameter webui.strengths.show.
 If it is set to false, try setting it to true and restart your Tomcat.

 Normally, this will show the number of items in your communities and
 collections browse, like in the screenshot attached.

It's strange: I have a 1.6.0 RC2 DSpace running. I switched
webui.strengths.show to true and I still don't see the number of items.

-- 
*Fabien COMBERNOUS*
/unix system engineer/
www.kezia.com http://www.kezia.com/
*Tel: +33 (0) 467 992 986*
Kezia Group



Re: [Dspace-tech] Bulk Importer, OAI and dc

2010-02-12 Thread Dorothea Salo
On Fri, Feb 12, 2010 at 8:01 AM, RENTON Scott scott.ren...@ed.ac.uk wrote:

 Questions are:
 1) Can I use bulk importer for a schema other than dc, and if so how? If I 
 can do this, I will just import data to a dc field AND a corresponding lido 
 field. It's not very clean, but it gives them what they need.

Yes, you can, providing that the schema is no more complex than
key-value pairs will accommodate (that is, no hierarchy as in most
XML). In that case, you first need to set up the lido fields in
DSpace's metadata registry (see
http://wiki.dspace.org/index.php/Add_a_new_metadata_field); you'll
presumably use the schema name lido. Then you make a batch import
package with the DC metadata in the usual dublin_core.xml file, and
the lido metadata in a separate file named metadata_lido.xml with
the root element <dublin_core schema="lido">. Then you batch-import as
normal.
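
To make the layout concrete, a minimal item directory and metadata_lido.xml
in the shape described above might look like this (the lido element names are
invented; use whatever fields you register in the lido schema):

    item_001/
        contents            (one bitstream filename per line)
        dublin_core.xml     (the usual DC metadata)
        metadata_lido.xml   (the lido metadata)

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- metadata_lido.xml -->
    <dublin_core schema="lido">
      <dcvalue element="objectWorkType" qualifier="none">violin</dcvalue>
      <dcvalue element="repositoryName" qualifier="none">University of Edinburgh</dcvalue>
    </dublin_core>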

I can send you a sample ETD import package out-of-band if that will
help; it's easier to figure this out from an example than to explain
it!

 2) The cleaner solution would be to do some kind of mapping in the oai webapp 
 to copy the dc fields to corresponding lido fields. Is that possible?!

See Peter Dietz's email for this.

 Just wondering if this is an unusual situation; if anyone has come across it 
 before, and can suggest a solution, that would be brilliant.

No, not unusual at all. Hope this helps.

Dorothea

-- 
Dorothea Salo                  ds...@library.wisc.edu
Digital Repository Librarian   AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493



Re: [Dspace-tech] Enable Page View and Download statistics

2010-02-12 Thread Tim Donohue
Hi,

'report.public = true' enables the old style DSpace reports (available 
from the /statistics URL off the homepage or Admin screens).  These 
reports give very basic statistical information site-wide.  However, 
these statistical reports do *not* provide download statistics, and also 
do not filter out hits from web spiders (like Google, etc.).  These 
older reports also do *not* provide community, collection or item level 
statistics -- they only provide site-wide statistics.

It sounds like you are looking for community, collection or item 
statistical reports.  This functionality is coming in DSpace 1.6.0 with 
the new Statistics system being released.  Stuart Lewis (the 1.6.0 
Release Coordinator) has written up a good description of the new 1.6.0 
Statistics on his blog:
http://blog.stuartlewis.com/2010/02/10/dspace-1-6-what-will-be-in-it-for-me/

Currently, a 1.6.0 RC2 (release candidate 2) version is available for 
download.  We expect the final, stable version of 1.6.0 to be released 
the first week of March.

- Tim

On 2/12/2010 1:00 AM, Vinsenso wrote:

 hi all,

 How do I enable page view and download statistics for an item in the simple
 / full item record?

 I have set report.public = true in dspace.cfg, but there are no changes in my
 DSpace...
 What is the function of report.public?

 Thanks for the help...^^



Re: [Dspace-tech] Display of item..

2010-02-12 Thread Fabien COMBERNOUS
In dspace.cfg it is explained:
"This configuration is not used by XMLUI."

I'm using the XMLUI.


-- 
*Fabien COMBERNOUS*
/unix system engineer/
www.kezia.com http://www.kezia.com/
*Tel: +33 (0) 467 992 986*
Kezia Group



Re: [Dspace-tech] Display of item..

2010-02-12 Thread Claudia Juergen
Hello Fabien,

the
webui.strengths.show = true
setting is only half of the configuration.

Take a look at the complete configuration section, quoted below.

You either have to leave
webui.strengths.cache = false
(so that counts are done in real time), or set it to true and run
[dspace]/bin/itemcounter
to populate the cache.

Note that counting in real time will not scale with larger repositories, so
larger sites usually turn the cache on and add the itemcounter to their list
of cron jobs. Also note, from the comments below, that the XMLUI only makes
strengths available to themes when they are cached, and your theme must
actually display them.
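
For example, a crontab entry along these lines (the schedule is arbitrary)
keeps the cached counts fresh:

    # refresh the cached community/collection item counts nightly at 02:00
    0 2 * * * [dspace]/bin/itemcounter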

Hope that helps

Claudia Jürgen


# Settings for content count/strength information 

# whether to display collection and community strengths
# (This configuration is not used by XMLUI.  To show strengths in the
#  XMLUI, you just need to create a theme which displays them)
webui.strengths.show = false

# if showing the strengths, should they be counted in real time or
# fetched from cache?  NOTE: To improve scaling/performance,
# the XMLUI only makes strengths available to themes if they are CACHED!
#
# Counts fetched in real time will perform an actual count of the
# database contents every time a page with this feature is requested,
# which will not scale.  If the below setting is to use the cache, you
# must run the following command periodically to update the count:
#
# [dspace]/bin/itemcounter
#
# The default is to count in real time
#
webui.strengths.cache = false





-- 
Claudia Jürgen
Eldorado - Repositorium der TU Dortmund

Universitätsbibliothek Dortmund
Vogelpothsweg 76
D-44227 Dortmund
Tel.: 0049-231-755-4043






Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1

2010-02-12 Thread Richard, Joel M
I'll put my two cents in here...

I haven't dug deep into ways of making these queries work better than they 
already do, but my experience tells me that there has to be a faster way of 
doing these operations. Sequence scans (i.e., Seq Scan on bi_2_dmap) are
extremely costly on large tables, and these tables are getting on to being
large (186k rows).

And I don't mean to second-guess any of the DSpace developers, but I've worked 
on databases with hundreds of thousands (even millions) of records and never 
experienced anything this bad for regularly-used queries. I know we're in a 
heterogeneous environment and we don't know what kind of server Susan has or 
how it's tuned, but any query with a cost of 300,000 is insanely high. There's 
got to be a better way. :)

Personally, I have never used the EXCEPT clause and so I am not familiar with 
it, but it does seem, Susan, that you found a faster query, and from a first 
glance it looks like it would achieve the same results. It's faster because 
you've found a query that does an Index Scan which is always tons faster than 
a Sequence Scan. But you need to satisfy yourself as to whether or not the 
same rows are being deleted.

Finally, PgSQL's cost-based analyzer depends on the statistics of the table 
being accurate, so it's best to make sure that your regular VACUUM operations 
are also using the ANALYZE clause.
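
In practice that can be as simple as running (or scheduling) something like
the following against the DSpace database; the two table names are just the
browse tables discussed in this thread:

    VACUUM ANALYZE;              -- whole database, or per table:
    VACUUM ANALYZE bi_2_dis;
    VACUUM ANALYZE bi_2_dmap;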

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | (202) 786-2861 (f) | richar...@si.edu




From: Graham Triggs grahamtri...@gmail.com
Date: Wed, 10 Feb 2010 08:08:44 -0500
To: Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] 
susan.m.thorn...@nasa.gov
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1

Hi Sue,

Hmmm.. Double Hmmm.

Bear in mind that I personally was tuning queries in Oracle, and the tuning
was done two years ago. I'm sure I traced the 'EXISTS' form of the query, and 
the 'IN... MINUS/EXCEPT' form had proven to be more efficient.

Looking at it now, I can't see that the current form is better. That said, in a 
tuned database environment, whilst the execution plan is significantly shorter 
for the EXISTS query, the actual difference in efficiency appears to be quite 
small - possibly negligible.

It gets interesting when you take it outside that environment though. As you 
indicate, the behaviour of the 'EXISTS' query is better in pre-8.4 Postgres (I 
don't have one to hand to run the tests myself, but the explain plan you attach 
shows better behaviour). Even in Postgres 8.4, with a large work_mem, the 
'EXISTS' query appears slightly better than the existing 'EXCEPT'. (Total 
runtimes of 279ms vs 394ms - although runtime isn't a good measure, as it 
includes the overhead of the EXPLAIN itself. The difference in execution time 
may be related to the length of the execution plan, not the efficiency of the 
operations).


NB: Note that in 8.4, the EXISTS is using a hash-join, which is more efficient 
than your query plan:

Hash Anti Join  (cost=4039.00..8265.50 rows=1 width=6) (actual time=279.345..279.345 rows=0 loops=1)
  Hash Cond: (bi_2_dis.id = bi_2_dmap.distinct_id)
  ->  Seq Scan on bi_2_dis  (cost=0.00..2164.00 rows=15 width=10) (actual time=0.016..55.114 rows=15 loops=1)
  ->  Hash  (cost=2164.00..2164.00 rows=15 width=4) (actual time=118.596..118.596 rows=15 loops=1)
        ->  Seq Scan on bi_2_dmap  (cost=0.00..2164.00 rows=15 width=4) (actual time=0.009..52.910 rows=15 loops=1)
Total runtime: 279.992 ms

In the same environment, if I drop the work_mem to 64kB, then the EXISTS query 
doubles in execution time. Although it's still attempting to do an efficient 
join - and so doesn't degrade as badly as the EXCEPT query in poorly tuned 
environments - it's still worse than the existing query operating in a 
correctly tuned environment.
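
For anyone following along, here is a sketch of the two query shapes being
compared. The table and column names are taken from the plans above; the
exact SQL shipped in DSpace may differ slightly:

    -- existing 'IN ... EXCEPT' form
    DELETE FROM bi_2_dis
     WHERE id IN (SELECT id FROM bi_2_dis
                  EXCEPT
                  SELECT distinct_id FROM bi_2_dmap);

    -- proposed 'NOT EXISTS' form
    DELETE FROM bi_2_dis
     WHERE NOT EXISTS (SELECT 1 FROM bi_2_dmap
                        WHERE bi_2_dmap.distinct_id = bi_2_dis.id);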

On balance, for non-8.4 users, or where the repository size may require a huge 
work_mem for an efficient EXCEPT, the benefits of the EXISTS query make it 
worthwhile to replace the existing queries. I've attached a patch that replaces 
the queries in the BrowseCreateDAOs.

Although you still stand to gain some efficiency from an update to Postgres 
8.4, and tuning of the database parameters may also yield some improvements.

Regards,
G

On 9 February 2010 23:51, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL 
SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
Hi Graham,
 I have never used EXCEPT in a SQL query, so I had to look up what it 
did...:)  I tried a different query and, if it truly would return the same 
results as the original query, it looks like it would be even more efficient.  
Take a look:

This is our Explain for the original query:

DELETE FROM bi_2_dis WHERE id IN (SELECT id FROM bi_2_dis EXCEPT 

[Dspace-tech] Restore the dumped data

2010-02-12 Thread Zico
Hi, I want to restore my dumped data onto a new DSpace server. I normally
dump my data with phpPgAdmin, but here I cannot see any import button to
restore my dumped data (an SQL file). I should add that my DSpace and
PostgreSQL versions are exactly the same on the new server. How can I
restore my dumped data now?

-- 
Best,
Zico
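
For what it's worth, a plain-SQL dump can usually be loaded from the command
line instead of phpPgAdmin; the database name and user below are only examples
and should match your local setup:

    createdb -U dspace dspace              # only if the new database does not exist yet
    psql -U dspace -d dspace -f dump.sql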


Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1

2010-02-12 Thread Graham Triggs
Hi Joel,

On 12 February 2010 17:04, Richard, Joel M richar...@si.edu wrote:

 I'll put my two cents in here...

 I haven't dug deep into ways of making these queries work better than they
 already do, but my experience tells me that there has to be a faster way of
 doing these operations. Sequence scans (i.e., Seq Scan on bi_2_dmap) are
 extremely costly on large tables and these are getting on to being large
 (186k rows.)


 And I don't mean to second-guess any of the DSpace developers, but I've
 worked on databases with hundreds of thousands (even millions) of records
 and never experienced anything this bad for regularly-used queries. I know
 we're in a heterogeneous environment and we don't know what kind of server
 Susan has or how it's tuned, but any query with a cost of 300,000 is
 insanely high. There's got to be a better way. :)


It's a bit of an exaggeration to say that this is a regularly used query. It
will only be used when:

a) Installing a new item
b) Withdrawing an item
c) Reinstating an item
d) Updating the metadata of an item that has at some point been installed

It's not called during the submission / workflow process. It's not called on
any general user access. In fact, the very reason for its existence is to
remove the need to use a costly DISTINCT when displaying browse pages to
general users - which is a much more common operation.

Personally, I have never used the EXCEPT clause and so I am not familiar
 with it, but it does seem, Susan, that you found a faster query, and from a
 first glance it looks like it would achieve the same results. It's faster
 because you've found a query that does an Index Scan which is always tons
 faster than a Sequence Scan. But you need to satisfy yourself as to
 whether or not the same rows are being deleted.


The way I'm reading it, that's a bit of a jump ahead. In Sue's query, yes
there is an index scan being used, but as part of a NOT filter subplan.
There is still a sequence scan on the bi_2_dis table, and so it's doing a
separate index scan for each row returned by bi_2_dis - that's a loop of
around 7 index scans.

In the EXCEPT version, it is only doing 3 sequence scans - none of the
operations are looped. However, it is the sort that it uses to implement the
EXCEPT in this case that sucks the performance / scalability out of the
query.

If you look at the query in a Postgres 8.4 environment that has been given
sufficient work_mem, then it still only does 3 scans. But the results are
hash joined, not sorted - and the resulting execution is better than the
EXISTS query in 8.2 (with its 7 index scans).

Now take a look at the same EXISTS query running in 8.4 - it doesn't use
index scans at all. In this case, it's gone to using just two sequence
scans, and hash joining the results. Making it the most efficient execution
of all (but not by much as two of the sequence scans in the hash joined
EXCEPTS are on the same table and can benefit from caching). But
importantly, the EXISTS degrades more gracefully when the work_mem is
insufficient.

Regards,
G