[Dspace-tech] Bulk Importer, OAI and dc
This is complicated (but thanks, Ricardo B., my thumbnails now work!), so I will try to be concise. I'm using DSpace for a harvestable collection of musical instruments. The information is already catalogued, and I have successfully used the bulk importer to get it into DSpace, in DC format. The problem I have is that the project's main client will not harvest using DC, but a schema called 'lido'. Their requests will use this as the prefix, so I need to store data in this schema. They do concede, though, that they must also make data available in DC for other clients.

Questions are:

1) Can I use the bulk importer for a schema other than DC, and if so, how? If I can do this, I will just import data to a DC field AND a corresponding lido field. It's not very clean, but it gives them what they need.

2) The cleaner solution would be to do some kind of mapping in the OAI webapp to copy the DC fields to corresponding lido fields. Is that possible?!

Just wondering if this is an unusual situation; if anyone has come across it before and can suggest a solution, that would be brilliant.

Thanks
Scott

Scott Renton
MIMS Project Officer
Digital Library Development
University of Edinburgh
2 Buccleuch Place
Edinburgh EH8 9LW
0131 651 5219
scott.ren...@ed.ac.uk

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

--
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Bulk Importer, OAI and dc
I'm not completely familiar with this, but to me it sounds like you have to do a metadata crosswalk: http://wiki.dspace.org/index.php/CrosswalkPlugins

Again, out of my area of knowledge, but looking at [dspace-src]/dspace-oai/dspace-oai-api/src/main/java/org/dspace/app/oai/DIDLCrosswalk.java you can see a line of code:

    boolean description = allDC[i].element.equals("description");

in which the stored metadata [DC] is being output in [DIDL]. Hopefully someone with more knowledge and experience knows more on the subject, but I think that's a start.

On Fri, Feb 12, 2010 at 9:01 AM, RENTON Scott scott.ren...@ed.ac.uk wrote:
> I'm using DSpace for a harvestable collection of musical instruments. [...] The problem I have is that the project's main client will not harvest using dc, but a schema called 'lido'. [...] The cleaner solution would be to do some kind of mapping in the oai webapp to copy the dc fields to corresponding lido fields. Is that possible?!

--
Peter Dietz
Systems Developer/Engineer
Ohio State University Libraries
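The crosswalk pattern Peter points at, looping over the stored DC values and re-emitting them under the target schema, can be sketched in plain Java. This is a self-contained illustration, not the DSpace API: the DCValue stand-in class and the lido element names are invented for the example.

```java
import java.util.List;

public class LidoCrosswalkSketch {
    // Minimal stand-in for DSpace's stored metadata values (illustrative only).
    static class DCValue {
        final String element;
        final String value;
        DCValue(String element, String value) { this.element = element; this.value = value; }
    }

    // Emit each recognised DC element as a (hypothetical) lido element.
    static String crosswalk(List<DCValue> allDC) {
        StringBuilder out = new StringBuilder();
        for (DCValue dc : allDC) {
            if (dc.element.equals("title")) {
                out.append("<lido:titleSet>").append(dc.value).append("</lido:titleSet>\n");
            } else if (dc.element.equals("description")) {
                out.append("<lido:descriptiveNote>").append(dc.value).append("</lido:descriptiveNote>\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<DCValue> dc = List.of(
            new DCValue("title", "Fortepiano"),
            new DCValue("description", "Viennese action, c. 1805"));
        System.out.print(crosswalk(dc));
    }
}
```

A real crosswalk would also have to escape XML entities in the values and handle qualifiers; the DIDLCrosswalk source referenced above shows the full pattern.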
Re: [Dspace-tech] Display of item..
Bram Luyten wrote:
> Hello Fatima,
> Depending on the version of DSpace you are using, there is one simple trick that might help you: look in your dspace.cfg file for the parameter webui.strengths.show. If this one is false, try setting it to true and restart your Tomcat. Normally, this will show the number of items in your communities and collections browse, like in the screenshot attached.

It's strange: I have a 1.6.0 RC2 DSpace running. I switched webui.strengths.show to true and I don't have the number of items.

--
Fabien COMBERNOUS
unix system engineer
www.kezia.com
Tel: +33 (0) 467 992 986
Kezia Group
Re: [Dspace-tech] Bulk Importer, OAI and dc
On Fri, Feb 12, 2010 at 8:01 AM, RENTON Scott scott.ren...@ed.ac.uk wrote:
> Questions are: 1) Can I use bulk importer for a schema other than dc, and if so how? If I can do this, I will just import data to a dc field AND a corresponding lido field. It's not very clean, but it gives them what they need.

Yes, you can, provided that the schema is no more complex than key-value pairs will accommodate (that is, no hierarchy as in most XML). In that case, you first need to set up the lido fields in DSpace's metadata registry (see http://wiki.dspace.org/index.php/Add_a_new_metadata_field); you'll presumably use the schema name lido. Then you make a batch import package with the DC metadata in the usual dublin_core.xml file, and the lido metadata in a separate file named metadata_lido.xml with the root element dublin_core schema="lido". Then you batch-import as normal. I can send you a sample ETD import package out-of-band if that will help; it's easier to figure this out from an example than to explain it!

> 2) The cleaner solution would be to do some kind of mapping in the oai webapp to copy the dc fields to corresponding lido fields. Is that possible?!

See Peter Dietz's email for this.

> Just wondering if this is an unusual situation; if anyone has come across it before, and can suggest a solution, that would be brilliant.

No, not unusual at all. Hope this helps.

Dorothea

--
Dorothea Salo  ds...@library.wisc.edu
Digital Repository Librarian  AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493
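Dorothea's two-file package can be sketched like this. Only the file name metadata_lido.xml and the dublin_core root element with schema="lido" come from her message; the element and qualifier names below are invented for illustration and would need to match the fields you register.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- metadata_lido.xml: flat key-value metadata under the "lido" schema.
     Element names here are hypothetical examples. -->
<dublin_core schema="lido">
  <dcvalue element="objectWorkType" qualifier="none">fortepiano</dcvalue>
  <dcvalue element="repositoryName" qualifier="none">University of Edinburgh</dcvalue>
</dublin_core>
```

This file sits in the item directory alongside the usual dublin_core.xml (which keeps the DC copy of the same data) and the contents file.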
Re: [Dspace-tech] Enable Page View and Download statistics
Hi,

'report.public = true' enables the old-style DSpace reports (available from the /statistics URL off the homepage or Admin screens). These reports give very basic statistical information site-wide. However, these statistical reports do *not* provide download statistics, and also do not filter out hits from web spiders (like Google, etc.). These older reports also do *not* provide community, collection or item level statistics; they only provide site-wide statistics.

It sounds like you are looking for community, collection or item statistical reports. This functionality is coming in DSpace 1.6.0 with the new Statistics system being released. Stuart Lewis (the 1.6.0 Release Coordinator) has written up a good description of the new 1.6.0 Statistics on his blog:

http://blog.stuartlewis.com/2010/02/10/dspace-1-6-what-will-be-in-it-for-me/

Currently, a 1.6.0 RC2 (release candidate 2) version is available for download. We expect the final, stable version of 1.6.0 to be released the first week of March.

- Tim

On 2/12/2010 1:00 AM, Vinsenso wrote:
> hi all, how to enable page view statistics and download statistic for item in simple / full item record...?? I have set report.public = true in dspace.cfg, but there is no changes in my DSpace... what is the function of report.public?? Thank's for help...^^
Re: [Dspace-tech] Display of item..
Fabien COMBERNOUS wrote:
> Bram Luyten wrote:
>> Hello Fatima, depending on the version of DSpace you are using, there is one simple trick that might help you: look in your dspace.cfg file for the parameter webui.strengths.show. If this one is false, try setting it to true and restart your Tomcat. Normally, this will show the number of items in your communities and collections browse, like in the screenshot attached.
> It's strange: I have a 1.6.0 RC2 DSpace running. I switched webui.strengths.show to true and I don't have the number of items.

In dspace.cfg it is explained: "This configuration is not used by XMLUI". I'm using XMLUI.

--
Fabien COMBERNOUS
unix system engineer
www.kezia.com
Tel: +33 (0) 467 992 986
Kezia Group
Re: [Dspace-tech] Display of item..
Hello Fabien,

webui.strengths.show = true is only half of the configuration. Take a look at the complete configuration section, see below. You either have to set webui.strengths.cache = true and run [dspace]/bin/itemcounter periodically, or leave it false for real-time counting. Note that real-time counting will not scale with larger repositories, so these usually set the cache to true and add itemcounter to the list of cron jobs.

Hope that helps

Claudia Jürgen

# Settings for content count/strength information
# whether to display collection and community strengths
# (This configuration is not used by XMLUI. To show strengths in the
# XMLUI, you just need to create a theme which displays them)
webui.strengths.show = false

# if showing the strengths, should they be counted in real time or
# fetched from cache? NOTE: To improve scaling/performance,
# the XMLUI only makes strengths available to themes if they are CACHED!
#
# Counts fetched in real time will perform an actual count of the
# database contents every time a page with this feature is requested,
# which will not scale. If the below setting is to use the cache, you
# must run the following command periodically to update the count:
#
#   [dspace]/bin/itemcounter
#
# The default is to count in real time
#
# webui.strengths.cache = false

Bram Luyten wrote:
> Hello Fatima, depending on the version of DSpace you are using, there is one simple trick that might help you: look in your dspace.cfg file for the parameter webui.strengths.show [...]

> It's strange: I have a 1.6.0 RC2 DSpace running. I switched webui.strengths.show to true and I don't have the number of items.

--
Fabien COMBERNOUS
unix system engineer
www.kezia.com
Tel: +33 (0) 467 992 986
Kezia Group

--
Claudia Jürgen
Eldorado - Repositorium der TU Dortmund
Universitätsbibliothek Dortmund
Vogelpothsweg 76
D-44227 Dortmund
Tel.: 0049-231-755-4043
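The cron-job approach Claudia mentions can be sketched as a crontab entry. The schedule here is an example, and [dspace] stands for the DSpace installation directory, as in the configuration comments:

```
# crontab entry: refresh the cached item counts nightly at 02:00.
# Replace [dspace] with your actual DSpace installation directory.
0 2 * * * [dspace]/bin/itemcounter
```

With webui.strengths.cache = true, pages then read the cached counts instead of counting the database on every request.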
Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1
I'll put my two cents in here...

I haven't dug deep into ways of making these queries work better than they already do, but my experience tells me that there has to be a faster way of doing these operations. Sequence scans (i.e., Seq Scan on bi_2_dmap) are extremely costly on large tables, and these are getting on to being large (186k rows). And I don't mean to second-guess any of the DSpace developers, but I've worked on databases with hundreds of thousands (even millions) of records and never experienced anything this bad for regularly-used queries. I know we're in a heterogeneous environment and we don't know what kind of server Susan has or how it's tuned, but any query with a cost of 300,000 is insanely high. There's got to be a better way. :)

Personally, I have never used the EXCEPT clause and so I am not familiar with it, but it does seem, Susan, that you found a faster query, and at first glance it looks like it would achieve the same results. It's faster because you've found a query that does an Index Scan, which is always much faster than a Sequence Scan. But you need to satisfy yourself as to whether or not the same rows are being deleted.

Finally, PgSQL's cost-based analyzer depends on the statistics of the table being accurate, so it's best to make sure that your regular VACUUM operations are also using the ANALYZE clause.

--Joel

Joel Richard
IT Specialist, Web Services Department
Smithsonian Institution Libraries | http://www.sil.si.edu/
(202) 633-1706 | (202) 786-2861 (f) | richar...@si.edu

From: Graham Triggs grahamtri...@gmail.com
Date: Wed, 10 Feb 2010 08:08:44 -0500
To: Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1

Hi Sue,

Hmmm.. Double Hmmm. Bear in mind that I personally was tuning queries in Oracle, and the tuning was done two years ago. I'm sure I traced the 'EXISTS' form of the query, and the 'IN... MINUS/EXCEPT' form had proven to be more efficient. Looking at it now, I can't see that the current form is better.

That said, in a tuned database environment, whilst the execution plan is significantly shorter for the EXISTS query, the actual difference in efficiency appears to be quite small, possibly negligible. It gets interesting when you take it outside that environment, though. As you indicate, the behaviour of the 'EXISTS' query is better in pre-8.4 Postgres (I don't have one to hand to run the tests myself, but the explain plan you attach shows better behaviour).

Even in Postgres 8.4, with a large work_mem, the 'EXISTS' query appears slightly better than the existing 'EXCEPT'. (Total runtimes of 279ms vs 394ms, although runtime isn't a good measure, as it includes the overhead of the EXPLAIN itself. The difference in execution time may be related to the length of the execution plan, not the efficiency of the operations.)

NB: Note that in 8.4, the EXISTS is using a hash join, which is more efficient than your query plan:

Hash Anti Join  (cost=4039.00..8265.50 rows=1 width=6) (actual time=279.345..279.345 rows=0 loops=1)
  Hash Cond: (bi_2_dis.id = bi_2_dmap.distinct_id)
  ->  Seq Scan on bi_2_dis  (cost=0.00..2164.00 rows=15 width=10) (actual time=0.016..55.114 rows=15 loops=1)
  ->  Hash  (cost=2164.00..2164.00 rows=15 width=4) (actual time=118.596..118.596 rows=15 loops=1)
        ->  Seq Scan on bi_2_dmap  (cost=0.00..2164.00 rows=15 width=4) (actual time=0.009..52.910 rows=15 loops=1)
Total runtime: 279.992 ms

In the same environment, if I drop the work_mem to 64kB, then the EXISTS query doubles in execution time. Although it's still attempting to do an efficient join, and so doesn't degrade as badly as the EXCEPT query in poorly tuned environments, it's still worse than the existing query operating in a correctly tuned environment.

On balance, for non-8.4 users, or where the repository size may require a huge work_mem for an efficient EXCEPT, the benefits of the EXISTS query make it worthwhile to replace the existing queries. I've attached a patch that replaces the queries in the BrowseCreateDAOs. You still stand to gain some efficiency from an update to Postgres 8.4, and tuning of the database parameters may also yield some improvements.

Regards,
G

On 9 February 2010 23:51, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] susan.m.thorn...@nasa.gov wrote:
> Hi Graham,
> I have never used EXCEPT in a SQL query, so I had to look up what it did... :) I tried a different query and, if it truly would return the same results as the original query, it looks like it would be even more efficient. Take a look: This is our Explain for the original query:
> DELETE FROM bi_2_dis WHERE id IN (SELECT id FROM bi_2_dis EXCEPT
[Dspace-tech] Restore the dumped data
Hi,

I want to restore my dumped data into a new DSpace server. Generally, I used to dump my data with phpPgAdmin, but here I cannot see any import button to restore my dumped data (an SQL file). I should add that my DSpace and PostgreSQL versions are exactly the same on the new DSpace server. How can I restore my dumped data now?

--
Best,
Zico
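A common way to restore a plain-SQL dump (the kind phpPgAdmin exports, or pg_dump produces) is from the command line rather than a web UI. This is a sketch; the database name, user, and file name are examples, not values from the thread, and should match your dspace.cfg settings:

```shell
# Create the target database (name and user are examples).
createdb -U dspace dspace
# Replay the dumped SQL into the new database.
psql -U dspace -d dspace -f dspace-dump.sql
```

Note that the database dump covers only the metadata and structure; the DSpace assetstore directory (the actual bitstream files) must be copied to the new server separately.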
Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1
Hi Joel,

On 12 February 2010 17:04, Richard, Joel M richar...@si.edu wrote:
> I'll put my two cents in here... I haven't dug deep into ways of making these queries work better than they already do, but my experience tells me that there has to be a faster way of doing these operations. [...] I know we're in a heterogeneous environment and we don't know what kind of server Susan has or how it's tuned, but any query with a cost of 300,000 is insanely high. There's got to be a better way. :)

It's a bit of an exaggeration to say that this is a regularly used query. It will only be used when:

a) installing a new item
b) withdrawing an item
c) reinstating an item
d) updating the metadata of an item that has at some point been installed

It's not called during the submission / workflow process. It's not called on any general user access. In fact, the very reason for its existence is to remove the need to use a costly DISTINCT when displaying browse pages to general users, which is a much more common operation.

> Personally, I have never used the EXCEPT clause and so I am not familiar with it, but it does seem, Susan, that you found a faster query, and from a first glance it looks like it would achieve the same results. It's faster because you've found a query that does an Index Scan which is always tons faster than a Sequence Scan. But you need to satisfy yourself as to whether or not the same rows are being deleted.

The way I'm reading it, that's a bit of a jump ahead. In Sue's query, yes, there is an index scan being used, but as part of a NOT filter subplan. There is still a sequence scan on the bi_2_dis table, and so it's doing a separate index scan for each row returned by bi_2_dis; that's a loop of around 7 index scans.

In the EXCEPT version, it is only doing 3 sequence scans; none of the operations are looped. However, it is the sort that it uses to implement the EXCEPT in this case that sucks the performance / scalability out of the query. If you look at the query in a Postgres 8.4 environment that has been given sufficient work_mem, then it still only does 3 scans. But the results are hash joined, not sorted, and the resulting execution is better than the EXISTS query in 8.2 (with its 7 index scans).

Now take a look at the same EXISTS query running in 8.4: it doesn't use index scans at all. In this case, it's gone to using just two sequence scans and hash joining the results, making it the most efficient execution of all (but not by much, as two of the sequence scans in the hash-joined EXCEPT are on the same table and can benefit from caching). But importantly, the EXISTS degrades more gracefully when the work_mem is insufficient.

Regards,
G
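For readers following the thread, the two shapes of the cleanup query being compared can be sketched as follows. This is a reconstruction from the fragments quoted above (the table and column names bi_2_dis, id, and bi_2_dmap.distinct_id appear in Sue's query and the EXPLAIN output), not the exact text of Graham's patch:

```sql
-- Existing form: compute the set difference with EXCEPT,
-- then delete the distinct-value rows left with no mapping.
DELETE FROM bi_2_dis
 WHERE id IN (SELECT id FROM bi_2_dis
              EXCEPT
              SELECT distinct_id FROM bi_2_dmap);

-- Anti-join form discussed in the thread: delete distinct-value rows
-- that no longer have any bi_2_dmap entry pointing at them.
-- Postgres 8.4 can plan this as a Hash Anti Join.
DELETE FROM bi_2_dis
 WHERE NOT EXISTS (SELECT 1
                     FROM bi_2_dmap
                    WHERE bi_2_dmap.distinct_id = bi_2_dis.id);
```

Both delete the same rows when distinct_id is non-null; as Joel notes, anyone applying such a change should verify that equivalence against their own data first.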