Re: [Dspace-tech] DSpace memory issue
On Thu, 9 Feb 2012, Gabriel Dina wrote:

> We found in our DSpace installation (XMLUI) that Java uses a lot of memory for just a few items added in DSpace.

There are memory leaks even in the JSPUI. We have a nightly cron job that restarts Tomcat to work around the issue, even though we have fixed several of the memory leaks.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 09/02/2012 : The Moon is Waning Gibbous (93% of Full)

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Tools for automatic creation of dublin core and contents
On Wed, 10 Aug 2011, Magnus Norberg wrote:

> Does anyone know if there are any tools for automatic creation of dublin core files and contents files? One needs these files for batch import, one for each object. But if I have, say, a thousand files (for example PDF files) on my hard drive that I want to import into DSpace in a batch import, I do not want to create all these Item1, Item2 and so on directories one by one, and then create dublin core and contents files one by one for each object; it would take too much time...

We created a tool that will do that work for you. All you need is the list of filenames and the metadata in a CSV file, such as can be created by any spreadsheet program (Excel or OpenOffice, for example). It will then create the batch import structure for you. This might be one way to help with your problem.

http://tools.dspace.cam.ac.uk/metadatamapper/

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 10/08/2011
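For readers who want to see what such a conversion involves, here is a minimal sketch in Python. It is not the Cambridge tool linked above. It assumes a CSV with a `filename` column and metadata columns named like `dc.title` or `dc.contributor.author` (that column layout, and the function names, are assumptions made for illustration), and it writes one `ItemN` directory per row containing the file, a `contents` file and a `dublin_core.xml`.

```python
import csv
import io
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def build_item_xml(row):
    """Return dublin_core.xml text for one CSV row.

    Assumed column naming: dc.<element> or dc.<element>.<qualifier>.
    """
    root = ET.Element("dublin_core")
    for column, value in row.items():
        # Skip the filename column, unnamed extra cells and empty values.
        if not column or not column.startswith("dc.") or not value:
            continue
        parts = column.split(".")
        element = parts[1]
        qualifier = parts[2] if len(parts) > 2 else "none"
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


def build_batch(csv_text, source_dir, target_dir):
    """Create Item1, Item2, ... under target_dir, one per CSV row."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    created = []
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        item_dir = target_dir / f"Item{number}"
        item_dir.mkdir(parents=True, exist_ok=True)
        filename = row["filename"]
        # Copy the bitstream next to its metadata.
        shutil.copy2(source_dir / filename, item_dir / filename)
        # The contents file lists the bitstreams of the item, one per line.
        (item_dir / "contents").write_text(filename + "\n", encoding="utf-8")
        (item_dir / "dublin_core.xml").write_text(build_item_xml(row), encoding="utf-8")
        created.append(item_dir)
    return created
```

Calling `build_batch(open("items.csv").read(), "pdfs", "import")` (both directory names and the CSV name are placeholders) would leave a directory tree that can be handed to the batch importer; a real tool would also need to handle multiple files per item and repeated metadata fields.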
Re: [Dspace-tech] Tools for automatic creation of dublin core and contents
On Wed, 10 Aug 2011, Hugh Paterson III wrote:

> Tom, your extraction method, does it take into account that the metadata values in the PDF (or other file) might not be correct? Does it allow for writing back to the file the correct values? It doesn't seem that it does write back to the files.

No, it doesn't. It just makes it easier to generate the DSpace batch importer format.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 10/08/2011
Re: [Dspace-tech] Dspace Installation on Ubuntu
On Mon, 8 Aug 2011, bonface asiligwa wrote:

> I have been trying to install DSpace on Ubuntu 11.04 but I don't succeed. Can someone just give a step by step installation of Dspace 7.1.2

https://wiki.duraspace.org/display/DSPACE/Installing+DSpace+1.7+on+Ubuntu

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 08/08/2011
Re: [Dspace-tech] SSL and HTTPS Question
On Mon, 1 Aug 2011, Mark H. Wood wrote:

>> Should the rest of their session take place over an https connection or is it safe for them to go back to regular http after they have logged in?
>
> In general we can't really answer that and you probably can't either. It depends on the nature of the stuff in your repository and your users' needs for privacy. And if your repo. is public, you don't know who your users are until they've arrived.

If you go back to HTTP after signing in, then anyone who can see the traffic can eavesdrop and steal the session. If you do not want this, make sure everything runs over HTTPS from the moment someone logs in, so that the rest of their session is encrypted.

Assuming that the rest of the repository is public, you probably don't want the overhead and lack of caching that come with running it over HTTPS, so it's better to serve it over plain HTTP until people log in.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 01/08/2011
[Dspace-tech] Hiding dark items
We got quite a few queries recently about how we hide dark items from the browse and OAI-PMH views. We've picked our code apart and put the changes online:

http://tools.dspace.cam.ac.uk/dark_items.html

We hope this will be useful for other people in the DSpace community.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 29/07/2011
[Dspace-tech] Custom thumbnail code
DSpace@Cambridge uses custom thumbnail code for a variety of reasons:

* Separating these UI components from actual archival content
* Reducing database load
* Having thumbnails generated on the fly rather than waiting for the media filter
* Having higher-quality thumbnails than those produced by the default DSpace thumbnail system

Because we got several enquiries about how we did this, we have made the code and an explanation of it available online:

http://tools.dspace.cam.ac.uk/thumbnails/

We hope this will turn out to be useful for people with problems similar to those we used to have.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 17/06/2011
Re: [Dspace-tech] Embargo and OAI interface
On Fri, 6 May 2011, Richard Rodgers wrote:

> The embargo system is designed to protect bitstreams, not metadata. While it certainly would be possible to alter OAI or other code to check for embargo dates, this has not been done to the best of my knowledge. I am curious why, given that the content will be inaccessible, is it desirable to hide the metadata from harvesters?

I'd like to ask for a flag in the DSpace config file to let dark items be properly dark (including embargoed items). This applies to search results as well as (possibly even more so) to harvesting.

There are several instances where it might be necessary for metadata to be hidden:

- data protection (if the metadata contains sensitive information)
- commercial interest (e.g. novel discoveries waiting to be exploited)
- academic (e.g. disputed works)
- usability (dark items aren't available, so shouldn't show up)

We've put considerable work into filtering dark items from search results (which, for all that effort, is still a dirty hack) and from OAI. It would be nice to see this functionality in the main code base.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 10/05/2011
Re: [Dspace-tech] Embargo and OAI interface
On Tue, 10 May 2011, Blanco, Jose wrote:

> I have been working lately on hiding items from search results that have READ metadata restrictions for certain users. So for example, item1 is restricted to only one particular user; if that user is logged in and searches for a string in that item, he will get the item in the results set, but if an anonymous user searches for a string in that item, the item will not show in the search results. I am now trying to restrict items like this in the browsing, but am having more difficulty. It sounds like you may have something that restricts items from showing up when browsing. Is that the case? Could you share the code that does that?

We do have code that does that, but it's quite an ugly hack -- it filters results from the browse pages (including search results) by checking authorization as the browse list is created. This does mess up pagination.

Sadly, our developer is indisposed at the moment, and I wouldn't know where to find all the changes, so sharing it isn't really possible right now. Sorry.

However, I do gather that to Do It Properly, changes would be needed to the actual browse system.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 10/05/2011
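To illustrate why filtering while the list is being built upsets pagination, here is a small Python sketch. It is a generic illustration, not the Cambridge code: the function names and the `can_read` callback are assumptions. Slicing a page first and dropping unreadable items afterwards yields short pages; filtering before slicing keeps pages full, which is roughly what a change inside the browse system itself would have to achieve.

```python
def page_then_filter(items, can_read, page, size):
    """Take a page of the full list, then drop items the user may not read.

    Pages come back shorter than `size` whenever they contained dark items.
    """
    window = items[page * size:(page + 1) * size]
    return [item for item in window if can_read(item)]


def filter_then_page(items, can_read, page, size):
    """Drop unreadable items first, then paginate what is left."""
    visible = [item for item in items if can_read(item)]
    return visible[page * size:(page + 1) * size]


items = list(range(1, 10))          # nine items, ids 1..9
dark = {2, 3, 5}                    # ids the current user may not see


def can_read(item):
    return item not in dark


short_page = page_then_filter(items, can_read, page=0, size=3)
full_page = filter_then_page(items, can_read, page=0, size=3)
```

With the sample data, the first approach returns a first page holding a single item, while the second returns three. The second approach is shown in memory only for clarity; on a large repository the check would need to happen in the query that builds the browse list.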
Re: [Dspace-tech] Scalability issues report, dsp...@cambridge
On 7 Oct 2010, at 21:56, Stuart Lewis wrote:

>> with 16GB of memory and fast local storage
>> Java memory: -Xmx2048M -Xms2048M
>
> Is there a reason why you only allocate 1/8th of the system memory to the application? Have you found that adding extra doesn't help?

In our experience, it merely delays when the error occurs, and we'd still need to restart. Whether we do this nightly or every other night doesn't make much difference. I'm not sure it would actually make it go faster.

Additionally, we need to keep memory free for file caching and thumbnail generation; we found that if we assign too much memory to Java, the system needs to read from disk more for these other tasks and we get a slow-down there.

>> - Assetstore: random structure causes large overhead on filesystem for no real gain
>
> Are you able to expand on the overhead that is caused, and from your profiling, explain how the structure could be improved? My gut (and uninformed) instinct would be that since asset store reads are completely random depending on the items being viewed at the time, the layout of directories would be irrelevant. Writes may be slightly less efficient, but since writes only tend to occur once, they are of less consequence.

Apologies for sounding cryptic; I was trying not to be too verbose in the template. :-)

This has mostly to do with back-ups. With about 600,000 files in random directories, it can be hard to find out which files have changed. We implemented a simple asset store structure that stores files by year/month/day. This means we can mirror new files very quickly, and only traverse the entire assetstore every other day to check if files have changed.

Maybe I should expand a bit on our storage set-up:

- our live system has about 90TB capacity, with an EMC SAN connected to a pair of Sun servers. These present the storage to our private network at about 4Gbps, as well as running the checksums (I wrote some Perl to do this job locally, rather than add to the I/O of the live server.)

- we have two sets of back-up servers (ZFS-based) off-site for the live system, which use rsync to mirror all this data. (Two systems because otherwise, if we lose one, it'd be vulnerable too long while the data is re-sync'ed). A small script makes copies of the day's assetstore every hour; a complete rsync runs across assetstores (the original one as well as the new one with our own datestamp format) every alternating day, and at week-ends we run rsync with checksums. Essentially this system is copy-on-write: if a file changes on disk, the old back-up copy is moved into a holding area to be deleted when necessary, and the new file copied in its place.

Finally, the date structure for the directory/file names helps locate problem files quickly if necessary. Not a huge thing, but it makes my life easier.

>> - Search indexer: fails on large repositories, slowing down and eventually running out of memory.
>
> Do you have any percentages on the amount of page views that relate to browse, and how many relate to other views? I'm curious if browse from the front end is causing an issue too? The reason I'm asking is that with the potential inclusion of the dspace-discovery layer in a future version, this could replace the database-driven browse system with solr. Not only will this provide a richer faceted search, but it could likely offer a good performance boost for browse-related functions. It also offers another way of scaling out, by putting solr on a different server.

This question I'll have to leave to Simon to answer, so I don't make a hash of it.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
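The year/month/day layout mentioned in this message can be sketched in a few lines of Python. This is an illustration only, not the local assetstore patch: the function names, the `/assetstore` root and the file name are assumptions. The point is that an hourly mirror job only has to visit the directory for the current day (and perhaps the previous one), instead of walking hundreds of thousands of files.

```python
from datetime import date, timedelta
from pathlib import Path


def day_directory(root, day):
    """Return root/YYYY/MM/DD for the given date."""
    return Path(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def dated_asset_path(root, stored_on, name):
    """Where a bitstream stored on `stored_on` would live."""
    return day_directory(root, stored_on) / name


def directories_to_mirror(root, today, days_back=0):
    """Day directories a frequent mirror job needs to copy.

    days_back=1 also covers yesterday, for jobs that run just after midnight.
    """
    return [day_directory(root, today - timedelta(days=n)) for n in range(days_back + 1)]


example = dated_asset_path("/assetstore", date(2010, 10, 8), "bitstream-0001")
recent = directories_to_mirror("/assetstore", date(2010, 3, 1), days_back=1)
```

The resulting list could be passed to rsync one directory at a time, leaving the full traversal (and the checksum run) to the less frequent jobs described in the message.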
Re: [Dspace-tech] Scalability issues report, dsp...@cambridge
Dear all,

I'm attaching a dump of our PostgreSQL configuration to this email. We got some input from Postgres developers into how best to tune for our needs, but if someone has suggestions for things to try then we'd be happy to hear them.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service

name                            | setting       | description
--------------------------------+---------------+------------------------------------------------------------
add_missing_from                | off           | Automatically adds missing table references to FROM clauses.
allow_system_table_mods         | off           | Allows modifications of the structure of system tables.
archive_command                 | (disabled)    | Sets the shell command that will be called to archive a WAL file.
archive_mode                    | off           | Allows archiving of WAL files using archive_command.
archive_timeout                 | 0             | Forces a switch to the next xlog file if a new file has not been started within N seconds.
array_nulls                     | on            | Enable input of NULL elements in arrays.
authentication_timeout          | 1min          | Sets the maximum allowed time to complete client authentication.
autovacuum                      | on            | Starts the autovacuum subprocess.
autovacuum_analyze_scale_factor | 0.1           | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
autovacuum_analyze_threshold    | 50            | Minimum number of tuple inserts, updates or deletes prior to analyze.
autovacuum_freeze_max_age       | 2             | Age at which to autovacuum a table to prevent transaction ID wraparound.
autovacuum_max_workers          | 3             | Sets the maximum number of simultaneously running autovacuum worker processes.
autovacuum_naptime              | 1min          | Time to sleep between autovacuum runs.
autovacuum_vacuum_cost_delay    | 20ms          | Vacuum cost delay in milliseconds, for autovacuum.
autovacuum_vacuum_cost_limit    | -1            | Vacuum cost amount available before napping, for autovacuum.
autovacuum_vacuum_scale_factor  | 0.2           | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
autovacuum_vacuum_threshold     | 50            | Minimum number of tuple updates or deletes prior to vacuum.
backslash_quote                 | safe_encoding | Sets whether \' is allowed in string literals.
bgwriter_delay                  | 200ms         | Background writer sleep time between rounds.
bgwriter_lru_maxpages           | 100           | Background writer maximum number of LRU pages to flush per round.
bgwriter_lru_multiplier         | 2             | Multiple of the average buffer usage to free per round.
block_size                      | 8192          | Shows the size of a disk block.
bonjour_name                    |               | Sets the Bonjour broadcast service name.
check_function_bodies           | on            | Check function bodies during CREATE FUNCTION.
checkpoint_completion_target    | 0.5           | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
checkpoint_segments             | 12            | Sets the maximum distance in log segments between automatic WAL checkpoints.
checkpoint_timeout              | 5min          | Sets the maximum time between automatic WAL checkpoints.
checkpoint_warning              | 30s           | Enables warnings if checkpoint segments are filled more frequently than this.
client_encoding                 | UTF8          | Sets the client's character set encoding.
client_min_messages             | warning       | Sets the message levels that are sent to the client
[Dspace-tech] Scalability issues report, dsp...@cambridge
DSpace scalability issues report, per wiki template:

1. dsp...@cambridge, The University of Cambridge, UK.
   Technical contacts:
   Tom De Mulder, td...@cam.ac.uk (systems manager)
   Simon Brown, st...@cam.ac.uk (DSpace developer)

2. a. DSpace version 1.6.2 with extensive local patches, using JSPUI
      Size: 137 communities, 258 collections, 200k items, 12TB, 436k bitstreams (excluding licenses)
   b. PostgreSQL 8.4.4
   c. Tomcat 6.0.24 standalone
   d. Separate servers for webapp, DB, storage and ancillary functions
      Webapp/DB servers are HT 8-core Intel servers running Ubuntu Linux with 16GB of memory and fast local storage
      Java memory: -Xmx2048M -Xms2048M

3. a. - Unless Tomcat is restarted, it will consistently fail due to lack of memory in less than 48 hours.
      - Batch importer: will fail on large batch imports (order of thousands of items); performance degrades with size of repository and of batch.
      - Search indexer: fails on large repositories, slowing down and eventually running out of memory.
      - Assetstore: random structure causes large overhead on filesystem for no real gain
      See also our poster, presented in Gothenburg: http://tools.dspace.cam.ac.uk/DSUG09%20A2%20poster.pdf
   b. Installed vanilla DSpace 1.6.2, imported 200k randomly generated items, ran siege against it, watched it not cope. We've done profiling in the past, but not for 1.6. However, we've not noticed significant changes in the code that has issues.
   c. We have patches for the indexer; batch importer; thumbnail and PDF text extraction; assetstore structure; dark item masking in OAI and browse code

4. We can't commit to volunteering unless this can be incorporated into the work we need to undertake in our primary capacity of running the University's Institutional Repository. However, we would be willing to try and make this happen.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] Scalability issues report, dsp...@cambridge
(Apologies for replying to my own email.)

One metric the template didn't ask for, I just noticed, is the number of hits per second. We average about 2 hits per second, which is very low, even if most of these hits are actual page views, not just layout elements.

However, both our webapp and database servers are under constant load, the latter in particular. Actual load average numbers are meaningless for comparison because they depend so much on the way the OS kernel implements them, so I won't give them. Suffice to say, though, that we had to ask the people running our university search engine and similar services to throttle their index rate so the servers wouldn't get overloaded.

Also of note is that the problems are mostly on the database and webapp end; there are no problems with I/O (disk or network).

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 6 Oct 2010, at 15:15, Graham Triggs wrote:

> [snip]

This is exactly the kind of pointless pontification that we got last time. Any point that is raised is deflected or ignored, and you even manage to contradict yourself between paragraphs. What's it to be, should patches benefit ALL repositories, or is it fine if it's just some? Or the other way round, maybe?

I will be very happy to offer our experiences regarding large-scale DSpace instances with the community, if that can be of any help. But not if it involves having to deal with Graham Triggs. I really do not have time for this.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 24 Sep 2010, at 21:17, bill.ander...@library.gatech.edu wrote:

> We've been experiencing problems similar to some reported on this thread since our upgrade to 1.6 several months ago. We're still using the jspui, and we've wondered (among other things) if some of these problems might be alleviated by a switch to the xmlui. Has anybody had any experience comparing the memory footprint and/or resource usage issues between the two interfaces?

We load-tested the XMLUI (on identical hardware) and it was even worse. It ran out of memory and crashed really quickly, so we never took it into production.

But your mileage may vary.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 11:38, Hilton Gibson wrote:

> We started with a VM which had 2GB memory. Then added 2GB to the VM, no luck. Then luckily we had funds to buy a server. So now we have 12GB RAM and 12 CPUs. No crashes so far. Using the XMLUI. Does DSpace really need this and what happens when we go to one million items?

A lot of the back-end code of DSpace, the very core of it, is inherently inefficient. Several tasks are executed more than once, entire objects are created when only one attribute is needed, and so on. (I'd be more specific, but I'm not a specialist on this matter, and our resident DSpace developer is on leave this week.)

I am really glad to hear from other people with problems similar to ours.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 11:47, Mark Ehle wrote:

> Why was tomcat chosen as a platform for DSpace?

It wasn't. You can use any Servlet engine. We used JBoss for a while but went back to Tomcat because it fitted into our infrastructure better.

I believe DSpace was written in Java because Rob Tansley wanted to try writing a project in Java, but I could be wrong. :)

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 13:03, Graham Triggs wrote:

> Some of those repositories have 1000s of items, and get quite decent levels of access.

Thousands? I don't even want to have this discussion until you're talking hundreds of thousands, and how many hits per second. I know you like to talk down the problem, but that really isn't helping.

We run 5 DSpace instances; three of these are systems with hundreds of thousands of items, and they are dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we? We have other systems here at the University that are much bigger, do similar things and require far, far less in terms of resources.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On 22 Sep 2010, at 20:22, Sands Alden Fish wrote:

> (2) We currently don't have a centralized server with enough test data to run many of these memory or scalability tests on our own. I think this is something we could look into improving upon (especially if anyone has test data to donate to the cause).

There is a lot of public domain data available online. I spent some time collecting some of this in a variety of formats (text, images, movies, sound, datasets) and then wrote something to use a word list (e.g. /usr/share/dict on most Linux systems) to create random metadata for them. After all, it doesn't matter that many bitstreams will be identical.

That is how we populated our test environment here so we could replicate the problems we were seeing on the live system.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
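A rough Python sketch of the word-list approach follows. It is not the script used at Cambridge; the field names, the function names and the shape of the returned record are assumptions chosen for illustration. The word list path is left to the caller (on many Linux systems a file such as /usr/share/dict/words is available).

```python
import random


def load_words(path):
    """Read a one-word-per-line dictionary file, keeping purely alphabetic words."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip().isalpha()]


def random_metadata(words, rng, title_words=5, subject_count=3):
    """Build one fake metadata record from randomly chosen words.

    `rng` is a random.Random instance, so a fixed seed gives repeatable test data.
    """
    title = " ".join(rng.choice(words) for _ in range(title_words)).capitalize()
    author = f"{rng.choice(words).capitalize()}, {rng.choice(words).capitalize()}"
    subjects = [rng.choice(words) for _ in range(subject_count)]
    return {"title": title, "author": author, "subjects": subjects}


sample_words = ["alpha", "beta", "gamma", "delta"]
record = random_metadata(sample_words, random.Random(42))
```

Generating 200k such records and pairing each with one of a small pool of public domain files is enough to give a test repository realistic size, since, as noted above, it does not matter that many bitstreams are identical.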
Re: [Dspace-tech] tomcat reporting memory leak?
I am very happy to see that this issue finally seems to be taken seriously. However, I find myself getting a bit frustrated that it was never taken seriously when I raised it in the past.

I think the DSpace source code carries with it a lot of historical baggage, and it could do with being addressed even without making fundamental changes to the basic architecture. My personal favourite would be a completely new architecture with more loosely coupled modules, but fixing memory leaks and the associated slow performance would be a good start.

I can add that, for example, deleting a collection with 1200 items on our rather powerful DSpace machines takes two hours and uses most of the available memory. You can see why I would like that no longer to be the case.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
Re: [Dspace-tech] tomcat reporting memory leak?
On Mon, 20 Sep 2010, Damian Marinaccio wrote:

> I'm seeing the following log messages in catalina.out:
> [...]
> SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak.

There are quite a few memory leaks in DSpace. We have a cronjob to restart Tomcat nightly, because otherwise it'll break the next day.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 20/09/2010
Re: [Dspace-tech] Metadata dates stored as UTC
On Fri, 25 Jun 2010, TAYLOR Robin wrote:

> Dates held in the metadatavalues table are converted from their local time zone to UTC before being stored in the database. The problem is that they are not generally converted back to their local time zone before being displayed (see Jira http://jira.dspace.org/jira/browse/DS-568). This is misleading to the user. You could conceivably see that you had submitted an item whilst you were still asleep in bed. I'm not sure what to do about this. It would be messy to always check for a metadatavalue being a date before displaying it. What would be the consequences of not storing dates as UTC? Could we store them with a time zone, e.g. 22:30+04? This might be a little less confusing. I'm sure there are good reasons for storing dates as UTC, I just don't know what they are. Can anyone help?

They're stored in Zulu time (UTC), which has the advantage of not being dependent on time zones or daylight saving. The best thing to do is to keep storing them in UTC, but to convert them to local time on display.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 25/06/2010
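The store-in-UTC, convert-on-display approach can be sketched in Python as below. This is a generic illustration, not DSpace code: the function names and the ISO-style storage format are assumptions, and fixed UTC offsets are used only to keep the example self-contained (a real deployment would use named time zones so that daylight saving is handled).

```python
from datetime import datetime, timedelta, timezone

STORAGE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_storage(local_dt):
    """Normalise a timezone-aware datetime to a UTC string for storage."""
    if local_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime(STORAGE_FORMAT)


def for_display(stored, display_zone):
    """Convert a stored UTC string to the viewer's zone for display."""
    utc_dt = datetime.strptime(stored, STORAGE_FORMAT).replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(display_zone).strftime("%Y-%m-%d %H:%M %z")


plus_four = timezone(timedelta(hours=4))
plus_one = timezone(timedelta(hours=1))

# A submission made at 22:30 in a +04:00 zone is stored as 18:30 UTC ...
stored = to_storage(datetime(2010, 6, 25, 22, 30, tzinfo=plus_four))
# ... and shown as 19:30 to a viewer one hour ahead of UTC.
shown = for_display(stored, plus_one)
```

Because only one representation is stored, values from submitters in different zones sort and compare correctly, and the "submitted while asleep" confusion is addressed purely at display time.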
Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors
On Thu, 11 Feb 2010, Michael White wrote:

> :session_id=9E40BFD899A2AA5C23E81404AF5B97A5:internal_error:-- URL Was: https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214
> [snip]
> User-agent: *
> Disallow: /browse-author
> Disallow: /items-by-author
> Disallow: /browse-date
> Disallow: /browse-subject

You should add /dspace to the start of those disallowed patterns, because your DSpace URLs start with /dspace after the hostname.

The standard (or rather, consensus) has this to say about Disallow fields in robots.txt:

  "The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved."

Note the "starts with".

See also: http://www.robotstxt.org/

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 11/02/2010 : The Moon is Waning Crescent (19% of Full)
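The "starts with" rule can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the helper `is_allowed` and the two rule sets are made up for the demonstration, and the test URL is assembled from the hostname and path prefix quoted above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Rule as posted: the pattern lacks the /dspace prefix, so it never matches.
WITHOUT_PREFIX = ["User-agent: *", "Disallow: /browse-author"]
# Rule with the prefix added: the URL path now starts with the pattern.
WITH_PREFIX = ["User-agent: *", "Disallow: /dspace/browse-author"]

URL = "https://dspace.stir.ac.uk/dspace/browse-author"

print(is_allowed(WITHOUT_PREFIX, URL))  # crawler may still fetch the page
print(is_allowed(WITH_PREFIX, URL))     # crawler is told to stay away
```

The comparison is made against the path after the hostname, which is why the unprefixed rule has no effect on a site served under /dspace.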
Re: [Dspace-tech] Dspace and https
On Tue, 9 Feb 2010, Fabien COMBERNOUS wrote:

> I installed a Dspace from trunk checkout. All is well running. Now i want to setup an https access to my dspace repository. With tomcat6 it looks necessary to use SSLEnabled=true in the connector about port 8443. Now i have the following error about ssl config :
>
> 09-Feb-2010 11:00:03 org.apache.coyote.http11.Http11Protocol start
> SEVERE: Error starting endpoint
> java.io.IOException: jsse.invalid_ssl_conf
> ...
> Caused by: javax.net.ssl.SSLException: No available certificate or key corresponds to the SSL cipher suites which are enabled.

Have you tried following the SSL Howto? It may address your problem:

http://tomcat.apache.org/tomcat-6.0-doc/ssl-howto.html

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 09/02/2010 : The Moon is Waning Crescent (33% of Full)
[Dspace-tech] DSpace batch ingest scalability and performance
Hello all.

We (the DSpace team at the University of Cambridge) are currently holding our own mini-1.6-testathon. Our particular interest lies with scalability, because it has caused us trouble in the past.

If this interests anyone, I'm trying to write up our tests, notes, conclusions etc. on a blog I set up for this purpose. The first figures, for importing about 100,000 items in batches, can be seen here:

http://tdm27.wordpress.com/2010/01/19/dspace-1-6-scalability-testing/

I figured that this would be better than trying to write it up here on the mailing list.

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 19/01/2010 : The Moon is Waxing Crescent (24% of Full)
[Dspace-tech] Zotero
Hi all,

as some/many people will know, Zotero used to work with DSpace 1.4, but no longer works with 1.5 and higher. This isn't DSpace's fault -- Zotero is merely being too eager to invoke the wrong translator when it recognises a DSpace site.

If other people think this is important (and Zotero is certainly seeing more and more use), could they please add their comments to the forum thread on the Zotero forum, to push their developers to make Zotero work again with current versions of DSpace?

http://forums.zotero.org/discussion/7009/dspace-translators-not-longer-valid/

Thanks,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service - 09/12/2009 : The Moon is Waning Gibbous (52% of Full)
[Dspace-tech] Batch importer spreadsheet metadata mapping tool
To whom it may be of interest:

we recently had cause to develop a tool for internal use, to generate a DSpace (1.5.x) batch importer structure from a spreadsheet and associated files. This has helped facilitate batch deposit by people who otherwise would have lacked the technical prowess to generate the correct importer structure. Now, instead, they can produce a spreadsheet that describes the items they want to deposit.

Given how popular this tool has turned out to be, we decided to share it with the DSpace community, in case it might prove useful:

http://tools.dspace.cam.ac.uk/metadatamapper/

Best regards,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 18/09/2009 : The Moon is Waning Crescent (7% of Full)
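For readers who want a feel for what such a mapping involves, here is a rough Python sketch that turns a CSV export into one directory per item, each with a dublin_core.xml and a contents file, as the DSpace batch importer expects. It is not the metadatamapper tool and makes no claim about how that tool works. The column conventions are assumptions for the example: a `filename` column names the bitstream, and every other column is read as `element` or `element.qualifier`.

```python
import csv
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_import_structure(csv_path, output_dir):
    """Create item_001, item_002, ... under output_dir, one per CSV row."""
    output_dir = Path(output_dir)
    created = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for number, row in enumerate(csv.DictReader(handle), start=1):
            item_dir = output_dir / f"item_{number:03d}"
            item_dir.mkdir(parents=True, exist_ok=True)
            values = []
            for column, text in row.items():
                # Skip the bitstream column, empty cells and stray extra cells.
                if not column or column == "filename" or not text:
                    continue
                element, _, qualifier = column.partition(".")
                values.append(
                    f"  <dcvalue element={quoteattr(element)} "
                    f"qualifier={quoteattr(qualifier or 'none')}>"
                    f"{escape(text)}</dcvalue>"
                )
            xml = "<dublin_core>\n" + "\n".join(values) + "\n</dublin_core>\n"
            (item_dir / "dublin_core.xml").write_text(xml, encoding="utf-8")
            # The contents file lists the bitstream file names, one per line.
            (item_dir / "contents").write_text(row["filename"] + "\n", encoding="utf-8")
            created.append(item_dir)
    return created
```

The sketch only writes the two descriptor files. The bitstream named in the `filename` column still has to be copied into the matching item directory before running the importer.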
Re: [Dspace-tech] Why the DSpace checksum checker?
On Fri, 17 Apr 2009, Mark Diggory wrote:

> I've never been impressed with the reasoning behind this addition to DSpace, it mistakes bitstream security and file corruption as something that should be tracked by the DSpace application. We

I agree, but with one caveat:

> A real file integrity system should be implemented outside of the application by an experienced system administrator vested in maintaining the security and integrity of the system, not in the application by a webapplication developer. I do value and respect the

It is important to make sure that the file the web application put on disk is the same as the one still there. While various monitoring tools can check whether files have changed on disk, at some stage there should be a verification of what the archive thinks the file's checksum is against what's actually on disk.

However, this is easily done outside the webapp.

-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 17/04/2009 : The Moon is Waning Gibbous (54% of Full)
Re: [Dspace-tech] Performance issues with bitstream checker
On Tue, 14 Apr 2009, Ruijgrok, P.T. (Peter) wrote:

> I had serious performance problems with the bitstream checker, running Dspace 1.4.x. We have +320.000 bitstreams and increasing continuously.

Sadly, the current DSpace codebase has some serious scalability issues. (And Java's MD5 implementation isn't the fastest, either, but that's not the main culprit.)

For our instance, which has a separate server hosting the filesystem (which itself resides on a SAN), I wrote a Perl script to do the checksumming. It runs continuously in a loop, and manages nearly 500,000 bitstreams in 6 to 10 hours, depending on the load on the fileserver. It uses the md5sum binary from solarisfreeware.com.

It puts almost no load on the database, because it only queries the checksums from the bitstream table once, at the start. Output is logged continuously, and our local Nagios server monitors for any checksum errors.

This also has the advantage that it doesn't load the (Tomcat) webapp box, which already has enough work to do. It also means that the same script can run on our backup servers (which also use disk; we couldn't manage with tape).

We've taken this approach with other things as well, such as our thumbnails (which aren't using the DSpace code, because we wanted to separate something as user-interface-centric as that from the actual archive contents; the DSpace code was also too slow and crashed our server).

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 16/04/2009 : The Moon is Waning Gibbous (59% of Full)
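The general shape of an out-of-webapp checker like the one described (fetch the expected checksums once, then walk the files and compare) might look like the following Python sketch. It is not the Perl script from the message. The function names are hypothetical, and the `expected` mapping stands in for whatever a single query against the bitstream table returns.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest, reading in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, root):
    """Compare files under `root` with the expected checksums.

    `expected` maps a path relative to `root` to an MD5 hex digest
    (hypothetical input, loaded once before the scan starts).
    Returns a list of (relative_path, problem) tuples.
    """
    problems = []
    for relative_path, expected_md5 in sorted(expected.items()):
        target = Path(root) / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif md5_of_file(target) != expected_md5.lower():
            problems.append((relative_path, "checksum mismatch"))
    return problems
```

Writing each entry of the returned list to a log file would give a monitoring system something simple to watch, in the spirit of the Nagios check mentioned above.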
Re: [Dspace-tech] Backup procedure
On Fri, 20 Feb 2009, West, Jeff wrote:

> I would also be interested in an answer to this question. We currently run Fedora 10 Linux. We haven't populated anything, because we want a clear way to backup and restore in the event of a server crash.

Are you running PostgreSQL? In which case the pg_dump command is all you need. Run it on a regular schedule with a user with sufficient access privileges, e.g.

  pg_dump --format t yourdspacedbname > databasedump.2009-02-20.sql

That gives you a DB dump you can just slurp back in in future and will be enough in almost all cases.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 20/02/2009 : The Moon is Waning Crescent (38% of Full)
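For a scheduled job, the dated file name can be generated rather than typed. The Python sketch below only builds the command line. The function name, the backup directory and the `.tar` suffix are choices made for this example (the suffix because `--format t` writes a tar archive). `--file` is pg_dump's option for naming the output file, used here instead of a shell redirect.

```python
from datetime import date


def pg_dump_command(db_name, backup_dir, today=None):
    """Build a pg_dump argument list that writes a dated tar-format dump."""
    today = today or date.today()
    out_file = f"{backup_dir}/{db_name}.{today.isoformat()}.tar"
    return ["pg_dump", "--format", "t", "--file", out_file, db_name]


# Hypothetical database name and directory, shown with the date of this message.
print(pg_dump_command("yourdspacedbname", "/var/backups", date(2009, 2, 20)))
```

The list can be passed to `subprocess.run(..., check=True)` from a script started by cron. A tar-format dump is read back with pg_restore rather than psql.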
Re: [Dspace-tech] Speed problem in postgres during batch ingesting
On Tue, 27 Jan 2009, Stuart Lewis wrote:

> The following paper talks about this, and how DSpace performs when ingesting 1 million items:
>
> Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland, USA
> http://www.dspace.org/images/stories/ist2008_paper_submitted1.pdf
>
> Is this one big import of 30,000 items, or do you break them up into smaller chunks?

That paper doesn't use the DSpace importer, so I fail to see how it can claim the importer scales well.

I can tell from a lot of first-hand experience that the DSpace importer doesn't scale, and that it gets slower as you have more items in your DSpace instance, as well as slowing down for each item in the batch.

In addition, if you have a busy DSpace instance, there may be issues with file locking where deleted filehandles don't get recovered properly.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 27/01/2009 : The Moon is Waning Crescent (3% of Full)
Re: [Dspace-tech] Google bots and web crawlers
On Wed, 14 Jan 2009, Shane Beers wrote:

> We had an issue with our local google instance crawling our DSpace installation and causing huge issues. I re-wrote the robots.txt to disallow anything besides the item pages themselves - no browsing pages or search pages and whatnot. Here is a copy of ours:

We've had to do that for years; without it DSpace just crumbles under the load.

I've got a small Perl script which generates a flat HTML file with links to all our item pages, and we put a link to that in the footer. So we can block all browse pages, but not item or bitstream pages, and still get indexed.

DSpace 1.x has major scalability issues, alas, no matter how much hardware you throw at it.

Best,
-- Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 14/01/2009 : The Moon is Waning Gibbous (83% of Full)
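A flat link page of this kind is easy to produce. The Python sketch below is not the Perl script mentioned above, only an illustration of the idea. It assumes the list of item handles is already available (for example from a database query) and that item pages live under `/handle/<handle>`. The function name and the example host are hypothetical.

```python
from html import escape


def build_item_index(handles, base_url):
    """Return a plain HTML page with one link per item handle."""
    rows = []
    for handle in handles:
        # Assumed URL layout: <base_url>/handle/<prefix>/<suffix>
        url = f"{base_url.rstrip('/')}/handle/{handle}"
        rows.append(f'<li><a href="{escape(url, quote=True)}">{escape(handle)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>\n"


# Two made-up handles on a placeholder host.
print(build_item_index(["1234/5", "1234/6"], "https://repository.example.org/"))
```

Saving the output to a static file and linking to it from the site footer gives crawlers a path to every item page even when the browse and search pages are disallowed in robots.txt.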
Re: [Dspace-tech] Adding and removing bitstreams
On Tue, 23 Sep 2008, Hlias Stavrakis wrote:

> Hi, i face a problem on adding and removing bitstreams in both dspace 1.4 and 1.5 and would like to ask the community and the developers of dspace for it.

It's essentially broken, like most of the authorization system. Some of our users are really fed up with this, but it's such a mess to sort out properly.

-- Tom De Mulder [EMAIL PROTECTED] - Cambridge University Computing Service - 23/09/2008 : The Moon is Waning Gibbous (53% of Full)
[Dspace-tech] DSpace 1.5 beta 1
Could any of the more involved developers tell me why the database schema for DSpace 1.5 still has admin and submitter columns in the collection table, when there is a ResourcePolicy table?

In our experience, if the former and latter disagree with each other, serious authz problems occur; it would be better if everything used the ResourcePolicy rather than the columns on the collection table.

Any reason why they can't be dropped for this release?

Best,
-- Tom De Mulder [EMAIL PROTECTED] - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 15/02/2008 : The Moon is Waxing Gibbous (58% of Full)
Re: [Dspace-tech] DSpace 1.5 beta 1
On Fri, 15 Feb 2008, Scott Phillips wrote:

> system, but that's been discussed before. To answer your question, these columns are still needed because that is where DSpace determines who is allowed to submit or administrate a collection, and yes those epersons must also be granted the basic resource policies over those objects as well - so it's best to avoid situations where they are out of sync. We are way too far along in this release to consider a database schema change of this magnitude.

Right. I was under the impression that, given the add/admin right in the resourcepolicy table, we could just use those. For us, here, both those columns are empty, for example. We've got a patch ready to roll out to hide the UI elements that populate them, in the hope that that'll stop them getting out of sync.

I've only skimmed most of the talk about the architectural review, just being too busy to deal with the stream of emergencies at a local level. We'll definitely be working on the authn/authz system in the very near future, which will probably take us down the route of having an ACL implementation that can cope with Shibboleth and our local single sign-on system... I was just hoping that 1.5 would get us started further along that route. :-)

Thanks for the response,
-- Tom De Mulder [EMAIL PROTECTED] - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH - 15/02/2008 : The Moon is Waxing Gibbous (59% of Full)