[Dspace-tech] tomcat/jetty/resin
We're upgrading our DSpace server and taking another look at what servlet engine we should use. Has anyone done research/comparison and ended up particularly passionate about their choice? I would be interested in objective benefits of one over another, and I suspect others would too. Cory Snavely University of Michigan Library IT Core Services - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Blocking a malicious user
It has an effect if your Postgres instance isn't blocked at the firewall, and people are actually trying to access it. Which they will, unless you block them. As I said, it is probably much safer to block at the firewall level--better protection from DoS as well. On Thu, 2007-11-01 at 08:51 +, Stuart Lewis [sdl] wrote: Hi Sue, pg_hba.conf only controls who can communicate with Postgres, not who can communicate with DSpace. Normally it is only 'applications' (e.g. DSpace) that talk to Postgres, not users. A user talks to DSpace, which in turn talks to Postgres. Postgres has no idea of, or interest in, the IP address of the user who is using DSpace, only that of the DSpace application. Therefore adding a malicious IP address to that config file will sadly have no effect. You have to block users higher in the stack, either at the application level (Apache or Tomcat directives) or at the network level (firewall changes). Thanks, Stuart _ Gwasanaethau Gwybodaeth Information Services Prifysgol Aberystwyth Aberystwyth University E-bost / E-mail: [EMAIL PROTECTED] Ffon / Tel: (01970) 622860 _ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS] Sent: 31 October 2007 17:51 To: Mika Stenberg; dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Blocking a malicious user You can block IP addresses at the PostgreSQL level in the pg_hba.conf file. Here is a person I blocked by IP address who was sending all kinds of GET requests to our DSpace server:

host    all    all    malicious.ip    255.255.255.255    reject

Sue Walker-Thornton NASA Langley Research Center ConITS Contract 757-224-4074 [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mika Stenberg Sent: Wednesday, October 31, 2007 6:00 AM To: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Blocking a malicious user We've had problems like that as well.
Blocking specific IPs works only for a while, since many bots and spammers seem to change their IP frequently. We didn't come up with a decent solution for this, but blocking an entire country of origin for a period of time has been on my mind. Managing the allowed requests per timeslot for a specific IP might also do the trick. -Mika If they're nasty enough, though, they'll drown your Apache or Tomcat server in replying with 403s. I've had times that I needed to be absolutely merciless and block at the firewall level, using iptables; then they don't even get as far as userspace. On Tue, 2007-10-30 at 14:01 -0500, Tim Donohue wrote: George, We had a similar problem to this one in the past (a year or so ago). I just flat out blocked the IP altogether (not even specific to /bitstream/) via this Apache configuration:

<Location />
    Order Allow,Deny
    Deny from {malicious ip}
    Allow from all
</Location>

This looks similar to your config, though (except it blocks all access from that IP). - Tim George Kozak wrote: Hi... I am having a problem with an IP that keeps sending thousands of GET /bitstream/... requests for the same item. I have placed the following in my Apache.conf file:

<Directory /bitstream/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
    deny from {malicious ip}
</Directory>

I also placed the following in my server.xml in Tomcat:

<Valve className="org.apache.catalina.valves.RemoteAddrValve" deny="xxx\.xxx\.xxx\.xx" />

However, this person still seems to be getting through. My java process is running from 50%-80% CPU usage. Does anyone have a good idea on how to shut out a malicious IP in DSpace? *** George Kozak Coordinator Web Development and Management Digital Media Group 501 Olin Library Cornell University 607-255-8924 *** [EMAIL PROTECTED]
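Before blocking anything, it helps to confirm which address is actually hammering the server. Below is a minimal sketch of that triage using awk against an Apache-style access log; the log contents are fabricated sample data and the IPs are documentation-range placeholders:

```shell
# Build a tiny sample access log, then count requests per client IP --
# the same triage you would do on a real Apache access_log to spot an abuser.
cat > /tmp/sample_access.log <<'EOF'
203.0.113.45 - - [30/Oct/2007:14:01:00 -0500] "GET /bitstream/1/1.pdf HTTP/1.1" 200 1024
203.0.113.45 - - [30/Oct/2007:14:01:01 -0500] "GET /bitstream/1/1.pdf HTTP/1.1" 200 1024
198.51.100.7 - - [30/Oct/2007:14:01:02 -0500] "GET /index.jsp HTTP/1.1" 200 512
203.0.113.45 - - [30/Oct/2007:14:01:03 -0500] "GET /bitstream/1/1.pdf HTTP/1.1" 200 1024
EOF

# Tally the first field (client IP) and list the heaviest requesters first.
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' /tmp/sample_access.log | sort -rn
```

On a real system you would point the awk one-liner at Apache's access_log; the top line is the candidate for an Apache deny or firewall rule.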
Re: [Dspace-tech] Blocking a malicious user
It's probably worth saying that if you run Postgres and DSpace on the same server, you can completely block Postgres at the firewall (iptables) level. On Wed, 2007-10-31 at 12:51 -0500, Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS] wrote: You can block IP addresses at the PostgreSQL level in the pg_hba.conf file. Here is a person I blocked by IP address who was sending all kinds of GET requests to our DSpace server:

host    all    all    malicious.ip    255.255.255.255    reject

Sue Walker-Thornton NASA Langley Research Center ConITS Contract 757-224-4074 [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mika Stenberg Sent: Wednesday, October 31, 2007 6:00 AM To: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Blocking a malicious user We've had problems like that as well. Blocking specific IPs works only for a while, since many bots and spammers seem to change their IP frequently. We didn't come up with a decent solution for this, but blocking an entire country of origin for a period of time has been on my mind. Managing the allowed requests per timeslot for a specific IP might also do the trick. -Mika If they're nasty enough, though, they'll drown your Apache or Tomcat server in replying with 403s. I've had times that I needed to be absolutely merciless and block at the firewall level, using iptables; then they don't even get as far as userspace. On Tue, 2007-10-30 at 14:01 -0500, Tim Donohue wrote: George, We had a similar problem to this one in the past (a year or so ago). I just flat out blocked the IP altogether (not even specific to /bitstream/) via this Apache configuration:

<Location />
    Order Allow,Deny
    Deny from {malicious ip}
    Allow from all
</Location>

This looks similar to your config, though (except it blocks all access from that IP). - Tim George Kozak wrote: Hi... I am having a problem with an IP that keeps sending thousands of GET /bitstream/... requests for the same item.
I have placed the following in my Apache.conf file:

<Directory /bitstream/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
    deny from {malicious ip}
</Directory>

I also placed the following in my server.xml in Tomcat:

<Valve className="org.apache.catalina.valves.RemoteAddrValve" deny="xxx\.xxx\.xxx\.xx" />

However, this person still seems to be getting through. My java process is running from 50%-80% CPU usage. Does anyone have a good idea on how to shut out a malicious IP in DSpace? *** George Kozak Coordinator Web Development and Management Digital Media Group 501 Olin Library Cornell University 607-255-8924 *** [EMAIL PROTECTED]
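For the same-host case described above, here is a hedged sketch of the firewall-level lockdown as an iptables configuration fragment (run as root). The port assumes Postgres's default of 5432, and where these rules sit relative to your existing chains is your responsibility:

```shell
# Accept Postgres connections (default port 5432) only from the loopback
# interface, i.e. from DSpace running on the same machine...
iptables -A INPUT -p tcp --dport 5432 -i lo -j ACCEPT
# ...and drop Postgres traffic arriving from anywhere else.
iptables -A INPUT -p tcp --dport 5432 -j DROP

# Confirm the rules took effect:
iptables -L INPUT -n --line-numbers
```

With this in place, pg_hba.conf only ever sees local connections, which matches Stuart's point that user IPs never reach Postgres anyway.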
Re: [Dspace-tech] Academic SRB support
...and if it seems odd to anyone following this thread that the developers of Nirvana SRB would suggest we achieve this integration by using the filesystem emulation provided by Nirvana SRB, which in turn uses the Honeycomb API, know that I definitely did point out that irony to them. However, according to these developers, the Nirvana and SDSC SRB APIs differ enough that this is the only way to do it without recoding the DSpace bitstream storage manager. Disappointing? Yeah. So am I understanding correctly that in future versions of DSpace, support for CAS systems and the like would be built in? I.e. we might expect there to be direct Honeycomb, EMC Centera, iRODS, etc. support right within DSpace? We're trying to see the roadmap here. c On Wed, 2007-10-24 at 10:46 -0400, Blanco, Jose wrote: We just had a phone conference with Sun and the developer for the commercial version of SRB at Nirvana (Tino) and were told that the commercial version of SRB they have developed is not the same as the academic SRB. One thing they have developed is a filesystem-based SRB which *should* work, and we are going to try it out. Thanks for this information! Jose -Original Message- From: MacKenzie Smith [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 24, 2007 10:37 AM To: Blanco, Jose Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Academic SRB support Hi Jose, I haven't gotten the official story from SDSC, but I do know that their attention has shifted to iRODS as the next-generation storage architecture for long-term data management. iRODS will be 100% open source software (no more dual license), which will be easier for the community to deal with. My understanding is that the commercial (Nirvana) and non-commercial (plain SRB) versions are actually the same thing... they just have a dual-license arrangement for the codebase. So the API that Sun develops *should* also work for your plain vanilla SRB instance too.
You can verify that with the SDSC folks (or I can ask them). The DSpace work that we've done at MIT was for the old non-commercial SRB, and we recently got the Jargon client for iRODS, so those should be tested with the 1.4.x and 1.5 releases. MacKenzie I wonder if anyone has heard whether the academic SRB (non-commercial) is going to be discontinued? We have been discussing using a Honeycomb server for bit storage, and they have informed us that the academic SRB is going to be discontinued, so they are not interested in developing an API for it. They are working on developing a commercial Nirvana SRB API. I'm assuming that the configurable SRB coming out in a future release of DSpace is the academic one? http://wiki.dspace.org/index.php/PluggableStorage ? Thank you! Jose -- MacKenzie Smith Associate Director for Technology MIT Libraries
Re: [Dspace-tech] Storing bitstreams using SRB
Presumably you would need an SRB server: http://www.sdsc.edu/srb/index.php/Main_Page . On Wed, 2007-10-17 at 06:44 -0700, Shwe Yee Than wrote: Hi, What else do I need to do, other than the normal installation and configuration of DSpace, if I want to store bitstreams using SRB? Can anyone help me? regards, Shwe
Re: [Dspace-tech] Questions about DSpace Features
FYI we are having discussions with Sun about integrating DSpace with their Honeycomb CAS system. However, the approach I am advocating is to build an SRB compatibility layer/driver/translator for the product, and so insulate DSpace from the specifics of the Honeycomb API. Contact me if interested. On Wed, 2007-10-03 at 17:26 -0400, MacKenzie Smith wrote: Hi Robert, * Does DSpace have service interfaces (like SOA or SOAP)? Yes, for submission (see http://wiki.dspace.org/index.php/LightweightNetworkInterface). * Is it correct that DSpace does not have internal storage management, which would mean (e.g.) compressing documents which are not accessed for a given period, or moving them to another storage location (e.g. a tape server) if the last access is much older? You can implement any storage layer underneath DSpace using the storage API. There are implementations now for the local filesystem (the default), SRB and S3 (in prototype, I believe). I think HP has also implemented it with their HSM, but I don't know if there are other HSM systems implemented now. * And is it possible to bundle / relate different versions of the same document, e.g. preprint and postprint? This is handled now via metadata. For MIT's method of doing this see http://wiki.dspace.org/static_files/f/fa/DSpace_Versioning_Feature_Summary_(July_2004).pdf There are plans to change the DSpace data model in a future version so that it can handle versions directly within an item. This is described on the wiki (http://wiki.dspace.org/index.php/ArchReviewSynthesis). A lot of this work has already started, and the plan is to complete these changes in 2008. * Does DSpace keep track of different versions of the same document to have a history of minor changes (compared to pre- and postprint)? It is a digital archive rather than an authoring system, so no, minor changes to documents are not normally kept.
The idea is to store final versions of documents and keep them forever, and to link different *editions* of documents via metadata (see the last answer) so that users can safely cite a particular version and not worry about it disappearing later. MacKenzie
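To make the metadata-based linking of editions concrete, here is a hypothetical fragment of a DSpace dublin_core.xml import file. The handle URL is a placeholder, and whether your instance links editions through relation.isversionof or some other qualifier depends on local metadata policy, so treat this as a sketch rather than MIT's actual method:

```xml
<dublin_core>
  <dcvalue element="title" qualifier="none">Example article (postprint)</dcvalue>
  <!-- Points at the handle of the earlier preprint edition (placeholder URL). -->
  <dcvalue element="relation" qualifier="isversionof">http://hdl.handle.net/123456789/42</dcvalue>
</dublin_core>
```

A reader of the postprint item then sees a citable, persistent link back to the preprint edition rather than a second copy of the file.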
Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?
Another way to get experience with the quality of Acrobat OCR is to use Acrobat Pro, which can do functionally the same thing, with a less batch-oriented interface. We ended up using this at a fairly large scale to meet a similar need. We have documentation on preparing PDFs that we supply for submitters, and that you may find useful, at http://deepblue.lib.umich.edu/html/2027.42/40244/PDF-Best_Practice.html The section toward the bottom provides instructions on making image PDF files searchable. Cory Snavely University of Michigan Library IT Core Services - Original Message - From: Jennifer Ash To: dspace-tech@lists.sourceforge.net Sent: Wednesday, July 04, 2007 6:55 AM Subject: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution? Dear Community Members The Water Research Commission (WRC, South Africa) is currently assessing a pilot installation of DSpace. We want to use DSpace to store, search and retrieve all our WRC research reports and Water SA (a scientific publication, 4 issues pa) issues (this is the primary goal; other collections will most likely be added over time). We are faced with a problem in that most of our older publications are not in electronic format and will have to be scanned. Scanning and saving as PDF does not provide a full-text-searchable document in DSpace; I've tried it. A product, Adobe Capture, is advertised as a 'tool that teams with your scanner to convert volumes of paper documents into searchable Adobe Portable Document Format (PDF) files'. We are keen to investigate this product but there are no trial downloads offered by Adobe. Do you have any knowledge of this product? Can you advise on a suitable technology solution for our problem? Our backlog is vast and spans many years, so there are loads of documents that need to be scanned. I do hope someone can give me advice. Kind regards Jennifer Ash ..
Business Systems Manager Water Research Commission Private Bag X03 GEZINA (Pretoria) 0031 Tel: (012) 330-9036 / 330-0340 Fax: (012) 330-9010 / 331-2565 E-mail: [EMAIL PROTECTED]
Re: [Dspace-tech] srb/s3/etc and lucene
Thanks, but when you say assetstore, I'm not sure if you are referring to the object-based storage in all cases. I will assume that you are, because of the parenthetical (s3). So, this is what I believe you are saying: When filter-media runs, it extracts text for formats such as PDF that Lucene can't directly parse, and places those text bitstreams alongside the originals using the object-based storage API; it then again uses the object-based storage API to fetch the text back out and feed it to Lucene. Consequently, nothing is stored in the filesystem except for the resulting index? Thanks, Cory On Fri, 2007-05-04 at 00:10 -0400, Mark Diggory wrote: On 5/4/07, Cory Snavely [EMAIL PROTECTED] wrote: Well, I'm just wondering, in specific terms, if we use an object-based storage system as an assetstore rather than a filesystem, where the files that Lucene indexes actually sit. It's tricky; this is what FilterMedia is for. It actually extracts the text and places it as a bitstream in the assetstore. Lucene full-text indexing is done against the assetstore bitstreams in all cases (well, except for the metadata table in the database). So ultimately you're pushing the text bitstreams into the assetstore (s3) in FilterMedia and pulling them back out on Lucene indexing, a double whammy. Cheers, Mark It's my understanding that in a filesystem-based assetstore, for example, text is extracted from PDFs and stored in a separate file *within the assetstore directory* that Lucene crawls. I just don't know how that sort of thing is handled when using object-based storage. On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote: Hi Cory: Not sure about the limits of Lucene, but I think the larger point is that the back-ends are expected only to hold the real content or assets. Everything else (full-text indices and the like) consists of *artifacts* (things that can be recreated from the assets) that we don't need to manage in the same way.
If for performance reasons we want to put them where the assets are, we can, but there is really no connection between the two that the system imposes. Does this get at your question, or did I miss the point? Thanks, Richard R On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote: (Apologies if this has been discussed to resolution; after a few attempts to search the archives, I concluded they are really broken. 500 errors, bad links, etc.) For those using, interested in, or knowledgeable about using API-based storage (SRB, S3) as a backend for DSpace: how does doing so affect full-text indexing? Can anyone describe how, in such a setup, full text is stored and indexed? My uneducated impression is that Lucene would want to work only against a filesystem. Thanks, Cory Snavely University of Michigan Library IT Core Services
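For reference, the extract-then-index cycle Mark describes corresponds to the standard DSpace 1.x command-line sequence below. [dspace] is a placeholder for your DSpace installation directory, so this is a sketch of the usual invocation rather than a literal script:

```shell
# 1. Extract text from PDFs etc. into derivative bitstreams,
#    written back into the assetstore through the storage API.
[dspace]/bin/filter-media

# 2. Rebuild the Lucene index, which reads item metadata plus those
#    extracted-text bitstreams (again via the storage API).
[dspace]/bin/index-all
```

Note that only the Lucene index itself lives on the local filesystem; with an S3 or SRB assetstore, both the originals and the extracted text round-trip through the object store.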
Re: [Dspace-tech] srb/s3/etc and lucene
Right--I am trying to get an understanding of all this in very specific terms. On Fri, 2007-05-04 at 09:23 -0400, Mark H. Wood wrote: There are two questions here: 1) Does the use of a non-filesystem asset store backend affect Lucene's output? One would guess no, since it doesn't do output to the asset store. 2) Does the use of a non-filesystem asset store backend affect Lucene's input? IOW, how does Lucene, as used in DSpace, locate and gain access to the files it indexes? If it doesn't go through the DSpace storage layer or something equivalent, then indexing is screwed. Ouch! I hadn't thought about these at all.
Re: [Dspace-tech] srb/s3/etc and lucene
Well, I'm just wondering, in specific terms, if we use an object-based storage system as an assetstore rather than a filesystem, where the files that Lucene indexes actually sit. It's my understanding that in a filesystem-based assetstore, for example, text is extracted from PDFs and stored in a separate file *within the assetstore directory* that Lucene crawls. I just don't know how that sort of thing is handled when using object-based storage. On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote: Hi Cory: Not sure about the limits of Lucene, but I think the larger point is that the back-ends are expected only to hold the real content or assets. Everything else (full-text indices and the like) consists of *artifacts* (things that can be recreated from the assets) that we don't need to manage in the same way. If for performance reasons we want to put them where the assets are, we can, but there is really no connection between the two that the system imposes. Does this get at your question, or did I miss the point? Thanks, Richard R On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote: (Apologies if this has been discussed to resolution; after a few attempts to search the archives, I concluded they are really broken. 500 errors, bad links, etc.) For those using, interested in, or knowledgeable about using API-based storage (SRB, S3) as a backend for DSpace: how does doing so affect full-text indexing? Can anyone describe how, in such a setup, full text is stored and indexed? My uneducated impression is that Lucene would want to work only against a filesystem. Thanks, Cory Snavely University of Michigan Library IT Core Services
Re: [Dspace-tech] DSpace a memory hog?
Generally what's going on is that Tomcat, the web application framework, has a large virtual machine running with a substantial amount of memory allocated to the caching of programs and data for performance. Depending on your database configuration, there can also be a substantial amount of allocation to cache in Postgres too. The indexer is a periodic process that does not run constantly. You still must account for the amount of memory it consumes while running. Memory requirements for recent versions of the indexing routine are of constant order, meaning they do not vary appreciably with repository size. On Wed, 2007-04-18 at 18:09 -0700, Pan Family wrote: Thank you all for giving your opinion! Technically, is it the web application or the indexer that requires most of the memory? What data is kept in memory all the time (even when nobody is searching)? Is the memory usage proportional to the number of concurrent sessions? Thanks again, Pan On 4/18/07, Cory Snavely [EMAIL PROTECTED] wrote: Well, as I said at first, it all depends on your definition of what a memory hog is. Today's hog fits in tomorrow's pocket. We better all already be used to that. Also, I don't think for a *minute* that the original developers of DSpace made a casual choice about their development environment--in fact, I think they made a responsible choice given the alternatives. Let's give our colleagues credit that's due. Their choice permits scaling and fits well for an open-source project. Putting the general problem of memory bloat in their laps seems pretty angsty to me. Lastly, dedicating a server to DSpace is a choice, not a necessity. We as implementors have complete freedom to separate out the database and storage tiers, and mechanisms exist for scaling Tomcat horizontally as well. In the other direction, I suspect people are running DSpace on VMware or xen virtual machines, too. 
Cory Snavely University of Michigan Library IT Core Services On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote: Pan, DSpace is a memory hog considering the functionality the application provides. This is mainly due to the technological choices made by the founders of the DSpace project, and not the functional requirements the DSpace project fulfills. Application and memory bloat are pervasive in the IT industry. Each individual organization should look at their requirements, whether they are hardware, software or both. Having to dedicate a machine to an application, especially a relatively simple application like DSpace, is wasteful of hardware resources and people resources. Web applications should _not_ need 2G of memory to run comfortably.
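As a point of reference for the memory discussion above: the heap ceiling for the Tomcat JVM is commonly set through CATALINA_OPTS in Tomcat's startup environment. The sizes below are illustrative placeholders, not recommendations -- tune them against your repository's actual load:

```shell
# In Tomcat's startup environment (e.g. bin/setenv.sh or the init script):
# start the heap at 128 MB and cap it at 512 MB.
export CATALINA_OPTS="-Xms128m -Xmx512m"
```

This is the knob that decides how much of a "hog" the Tomcat process is allowed to be; Postgres's shared-buffer settings are tuned separately in postgresql.conf.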
Re: [Dspace-tech] Cannot get a connection, pool exhausted
In our experience, this problem appears to be due to a bug somewhere in freeing connections back to the pool--we tend to see steady linear growth in the number of 'idle in transaction' connections until we get this error. These are visible with ps. Increasing the number of connections in the pool, for us, only delayed the occurrence of the problem. Ultimately the number of 'idle in transaction' connections would climb to the max. We put a workaround in place. This is a root crontab entry:

# kill old 'idle in transaction' postgres processes, leaving up to 10
* * * * * while /usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | /usr/bin/wc -l` -gt 10; do /usr/bin/pkill -o -f 'idle in transaction'; done

At one point I was entertaining a theory that the Apache connection pool manager delivered with DSpace was a stale version. To date, the workaround has worked so well that I'm not sure that theory has been fully explored. Also, FWIW, there have been lengthy discussions on this list about this topic already. You would probably find the previous thread useful, as I'm quite sure I'm not retelling everything here. Cory Snavely University of Michigan Library IT Core Services On Wed, 2007-04-18 at 12:13 +0530, Filbert Minj wrote: Hi Stuart, Thanks very much for the prompt reply. Recently we have upgraded to DSpace 1.4.1 on RHEL 4 using a Postgres database. I made the change in db.maxconnections and I think this should solve the problem. I had forgotten, earlier we had the same problem and did exactly what you suggested. Cheers, -- Filbert - Original Message - From: Stuart Lewis [sdl] [EMAIL PROTECTED] To: Filbert Minj [EMAIL PROTECTED]; dspace-tech@lists.sourceforge.net Sent: Wednesday, April 18, 2007 11:32 AM Subject: Re: [Dspace-tech] Cannot get a connection, pool exhausted Hi Filbert, Has anyone faced a similar problem?
WARN org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:no_context:database_error:org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool exhausted

What is the solution to this problem? DSpace holds a 'pool' of connections to the database which it reuses. This means it doesn't have the overhead of creating a connection each time it needs to talk to the database. The error message suggests that all of these connections are in use, and it has reached the number of connections that you have said it can have. The default set in [dspace]/config/dspace.cfg is:

db.maxconnections = 30

There are two reasons that you might be reaching this limit:

1) Your DSpace is very busy (lots of visitors) and there are not enough connections to cope. If your hardware is large enough to cope with the number of connections, you could think about increasing the number of connections in the pool (change the number, restart Tomcat).

2) For some reason, DSpace might not be letting go of some old connections, or they might be stuck in some way. If you are using UNIX and Postgres, you should be able to see the connections, and what they are doing, by running 'ps' on them (make sure your screen is wide enough to see what comes at the end of the line). This might show that the connections are stuck--a typical state might be 'idle in transaction'. This can also happen if connections to the database are not closed properly by DSpace.

Which version / operating system / database do you use? I hope this helps, Stuart _ Datblygydd Cymwysiadau'r We / Web Applications Developer Gwasanaethau Gwybodaeth / Information Services Prifysgol Cymru Aberystwyth / University of Wales Aberystwyth E-bost / E-mail: [EMAIL PROTECTED] Ffon / Tel: (01970) 622860 _ -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Re: [Dspace-tech] DSpace a memory hog?
This depends on your definition of a memory hog. We run a relatively large instance of DSpace and we allocate 512MB to Tomcat, about 100MB to Postgres, and 256MB for daily indexing runs (via the dsrun script). In earlier versions of DSpace the indexing routine needed to be patched to work around a poor implementation that caused memory allocation to grow linearly with repository size. Without that, we were running out of memory during indexing. I believe that patch is now part of the base. We run comfortably inside 2G of physical memory. I may have considered that a memory hog 5 years ago, but today I consider it light. Cory Snavely University of Michigan Library IT Core Services On Wed, 2007-04-18 at 01:01 -0700, Pan Family wrote: Hi, There is a rumor that says DSpace is a memory hog. I don't know where this is from, and it may not be that important. What is important is that it makes my management nervous. So I'd like to hear from those who know anything about this issue. Is it really a memory hog? Under what circumstances might it become a memory hog? Or should there be no worry about memory usage at all? Thanks a lot in advance! -Pan
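For readers wondering where allocations like "512MB to Tomcat" actually get set: they live outside DSpace, in the JVM options of whatever starts each process. A hedged sketch (flag syntax is the Sun JVM's; CATALINA_OPTS is read by Tomcat's catalina.sh, and the indexing class name follows the 1.4-era index-all script--treat both as illustrative):

```shell
# Illustrative only: cap Tomcat's heap at the 512MB mentioned above.
# Typically exported in the environment of the user that starts Tomcat.
export CATALINA_OPTS="-Xms128m -Xmx512m"

# A nightly index run via dsrun would get its own cap the same way
# (path and class name are assumptions based on DSpace 1.4):
#   JAVA_OPTS="-Xmx256m" /dspace/bin/dsrun org.dspace.search.DSIndexer
```

Keeping the three caps (Tomcat, Postgres, indexing) explicit like this is what makes the "comfortably inside 2G" claim auditable.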
Re: [Dspace-tech] DSpace a memory hog?
Well, as I said at first, it all depends on your definition of what a memory hog is. Today's hog fits in tomorrow's pocket; we'd better all be used to that by now. Also, I don't think for a *minute* that the original developers of DSpace made a casual choice about their development environment--in fact, I think they made a responsible choice given the alternatives. Let's give our colleagues the credit they're due. Their choice permits scaling and fits well for an open-source project. Putting the general problem of memory bloat in their laps seems pretty angsty to me. Lastly, dedicating a server to DSpace is a choice, not a necessity. We as implementors have complete freedom to separate out the database and storage tiers, and mechanisms exist for scaling Tomcat horizontally as well. In the other direction, I suspect people are running DSpace on VMware or Xen virtual machines, too. Cory Snavely University of Michigan Library IT Core Services On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote: Pan, DSpace is a memory hog considering the functionality the application provides. This is mainly due to the technological choices made by the founders of the DSpace project, and not the functional requirements the DSpace project fulfills. Application and memory bloat are pervasive in the IT industry. Each individual organization should look at their requirements, whether they are hardware, software or both. Having to dedicate a machine to an application, especially a relatively simple application like DSpace, is wasteful of hardware resources and people resources. Web applications should _not_ need 2G of memory to run comfortably.
Re: [Dspace-tech] Large files and DSpace
I'd be interested to know how using SRB addresses the problem, which I understand to be the logistics of handling such a large file in both the user interface and the back end. Does it? Cory Snavely University of Michigan Library IT Core Services - Original Message - From: Ekaterina Pechekhonova [EMAIL PROTECTED] To: Gary Browne [EMAIL PROTECTED] Cc: dspace-tech@lists.sourceforge.net Sent: Monday, April 16, 2007 8:12 PM Subject: Re: [Dspace-tech] Large files and DSpace Hi Gary, You can configure DSpace to use SRB instead of the regular assetstore. Some basic information can be found in the docs which come with DSpace. Also you can check this link: http://wiki.dspace.org/index.php//DspaceSrbIntegration Kate Ekaterina Pechekhonova Digital Library Programmer/Analyst New York University Libraries email: [EMAIL PROTECTED] phone: 212-992-9993 - Original Message - From: Gary Browne [EMAIL PROTECTED] Date: Monday, April 16, 2007 7:41 pm Subject: [Dspace-tech] Large files and DSpace To: dspace-tech@lists.sourceforge.net Hello All I think I posted a question like this last year, but I've just become a dad for the first time and have a bit of brain meltdown. I tried searching for answers on the annoying SourceForge list archive (should I start a separate thread about this...?) but didn't find much. My question is a general one, in that I'm wondering how people are handling large files in DSpace (getting them onto the server, submissions and publication/access)? Is the symlink stuff the only option at this point? For example, we have (and will be getting lots more of) a 12GB video file to be used in one of our collections. I'd like to nut out what the possible options are before I try anything. Thanks and kind regards Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
Re: [Dspace-tech] Assetstore physical storage
There's a whole discussion there about what's the right tool for the job, but integration with Lucene would be my guess as to the practical reason. I'd be interested to learn if that were, in fact, not a constraint. Cory Snavely University of Michigan Library IT Core Services On Wed, 2007-04-11 at 11:30 -0700, Ryan Ordway wrote: Is there a reason why only the metadata is stored in the database and not the actual assetstore bitstreams? Has anyone considered changing the physical storage from the filesystem to the database? I'm working on building some redundancy into my infrastructure, and it's looking like the most efficient way to store the assetstore data in clustered configurations would be in the database, especially when your database is already clustered across multiple systems. Your database gets much larger, but you don't have to worry about keeping your assetstores synchronized, etc. Any thoughts? Anyone to blame? ;-) Ryan
Re: [Dspace-tech] redirect port 8443 to 80?
Right, and that was my initial approach, but it seemed to have the effect of blocking traffic to port 80. As I've said, I'm not seeing it as a real problem, but rather just letting people know that it is an ugliness associated with this (NAT) approach. On Sat, 2007-04-07 at 12:26 -0400, Mark Diggory wrote: On Apr 7, 2007, at 12:08 PM, Mark H. Wood wrote: On Fri, Apr 06, 2007 at 12:07:44PM -0400, Cory Snavely wrote: For folks listening in with interest, we also use NAT port forwarding to get around the requirement for mod_jk, but FWIW I haven't determined a way to close the incoming *actual* Tomcat ports (8080/8443). Just don't open them. In [tomcat]/conf/server.xml comment out the Connector with 'port=8080' and leave commented the one with 'port=8443'. You should then only be running AJP 1.3 on 8009 and the shutdown port on localhost:8005. If you want to limit AJP to the local host, you can add 'address=127.0.0.1' to the AJP Connector. -- Mark H. Wood, Lead System Programmer [EMAIL PROTECTED] Typically when a software vendor says that a product is intuitive he means the exact opposite. Mark W., this would only be the case if they were using mod_jk/Apache, but they are trying to use NAT/port forwarding, and this means those Tomcat ports are what are getting forwarded to. I'd say the quickest solution is to just block those ports from external requests in the NAT/firewall configuration. -Mark Diggory ~ Mark R. Diggory - DSpace Systems Manager MIT Libraries, Systems and Technology Services Massachusetts Institute of Technology
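Mark Wood's suggestion amounts to a small edit in Tomcat's server.xml. A hedged sketch of what that edit might look like on a Tomcat 5.x-era install (attribute values are illustrative and vary by Tomcat version; treat this as a shape, not a drop-in config):

```xml
<!-- [tomcat]/conf/server.xml: HTTP connector commented out so nothing
     listens on 8080 (the 8443 connector ships commented out already). -->
<!--
<Connector port="8080" maxThreads="150"
           enableLookups="false" redirectPort="8443" />
-->

<!-- AJP connector kept, bound to loopback so only a local Apache/mod_jk
     can reach it, per the 'address' tip above. -->
<Connector port="8009" address="127.0.0.1"
           protocol="AJP/1.3" enableLookups="false" redirectPort="8443" />
```

With only AJP open, the NAT-forwarding ugliness Cory describes disappears because there is no HTTP port to expose in the first place.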
Re: [Dspace-tech] redirect port 8443 to 80?
For folks listening in with interest, we also use NAT port forwarding to get around the requirement for mod_jk, but FWIW I haven't determined a way to close the incoming *actual* Tomcat ports (8080/8443). So that's a potential downside with this approach, in addition to not having any real logic like mod_rewrite to apply at that intermediary level. Mind you, it's not really harmful or vulnerable; it's just a little ugly to have your actual nonstandard ports all hanging out like that. Cory Snavely University of Michigan Library IT Core Services On Fri, 2007-04-06 at 11:56 -0400, Mark Diggory wrote: We use Apache, mod_jk and mod_rewrite to deliver the web application on port 80 and port 443 as separate VirtualHost entries in Apache httpd. We do not allow direct access to the Tomcat server over port 8080 or port 8443. I can send some more detail of our configuration if you decide to go this route. -Mark On Apr 6, 2007, at 11:32 AM, James Rutherford wrote: On Thu, Apr 05, 2007 at 09:39:53AM -0600, Zhiwu Xie wrote: bar, but then when I click the DSpace logo from a secured page such as https://laii-dspace.unm.edu/password-login all the following pages are through https regardless of which the page is, which bothers me. The links used in DSpace are relative, so if you login via https, you will continue with https. But when I tried to click the DSpace logo from the MIT DSpace page https://dspace.mit.edu/password-login the request to https://dspace.mit.edu/ seems to be rerouted to http://dspace.mit.edu/. So what's the trick? The only reason the MIT site is different is because (I assume) they have some custom configuration elsewhere that redirects https requests to http for normal use. If you try accessing https://dspace.mit.edu you will be redirected to the unsecured version at http://dspace.mit.edu. cheers, Jim -- James Rutherford, Research Engineer, HP Labs, Bristol, UK | +44 117 312 7066 | [EMAIL PROTECTED] | Hewlett-Packard Limited, registered office: Cain Road, Bracknell, Berks RG12 1HN. Registered No: 690597 England ~ Mark R. Diggory - DSpace Systems Manager MIT Libraries, Systems and Technology Services Massachusetts Institute of Technology Office: E25-131 Phone: (617) 253-1096
Re: [Dspace-tech] Data integrity/preservation issues and mirroring development-production servers
This illustrates the importance of NOT confusing *replication* for redundancy (whether that be rsync, LOCKSS, something SAN-based, etc.) with *backups* for version retention (whether that be conventional weekly-full/daily-incremental, snapshots, CDP, etc.). (It also illustrates the importance of validating checksums regularly!) This is the kind of thing Mark was getting at. SDR guidelines and good preservation policies should require redundancy for availability and/or disaster recovery, checksums (and periodic validation!) for integrity purposes, and backups for protection against human error and/or for disaster recovery. HOWEVER, implementing those things in a way that serves their preservation goals requires a sysadmin who understands those preservation goals. For example, ideally, backup or snapshot retention would be at least twice as long as the interval at which checksums are validated, so that if a validation error is detected, you have at least two previous copies to go back to. Ultimately there is a level of detail below which local decisions on implementation are irrelevant--for example, the architecture of the backup system--but without some understanding of the preservation goals, a sysadmin is not guaranteed to make the right decision. Cory Snavely University of Michigan Library IT Core Services On Tue, 2007-02-20 at 09:30 +0000, Philip Adams wrote: Hi, Checksums may be reassuring for checking that a file still has integrity, but they leave open the question of what to do if the checksums do not match. There is a growing movement of people interested in trying to ensure that digital preservation techniques exist to overcome this problem. One of the most interesting applications to come out of this is LOCKSS (Lots Of Copies Keep Stuff Safe); see http://www.lockss.org/lockss/Home for details. Most of the material archived using LOCKSS so far is from electronic journals, with some government papers and the odd blog. LOCKSS acts as a store, a proxy and a repairer.
If applied to DSpace, it could enable a kind of co-operative backup network to develop, with copies of content from repositories mirrored on a number of LOCKSS boxes. If your DSpace was unable to deliver content, it could be served up from LOCKSS acting as a proxy instead. LOCKSS boxes spend much of their time contacting each other to take part in integrity-checking polls and repairing content where required. There is a recent survey of the digital preservation strategies available at the moment at http://www.clir.org/pubs/reports/pub138/pub138.pdf. De Montfort University is taking part in the UK LOCKSS Pilot programme: http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/programme_lockss.aspx. Perhaps repository owners could use LOCKSS in either public or private networks to look after the digital preservation aspects of managing their content. Regards, Philip Adams Senior Assistant Librarian (Electronic Services Development) De Montfort University Library 0116 250 6397
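Cory's point about validation frequency versus backup retention can be made concrete with a minimal fixity-check sketch. Everything here is illustrative (the helper names and manifest idea are not anything DSpace ships); it assumes a GNU userland with md5sum:

```shell
#!/bin/sh
# Print just the digest of one file.
checksum_of() {
    md5sum "$1" | awk '{ print $1 }'
}

# Compare a file against the digest recorded when it was ingested;
# exit status 0 means the bitstream is still intact. A mismatch is the
# signal to reach for a backup or snapshot old enough to predate the
# corruption -- hence the advice to keep retention at least twice the
# validation interval.
verify() {
    [ "$(checksum_of "$1")" = "$2" ]
}
```

A cron job walking an assetstore and calling verify against a stored manifest gives you the "periodic validation" leg of the policy, independent of whatever replication is in place.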
Re: [Dspace-tech] How to configure Postfix...??
sendmail is one of the most arcane Unix systems known to exist. It is also extremely popular and ubiquitous. Choose it if you want to impress your nerdy friends. postfix is much simpler to configure; nobody could possibly disagree with that. There are others--Debian systems install with exim, for example. As others have mentioned, the distro you choose should give you a working MTA configuration out of the box, and you probably don't even need to know what it is. Your first order of business should be finding that feature and employing it. Cory Snavely University of Michigan Library IT Core Services On Fri, 2007-02-16 at 00:20 +0530, Sahil Dave wrote: Well, I have never configured any MTA before, so I needed some good info. Which do you think is better supported, sendmail or postfix? On 2/15/07, James Rutherford [EMAIL PROTECTED] wrote: Apologies for sending this twice. In future, make sure you 'reply-all' on the mailing list emails so that your responses go back to the list. cheers, jim. On 15/02/07, James Rutherford [EMAIL PROTECTED] wrote: On 14/02/07, Sahil Dave [EMAIL PROTECTED] wrote: Yes, I am running Mandriva 2007, but I need to deploy DSpace on RHEL 4 ES in my library. What changes do I need to make to the postfix/DSpace config files? RHEL4 will probably have sendmail set up and configured already. You can check whether it is by running (as root):

lsof -i tcp:25

You should see something like the following if it is running:

[EMAIL PROTECTED] ~]# lsof -i tcp:25
COMMAND   PID USER  FD  TYPE DEVICE SIZE NODE NAME
sendmail 2995 root  3u  IPv4   6365      TCP  localhost.localdomain:smtp (LISTEN)

If this is the case, you just need to configure the mail server in your dspace.cfg to be localhost, and add the username and password as required for the sendmail configuration. Note that if you're running sendmail purely for your DSpace repository, you should configure your firewall to block external connections to port 25 to avoid being used as a relay.
There is nothing special about DSpace SMTP requirements, so for whichever software you use, you should be able to find ample documentation and sample configuration files. I'm afraid I don't really know much about postfix, but I do know that it is a well-documented project, so you should have no problems using it if you really want to. Jim. -- Sahil MCA(SE) USIT
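For reference, the dspace.cfg side of "configure the mail server to be localhost" is small. A hedged sketch (property names follow the DSpace 1.4-era configuration; the addresses are placeholders, not real):

```
# Illustrative dspace.cfg excerpt: hand outgoing mail to the local MTA.
mail.server = localhost

# Site-specific placeholder addresses -- substitute your own.
mail.from.address = dspace-noreply@myu.edu
mail.admin = dspace-help@myu.edu
```

With the MTA listening only on localhost (or port 25 firewalled externally, as Jim advises), this setup cannot be abused as an open relay.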
Re: [Dspace-tech] need some suggestions plzzzzz
We looked into using a single naming authority for items in DSpace and not in DSpace, and it's problematic because DSpace essentially has naming authority for submitted items. It would be difficult to predict its naming and work around it. So we have a main naming authority and a DSpace sub-naming authority off that. It's no big deal. If you were really, really tied to having one, you could in theory create handles that were pointers into DSpace, either using the DSpace handle resolution mechanism or not. Note that you would have to customize the link generation in DSpace where it provides a bookmarkable URL to the user. I'm not sure how you would tell DSpace what the externally-created identifier is, though. It sounds messy. In my estimation, it's much easier to accept the fact that DSpace is a relatively self-contained system that creates and resolves its own identifiers. Cory Snavely University of Michigan Library IT Core Services On Tue, 2007-02-13 at 10:05 -0600, Krishna wrote: Hello everyone, I need some suggestions. We are trying to integrate DSpace with a system which already uses the Handle system. If we want to use DSpace to store the data, which also uses an internal handle system, how do we do it? We would like to use only the handles which we already have, and not the handles that DSpace creates. Is there any place in DSpace (maybe metadata) to store the handle identifier generated by our system, and use these handles to retrieve the data from the DSpace repository? Thanking you all, Krishna
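The minting behavior Cory describes is driven by a single prefix in dspace.cfg. A hedged illustration (the prefix shown is the unregistered example value DSpace ships with, not a real naming authority):

```
# Illustrative dspace.cfg excerpt: DSpace itself assigns each new item
# an identifier of the form hdl:<prefix>/<sequence>, so an external
# system cannot dictate the suffix.
handle.prefix = 123456789
```

This is why the "sub-naming authority" approach works cleanly: you register a prefix delegated to DSpace and let it mint freely underneath, while your main authority stays untouched.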
Re: [Dspace-tech] connections to db seem to be getting stuck
Note that this error is not referring to the Postgres connections themselves, but to the connection pool within DSpace from which the database connections are allocated. Postgres is blissfully ignorant of the problem, and I believe we'd see this problem even if we tripled the number of connections. At one point we did see the number of Postgres connections being exhausted, because I hadn't done the math for how many DSpace instances we're running and configured Postgres accordingly, but as soon as I tweaked that up to account for that, that problem went away. What we are observing now is much more like a database connection pool leak of some kind. Little by little, apparently after aggressive hits, Postgres connections go into a permanent 'idle in transaction' state, and eventually all of the pool is used up. A restart of Tomcat or Postgres will free the connections. Apparently 'idle in transaction' is Postgres waiting on the client mid-transaction. We don't seem to see hangs on database activity manifested in the web interface, which makes me suspect there is not a problem with queries completing successfully, but rather something more insidious in how the pool is managed--maybe the 'idle in transaction' state is caused by some sort of race condition as an active connection in the pool is assigned to another running thread. For the moment, I have installed a dirty little crontab entry that runs this on the minute:

/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | /usr/bin/wc -l` -gt 20 && \
    /usr/bin/pkill -o -f 'idle in transaction'

In English: every minute, if there are more than 20 'idle in transaction' Postgres processes, it kills the oldest one. Cory Snavely University of Michigan Library IT Core Services On Fri, 2007-01-19 at 11:58 -0500, Mark Diggory wrote: What about Postgres? How many connections is it making available?
You'll want to roughly multiply it by the number of web applications you're running. So, for instance, with

db.maxconnections = 50
db.maxwait = 5000
db.maxidle = 5

and running dspace.war, dspace-oai.war and dspace-srw.war, Postgres needs about 150 connections in its postgresql.conf. I usually increment that by one for cron jobs as well. For instance, in my current config we run two virtual hosts with 3 webapps each and 1 set for crons: 2 vhosts * (3 webservices + 1 cron) * 50 in pool = 400

#---------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------
max_connections = 400
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction). You
# might also need to raise shared_buffers to support more connections.

It's not a hard-and-fast rule--we never really exhaust that many connections in one instance, but somewhere between that and the default 100 there is a sweet spot. -Mark On Jan 19, 2007, at 11:43 AM, Jose Blanco wrote: Actually I mean, more frequently today. Sorry about that. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jose Blanco Sent: Friday, January 19, 2007 11:42 AM To: 'Dorothea Salo' Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] connections to db seem to be getting stuck It was dying on us a couple of times a week, but for some reason it's dying more frequently this week. Could you share your db config parameters? Right now I have the default settings. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dorothea Salo Sent: Friday, January 19, 2007 11:28 AM Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] connections to db seem to be getting stuck Jose Blanco wrote: So what do you do? Restart Tomcat all day long? For some reason it is happening very frequently today. It's making the system kind of unusable when Tomcat has to be restarted every 30 minutes to an hour. That often? Wow.
It dies on us a couple of times a week, and not always for this reason, as best I can tell. It's a bit comforting to know it's not just my problem. Will you be at the Open Repositories conference in San Antonio next week? I'll be there, and hope we can get some help on this. Agreed! And yes, I will be there. Dorothea -- Dorothea Salo, Digital Repository Services Librarian (703)993-3742 [EMAIL PROTECTED] AIM: gmumars MSN 2FL, Fenwick Library George Mason University 4400 University Drive, Fairfax VA 22031
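Expanded into a standalone script, the 'idle in transaction' watchdog from Cory's crontab entries earlier in this thread might look like the following. This is a sketch under the same assumptions as the one-liner (GNU/Linux userland; Postgres backends advertising 'idle in transaction' in their ps titles); the threshold and helper names are illustrative:

```shell
#!/bin/sh
# Watchdog sketch: reap the oldest stuck Postgres backend when too
# many accumulate. Intended to be run from cron every minute.
THRESHOLD=20

# How many backends are currently stuck 'idle in transaction'?
count_stuck() {
    pgrep -f 'idle in transaction' | wc -l
}

# Pure decision logic, separated out so it can be tested: true (0)
# when the count exceeds the threshold.
should_reap() {
    [ "$1" -gt "$THRESHOLD" ]
}

if should_reap "$(count_stuck)"; then
    # -o targets the oldest matching process, mirroring pkill -o in
    # the crontab one-liners above.
    pkill -o -f 'idle in transaction'
fi
```

Killing only the oldest backend per run keeps the remedy gradual, which matters when you are not yet sure whether the stuck connections are a pool bug or a symptom of something else.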