[Dspace-tech] tomcat/jetty/resin

2008-03-14 Thread Cory Snavely
We're upgrading our DSpace server and taking another look at what
servlet engine we should use.

Has anyone done research/comparison and ended up particularly passionate
about their choice? I would be interested in objective benefits of one
over another, and I suspect others would too.

Cory Snavely
University of Michigan Library IT Core Services




Re: [Dspace-tech] Blocking a malicious user

2007-11-01 Thread Cory Snavely
It has an effect if your Postgres instance isn't blocked at the
firewall, and people are actually trying to access it. Which they will,
unless you block them. As I said, probably much safer to block at the
firewall level--better protection from DOS as well.
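
For reference, a minimal iptables rule for that kind of firewall-level
block might look like this (192.0.2.10 is only a placeholder for the
offending address):

    # drop all traffic from the offending address before it reaches userspace
    iptables -I INPUT -s 192.0.2.10 -j DROP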

On Thu, 2007-11-01 at 08:51 +, Stuart Lewis [sdl] wrote:
 Hi Sue,
 
 pg_hba.conf only controls who can communicate with Postgres, not who can
 communicate with DSpace.
 
 Normally it is only 'applications' (e.g. DSpace) that talk to Postgres,
 not users.
 
 A user talks to DSpace, which in turn talks to Postgres. Postgres has no
 idea of, or interest in, the IP address of the user who is using DSpace,
 only that of the DSpace application.
 
 Therefore adding a malicious IP address to that config file will sadly
 have no effect. You have to block users higher in the stack, either at
 the application level (Apache or Tomcat directives) or at the network
 level (firewall changes).
 
 Thanks,
 
 
 Stuart
 _
 
 Gwasanaethau Gwybodaeth  Information Services
 Prifysgol Aberystwyth  Aberystwyth University
 
 E-bost / E-mail: [EMAIL PROTECTED] 
  Ffon / Tel: (01970) 622860
 _
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]
 Sent: 31 October 2007 17:51
 To: Mika Stenberg; dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] Blocking a malicious user
 
 You can block ip addresses at the postgreSQL level in the pg_hba.conf
 file.  Here is a person I blocked by ip address who was sending all
 kinds of GET requests to our DSpace server:
 
 host    all    all    malicious.ip    255.255.255.255    reject
 
 Sue Walker-Thornton
 NASA Langley Research Center
 ConITS Contract
 757-224-4074
 [EMAIL PROTECTED]
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Mika
 Stenberg
 Sent: Wednesday, October 31, 2007 6:00 AM
 To: dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] Blocking a malicious user
 
 We've had problems like that as well. Blocking specific IPs works only
 for a while, since many bots and spammers seem to change their IP
 frequently. We didn't come up with a decent solution for this, but
 blocking an entire country of origin for a period of time has been on my
 mind. Managing the allowed requests / timeslot for a specific IP might
 also do the trick (see the iptables sketch after this message).
 
 -Mika
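
 One way to implement that per-IP request throttling is the iptables
 'recent' match; a rough sketch, with purely illustrative thresholds:
 
     # track new HTTP connections per source address
     iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
              -m recent --set --name HTTP
     # drop a source that opens more than 30 new connections in 60 seconds
     iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
              -m recent --update --seconds 60 --hitcount 30 --name HTTP -j DROP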
 
 
  If they're nasty enough, though, they'll drown your Apache or Tomcat
  server in replying with 403s. I've had times that I needed to be
  absolutely merciless and block at the firewall level, using iptables;
  then they don't even get as far as userspace.
  
  On Tue, 2007-10-30 at 14:01 -0500, Tim Donohue wrote:
   George,
   
   We had a similar problem to this one in the past (a year or so ago). I
   just flat out blocked the IP altogether (not even specific to
   /bitstream/) via this Apache configuration:

   <Location />
       Order Allow,Deny
       Deny from {malicious ip}
       Allow from all
   </Location>

   This looks similar to your config though (except it blocks all access
   from that IP).
   
   - Tim
   
   George Kozak wrote:
Hi...

I am having a problem with an IP that keeps sending thousands of GET
/bitstream/... requests for the same item.

I have placed the following in my Apache.conf file:

<Directory /bitstream/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
    deny from {malicious ip}
</Directory>

I also placed the following in my server.xml in Tomcat:

<Valve className="org.apache.catalina.valves.RemoteAddrValve"
       deny="xxx\.xxx\.xxx\.xx" />

However, this person still seems to be getting through. My java
process is running at 50%-80% CPU usage. Does anyone have a good
idea on how to shut out a malicious IP in DSpace?

***
George Kozak
Coordinator
Web Development and Management
Digital Media Group
501 Olin Library
Cornell University
607-255-8924
***
[EMAIL PROTECTED] 


   
 
 

   
  
 
 

Re: [Dspace-tech] Blocking a malicious user

2007-10-31 Thread Cory Snavely
It's probably worth saying that if you run postgres and dspace on the
same server, you can completely block postgres at the firewall
(iptables) level.
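
Concretely, that can be a pair of rules along these lines (a sketch;
5432 is Postgres' default port):

    # accept Postgres connections from the local machine only...
    iptables -A INPUT -p tcp --dport 5432 -s 127.0.0.1 -j ACCEPT
    # ...and reject everything else
    iptables -A INPUT -p tcp --dport 5432 -j REJECT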

On Wed, 2007-10-31 at 12:51 -0500, Thornton, Susan M. (LARC-B702)[NCI
INFORMATION SYSTEMS] wrote:
 You can block ip addresses at the postgreSQL level in the pg_hba.conf
 file.  Here is a person I blocked by ip address who was sending all
 kinds of GET requests to our DSpace server:
 
 host    all    all    malicious.ip    255.255.255.255    reject
 
 Sue Walker-Thornton
 NASA Langley Research Center
 ConITS Contract
 757-224-4074
 [EMAIL PROTECTED]
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Mika
 Stenberg
 Sent: Wednesday, October 31, 2007 6:00 AM
 To: dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] Blocking a malicious user
 
 We've had problems like that as well. Blocking specific IPs works only
 for a while, since many bots and spammers seem to change their IP
 frequently. We didn't come up with a decent solution for this, but
 blocking an entire country of origin for a period of time has been on my
 mind. Managing the allowed requests / timeslot for a specific IP might
 also do the trick.
 
 -Mika
 
 
  If they're nasty enough, though, they'll drown your Apache or Tomcat
  server in replying with 403s. I've had times that I needed to be
  absolutely merciless and block at the firewall level, using iptables;
  then they don't even get as far as userspace.
  
  On Tue, 2007-10-30 at 14:01 -0500, Tim Donohue wrote:
   George,
   
   We had a similar problem to this one in the past (a year or so ago). I
   just flat out blocked the IP altogether (not even specific to
   /bitstream/) via this Apache configuration:

   <Location />
       Order Allow,Deny
       Deny from {malicious ip}
       Allow from all
   </Location>

   This looks similar to your config though (except it blocks all access
   from that IP).
   
   - Tim
   
   George Kozak wrote:
Hi...

I am having a problem with an IP that keeps sending thousands of GET
/bitstream/... requests for the same item.

I have placed the following in my Apache.conf file:

<Directory /bitstream/>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    allow from all
    deny from {malicious ip}
</Directory>

I also placed the following in my server.xml in Tomcat:

<Valve className="org.apache.catalina.valves.RemoteAddrValve"
       deny="xxx\.xxx\.xxx\.xx" />

However, this person still seems to be getting through. My java
process is running at 50%-80% CPU usage. Does anyone have a good
idea on how to shut out a malicious IP in DSpace?

***
George Kozak
Coordinator
Web Development and Management
Digital Media Group
501 Olin Library
Cornell University
607-255-8924
***
[EMAIL PROTECTED] 


   
 
 

Re: [Dspace-tech] Academic SRB support

2007-10-24 Thread Cory Snavely
...and if it seems odd to anyone following this thread that the
developers of Nirvana SRB would suggest we achieve this integration by
using the filesystem emulation provided by Nirvana SRB, which in turn
uses the Honeycomb API, know that I definitely did point out that irony
to them.

However, according to these developers, the Nirvana and SDSC SRB APIs
differ enough that that is the only way to do this without recoding the
DSpace bitstream storage manager.

Disappointing? Yeah.

So am I understanding correctly that in future versions of DSpace,
support for CAS systems and the like would be done in DSpace? I.e. we
might expect there to be direct Honeycomb, EMC Centera, iRODS, etc.
support right within DSpace? We're trying to see the roadmap here.

c

On Wed, 2007-10-24 at 10:46 -0400, Blanco, Jose wrote:
 We just had a phone conference with Sun and the developer for the
 commercial version of SRB at Nirvana ( Tino ) and were told that the
 commercial version of SRB they have developed is not the same as the
 academic SRB.  One thing they have developed is file system based SRB
 which *should* work, and we are going to try it out.
 
 Thanks for this information!
 
 Jose
 
 -Original Message-
 From: MacKenzie Smith [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 24, 2007 10:37 AM
 To: Blanco, Jose
 Cc: dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] Academic SRB support
 
 Hi Jose,
 
 I haven't gotten the official story from SDSC, but I do know that their
 attention has shifted to iRODS as the next generation storage
 architecture for long-term data management. iRODS will be 100% open
 source software (no more dual license) which will be easier for the
 community to deal with.
 
 My understanding is that the commercial (Nirvana) and non-commercial
 (plain SRB) versions are actually the same thing... they just have a
 dual-license arrangement for the codebase. So the API that Sun develops
 *should* also work for your plain vanilla SRB instance. You can verify that with
 the SDSC folks (or I can ask them).
 
 The DSpace work that we've done at MIT was for the old non-commercial
 SRB, and we recently got the jargon client for iRODS, so those should be
 tested with the 1.4.x and 1.5 releases.
 
 MacKenzie
  I wonder if any one has heard if the academic SRB ( non-commercial ) 
  is going to be discontinued?  We have been discussing using a 
  Honeycomb server for bit storage, and they have informed us that the 
  academic SRB is going to be discontinued, so they are not interested 
  in developing an API for it.  They are working on developing a 
  commercial Nirvana SRB API.  I'm assuming that the configurable SRB 
  coming out in a future release of Dspace is the academic?
 
  http://wiki.dspace.org/index.php/PluggableStorage ?
 
  Thank you!
  Jose
 
 
 --
 MacKenzie Smith
 Associate Director for Technology
 MIT Libraries
 
 


Re: [Dspace-tech] Storing bitstreams using SRB

2007-10-17 Thread Cory Snavely
Presumably you would need an SRB server:
http://www.sdsc.edu/srb/index.php/Main_Page .
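
Beyond that, the SRB assetstore is selected in dspace.cfg; in the 1.4.x
configuration the entries look roughly like the following (all values are
placeholders, and the exact key names should be checked against the
documentation shipped with your release):

    srb.host.1 = srb-mcat.example.edu
    srb.port.1 = 5544
    srb.mcatzone.1 = examplezone
    srb.mdasdomainname.1 = exampledomain
    srb.defaultstorageresource.1 = examplestorageresource
    srb.username.1 = exampleuser
    srb.password.1 = examplepassword
    srb.homedirectory.1 = /examplezone/home/exampleuser.exampledomain
    srb.parentdir.1 = dspaceassetstore
    # send newly submitted bitstreams to the SRB store rather than store 0
    assetstore.incoming = 1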

On Wed, 2007-10-17 at 06:44 -0700, Shwe Yee Than wrote:
 Hi,
  
 What else should I need to do other than the normal installation and
 configuration of DSpace if I want to store bitstreams using SRB?
 Anyone can help me?
  
 regards,
 Shwe
  


Re: [Dspace-tech] Questions about DSpace Features

2007-10-04 Thread Cory Snavely
FYI we are having discussions with Sun about integrating DSpace with
their Honeycomb CAS system. However, the approach I am advocating is to
build an SRB compatibility layer/driver/translator for the product, and
so insulate DSpace from the specifics of the Honeycomb API. Contact me
if interested.

On Wed, 2007-10-03 at 17:26 -0400, MacKenzie Smith wrote:
 Hi Robert,
  * Does DSpace have service devices (like SOA or SOAP)?

 Yes, for submission (see 
 http://wiki.dspace.org/index.php/LightweightNetworkInterface).
  * Is it correct that DSpace does not have an internal storage
  management, which would mean (e.g.) to compress documents which are not
  accessed for a given period, or to move them to another storage 
  location (e.g. a tape server) if the last access is much older?

 You can implement any storage layer underneath DSpace using the storage 
 API. There are implementations now for the local filesystem (the 
 default), SRB and S3 (in prototype, I believe). I think HP has also 
 implemented it with their HSM, but I don't know if there are other HSM 
 systems implemented now.
  * And is it possible to bundle / relate different versions of the same 
  document, e.g. preprint and postprint?

 This is handled now via metadata. For MIT's method of doing this see 
 http://wiki.dspace.org/static_files/f/fa/DSpace_Versioning_Feature_Summary_(July_2004).pdf
 
 There are plans to change the DSpace data model in a future version so 
 that it can handle versions directly within an item. This is described 
 on the wiki (http://wiki.dspace.org/index.php/ArchReviewSynthesis). A 
 lot of this work has already started, and the plan is to complete these 
 changes in 2008.
  * Does DSpace keep track of different versions of the same document to 
  have a history of minor changes (compared to pre- and postprint)?

 It is a digital archive rather than an authoring system, so no, minor 
 changes to documents are not normally kept. The idea is to store final 
 versions of documents and keep them forever, and to link different 
 *editions* of documents via metadata (see the last answer) so that users 
 can safely cite a particular version and not worry about it disappearing 
 later.
 
 MacKenzie
 



Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?

2007-07-04 Thread Cory Snavely
Another way to get experience with the quality of Acrobat OCR is to use Acrobat 
Pro, which can do functionally the same thing, with a less batch-oriented 
interface. We ended up using this at a fairly large scale to meet a similar 
need.

We have documentation on preparing PDFs that we supply for submitters, and that 
you may find useful, at

http://deepblue.lib.umich.edu/html/2027.42/40244/PDF-Best_Practice.html

The section toward the bottom provides instructions on making image PDF files 
searchable.

Cory Snavely
University of Michigan Library IT Core Services
  - Original Message - 
  From: Jennifer Ash 
  To: dspace-tech@lists.sourceforge.net 
  Sent: Wednesday, July 04, 2007 6:55 AM
  Subject: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture 
asolution?


  Dear Community Members



  The Water Research Commission (WRC, South Africa) is currently assessing a 
pilot installation of DSpace.

  We want to use DSpace to store, search and retrieve all our WRC research 
reports and Water SA (a scientific publication, 4 issues pa) issues (this is 
the primary goal; other collections will most likely be added over time).

  We are faced with a problem in that most of our older publications are not in 
electronic format and will have to be scanned.

  Scanning and saving as PDF does not provide a full text searchable document 
in DSpace; I've tried it.



  A product, Adobe Capture, is advertised as a 'tool that teams with your 
scanner to convert volumes of paper documents into searchable Adobe Portable 
Document Format (PDF) files'.

  We are keen to investigate this product but there are no trial downloads 
offered by Adobe.

  Do you have any knowledge of this product? Can you advise on a suitable 
technology solution for our problem? Our backlog is vast and spans many years, 
so there are loads of documents that need to be scanned.



  I do hope someone can give me advice.



  Kind regards





  Jennifer Ash 
  ..
  Business Systems Manager
  Water Research Commission 
  Private Bag X03 
  GEZINA (Pretoria) 
  0031 
  Tel: (012) 330-9036 / 330-0340 
  Fax: (012) 330-9010 / 331-2565 
  E-mail: [EMAIL PROTECTED] 












Re: [Dspace-tech] srb/s3/etc and lucene

2007-05-04 Thread Cory Snavely
Thanks, but when you say assetstore, I'm not sure if you are referring
to the object-based storage in all cases. I will assume that you are
because of the parenthetical (s3).

So, this is what I believe you are saying: when filter-media runs, it
extracts text for formats such as PDF that Lucene can't directly parse
and places those text bitstreams, via the object-based storage API,
alongside the originals; indexing then uses the same API to fetch the
text back out and feed it to Lucene.

Consequently, nothing is stored in the filesystem except for the
resulting index?

Thanks,
Cory

On Fri, 2007-05-04 at 00:10 -0400, Mark Diggory wrote:
  
  On 5/4/07, Cory Snavely [EMAIL PROTECTED] wrote:
  Well, I'm just wondering, in specific terms, if we use an
  object-based 
  storage system as an assetstore rather than a filesystem,
  where the
  files that Lucene indexes actually sit.
 
 
 It's tricky; this is what FilterMedia is for: it actually extracts the
 text and places it as a bitstream in the assetstore. Lucene full-text
 indexing is done against the assetstore bitstreams in all cases (well,
 except for the metadata table in the database). So ultimately you're
 pushing the text bitstreams into the assetstore (s3) in FilterMedia
 and pulling them back out on Lucene indexing, a double whammy.
 
 
 Cheers,
 Mark
 
  
  It's my understanding that in a filesystem-based assetstore,
  for
  example, text is extracted from PDFs and stored in a
  separate file 
  *within the assetstore directory* that Lucene crawls. I just
  don't know
  how that sort of thing is handled when using object-based
  storage.
  
  On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote:
   Hi Cory: 
  
   Not sure about the limits of Lucene, but I think the
  larger point is
   that the back-ends are expected only to hold the real
  content or assets.
   Everything else (full-text indices and the like) are
  *artifacts* (can be 
   recreated from the assets) that we don't need to manage in
  the same way.
   If for performance reasons we want to put them where the
  assets are we
   can, but there is really no connection between the two
  that the system 
   imposes.
  
   Does this get at your question, or did I miss the point?
  
   Thanks,
  
   Richard R
  
   On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:
(Apologies if this has been discussed to resolution;
  after a few 
attempts to search the archives, I concluded they are
  really broken. 500
errors, bad links, etc.)
   
For those using, interested in, or knowledgeable about
  using API-based 
storage (SRB, S3) as a backend for DSpace: how does
  doing so affect
full-text indexing? Can anyone describe how, in such a
  setup, full text
is stored and indexed?
   
My uneducated impression is that Lucene would want to
  work only against 
a filesystem.
   
Thanks,
Cory Snavely
University of Michigan Library IT Core Services
   
   
   
   
  

Re: [Dspace-tech] srb/s3/etc and lucene

2007-05-04 Thread Cory Snavely
Right--I am trying to get an understanding of all this in very specific
terms.

On Fri, 2007-05-04 at 09:23 -0400, Mark H. Wood wrote:
 There are two questions here:
 
 1)  Does the use of a non-filesystem asset store backend affect Lucene's
 output?  One would guess, no, since it doesn't do output to the
 asset store.
 
 2)  Does the use of a non-filesystem asset store backend affect
 Lucene's input?  IOW how does Lucene, as used in DSpace, locate
 and gain access to the files it indexes?  If it doesn't go through
 the DSpace storage layer or something equivalent then indexing is
 screwed.
 
 Ouch!  I hadn't thought about these at all.
 


Re: [Dspace-tech] srb/s3/etc and lucene

2007-05-03 Thread Cory Snavely
Well, I'm just wondering, in specific terms, if we use an object-based
storage system as an assetstore rather than a filesystem, where the
files that Lucene indexes actually sit.

It's my understanding that in a filesystem-based assetstore, for
example, text is extracted from PDFs and stored in a separate file
*within the assetstore directory* that Lucene crawls. I just don't know
how that sort of thing is handled when using object-based storage.
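
(For reference, the extraction and indexing steps are the filter-media
and index-all scripts under the DSpace bin directory in 1.4.x; the
install path below is a placeholder.)

    # extract text (and thumbnails) from bitstreams into the assetstore
    /dspace/bin/filter-media
    # rebuild the Lucene search index from the metadata and extracted text
    /dspace/bin/index-all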

On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote:
 Hi Cory:
 
 Not sure about the limits of Lucene, but I think the larger point is
 that the back-ends are expected only to hold the real content or assets.
 Everything else (full-text indices and the like) are *artifacts* (can be
 recreated from the assets) that we don't need to manage in the same way.
 If for performance reasons we want to put them where the assets are we
 can, but there is really no connection between the two that the system
 imposes. 
 
 Does this get at your question, or did I miss the point?
 
 Thanks,
 
 Richard R
 
 On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:
  (Apologies if this has been discussed to resolution; after a few
  attempts to search the archives, I concluded they are really broken. 500
  errors, bad links, etc.)
  
  For those using, interested in, or knowledgeable about using API-based
  storage (SRB, S3) as a backend for DSpace: how does doing so affect
  full-text indexing? Can anyone describe how, in such a setup, full text
  is stored and indexed?
  
  My uneducated impression is that Lucene would want to work only against
  a filesystem.
  
  Thanks,
  Cory Snavely
  University of Michigan Library IT Core Services
  
  
  


Re: [Dspace-tech] DSpace a memory hog?

2007-04-19 Thread Cory Snavely
Generally what's going on is that Tomcat, the servlet container, runs a
large Java virtual machine with a substantial amount of memory allocated
to caching programs and data for performance.

Depending on your database configuration, there can also be a
substantial amount of allocation to cache in Postgres too.

The indexer is a periodic process that does not run constantly. You
still must account for the amount of memory it consumes while running.
Memory requirements for recent versions of the indexing routine are of
constant order, meaning they do not vary appreciably with repository
size.

On Wed, 2007-04-18 at 18:09 -0700, Pan Family wrote:
 Thank you all for giving your opinion!
 
 Technically, is it the web application or the indexer that requires 
 most of the memory?  What data is kept in memory all the time
 (even when nobody is searching)?  Is the memory usage proportional
 to the number of concurrent sessions?
 
 Thanks again,
 
 Pan
 
 
 
 
 On 4/18/07, Cory Snavely [EMAIL PROTECTED] wrote:
 Well, as I said at first, it all depends on your definition of
 what a
 memory hog is. Today's hog fits in tomorrow's pocket. We
 better all
 already be used to that.
 
 Also, I don't think for a *minute* that the original
 developers of 
 DSpace made a casual choice about their development
 environment--in
 fact, I think they made a responsible choice given the
 alternatives.
 Let's give our colleagues credit that's due. Their choice
 permits
 scaling and fits well for an open-source project. Putting the
 general
 problem of memory bloat in their laps seems pretty angsty to
 me.
 
 Lastly, dedicating a server to DSpace is a choice, not a
 necessity. We
 as implementors have complete freedom to separate out the
 database and 
 storage tiers, and mechanisms exist for scaling Tomcat
 horizontally as
 well. In the other direction, I suspect people are running
 DSpace on
 VMware or xen virtual machines, too.
 
 Cory Snavely
 University of Michigan Library IT Core Services 
 
 On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
  Pan,
 
  Dspace is a memory hog considering the functionality the
 application
  provides.  This is mainly due to the technological choices
 made by the 
  founders of the Dspace project, and not the functional
 requirements the
  Dspace project fulfills.
 
  Application and memory bloat are pervasive in the IT
 industry.  Each
  individual organization should look at their requirements
 whether they 
  are hardware, software or both.  Having to dedicate a
 machine to an
  application, especially a relatively simple application like
 Dspace, is
  wasteful for hardware resources and people resources.
 
  Web applications should _not_ need 2G of memory to run
 comfortably.
 
 
 


Re: [Dspace-tech] Cannot get a connection, pool exhausted

2007-04-18 Thread Cory Snavely
In our experience, this problem appears to be due to a bug somewhere in
freeing connections back to the pool--we tend to see steady linear
growth in the number of 'idle in transaction' connections until we get
this error. These are visible with ps.
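
For example, something like this shows the count at any given moment:

    # count Postgres backends currently sitting 'idle in transaction'
    ps -ef | grep -c '[i]dle in transaction'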

Increasing the number of connections in the pool, for us, only delayed
the occurrence of the problem. Ultimately the number of 'idle in
transaction' connections would climb to the max.

We put a workaround in place. This is a root crontab entry:

# kill old 'idle in transaction' postgres processes, leaving up to 10
* * * * * while /usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | /usr/bin/wc -l` -gt 10; do /usr/bin/pkill -o -f 'idle in transaction'; done

At one point I was entertaining a theory that the Apache connection pool
manager delivered with DSpace was a stale version. To date, the
workaround has worked so well that I'm not sure that theory has been
fully explored.

Also, FWIW, there have been lengthy discussions on this list about this
topic already. You would probably find the previous thread useful as I'm
quite sure I'm not retelling everything here.

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-18 at 12:13 +0530, Filbert Minj wrote:
 Hi Stuart,
 
 Thanks very much for the prompt reply.
 Recently we have upgraded it to Dspace 1.4.1 on RHEL 4 using postgres 
 database.
 I made the change in db.maxconnections and I think this should solve the 
 problem.
 
 I had forgotten, earlier we had the same problem and did exactly what you 
 suggested.
 
 Cheers,
 
 --
 Filbert
 
 - Original Message - 
 From: Stuart Lewis [sdl] [EMAIL PROTECTED]
 To: Filbert Minj [EMAIL PROTECTED]; 
 dspace-tech@lists.sourceforge.net
 Sent: Wednesday, April 18, 2007 11:32 AM
 Subject: Re: [Dspace-tech] Cannot get a connection, pool exhausted
 
 
  Hi Filbert,
 
  Has any one faced similar problem.
 
   WARN  org.dspace.app.webui.servlet.DSpaceServlet @
  anonymous:no_context:database_error:org.apache.commons.dbcp.SQLNestedException
  :
  Cannot get a connection, pool exhausted
 
  What is solution of this problem.
 
  DSpace holds a 'pool' of connections to the database which it reuses. This
  means it doesn't have the overhead of creating a connection to the 
  database
  each time it needs to talk to the database.
 
  The error message suggests that all of these connections are in use, and 
  it
  has reached the number of connections that you have said it can have. The
   default set in [dspace]/config/dspace.cfg is:
 
  db.maxconnections = 30
 
  There are two reasons that you might be reaching this limit -
 
  1) Your DSpace is very busy (lots of visitors) and there are not enough
  connections to cope. If your hardware is large enough to cope with number 
  of
  connections, you could think about increasing the number of connections in
  the pool. (change the number, restart Tomcat).
 
  2) For some reason, DSpace might not be letting go of some old 
  connections,
  or they might be stuck in some way. If you are using UNIX and postgres, 
  you
  should be able to see the connections, and what they are doing, by running 
  a
   'ps' on them (make sure your screen is wide enough to see what comes at the
   end
  of the line). This might show that the connections are stuck - typical 
  state
  might be 'idle in transaction'. This can also happen if connections to the
  database are not closed properly by DSpace.
 
  Which version / operating system / database do you use?
 
  I hope this helps,
 
 
  Stuart
  _
 
  Datblygydd Cymwysiadau'r WeWeb Applications Developer
  Gwasanaethau Gwybodaeth  Information Services
  Prifysgol Cymru Aberystwyth   University of Wales Aberystwyth
 
 E-bost / E-mail: [EMAIL PROTECTED]
  Ffon / Tel: (01970) 622860
  _
 
 
 
 
 

Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Cory Snavely
This depends on your definition of a memory hog.

We run a relatively large instance of DSpace and we allocate 512MB to
Tomcat, about 100MB to Postgres, and 256MB for daily indexing runs (via
the dsrun script).

In earlier versions of DSpace the indexing routine needed to be patched
to work around a poor implementation that caused memory allocation to be
linear with repository size. Without that, we were running out of memory
during indexing. I believe that patch is now part of the base.

We run comfortably inside 2G of physical memory. I may have considered
that a memory hog 5 years ago, but today I consider it light.
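
(Those allocations are ordinary JVM heap settings; for example, the
Tomcat heap above corresponds to something like the following, exported
before starting catalina.sh. The indexer heap is set the same way when
invoking the command-line tools.)

    # Tomcat heap: 512MB, fixed
    export JAVA_OPTS="-Xms512m -Xmx512m"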

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-18 at 01:01 -0700, Pan Family wrote:
 Hi,
 
 There is a rumor that says DSpace is a memory hog.
 I don't know where this is from, but it may not be that
 important.  What is important is that it makes my
 management nervous.  So I'd like to hear from those
 who know anything about this issue.  Is it really
 a memory hog?  Under what circumstances might it
 become a memory hog?  Or should there be no worry
 about memory usage at all?
 
 Thanks a lot in advance!
 
 -Pan 


Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Cory Snavely
Well, as I said at first, it all depends on your definition of what a
memory hog is. Today's hog fits in tomorrow's pocket. We better all
already be used to that.

Also, I don't think for a *minute* that the original developers of
DSpace made a casual choice about their development environment--in
fact, I think they made a responsible choice given the alternatives.
Let's give our colleagues credit that's due. Their choice permits
scaling and fits well for an open-source project. Putting the general
problem of memory bloat in their laps seems pretty angsty to me.

Lastly, dedicating a server to DSpace is a choice, not a necessity. We
as implementors have complete freedom to separate out the database and
storage tiers, and mechanisms exist for scaling Tomcat horizontally as
well. In the other direction, I suspect people are running DSpace on
VMware or xen virtual machines, too.

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
 Pan,
 
 Dspace is a memory hog considering the functionality the application
 provides.  This is mainly due to the technological choices made by the
 founders of the Dspace project, and not the functional requirements the
 Dspace project fulfills.
 
 Application and memory bloat are pervasive in the IT industry.  Each
 individual organization should look at their requirements whether they
 are hardware, software or both.  Having to dedicate a machine to an
 application, especially a relatively simple application like Dspace, is
 wasteful for hardware resources and people resources.
 
 Web applications should _not_ need 2G of memory to run comfortably.
 




Re: [Dspace-tech] Large files and DSpace

2007-04-16 Thread Cory Snavely
I'd be interested to know how using SRB addresses the problem, which I 
understand to be the logistics of handling such a large file in both the 
user interface and the back end. Does it?

Cory Snavely
University of Michigan Library IT Core Services

- Original Message - 
From: Ekaterina Pechekhonova [EMAIL PROTECTED]
To: Gary Browne [EMAIL PROTECTED]
Cc: dspace-tech@lists.sourceforge.net
Sent: Monday, April 16, 2007 8:12 PM
Subject: Re: [Dspace-tech] Large files and DSpace


 Hi Gary,
 you can configure Dspace to use SRB instead of regular assetstore. Some 
 basic information can be found in the docs which come
 with Dspace.Also you can check this link:
 http://wiki.dspace.org/index.php//DspaceSrbIntegration

 Kate

 Ekaterina Pechekhonova
 Digital Library Programmer/Analyst
 New York University
 Libraries
 email: [EMAIL PROTECTED]
 phone: 212-992-9993

 - Original Message -
 From: Gary Browne [EMAIL PROTECTED]
 Date: Monday, April 16, 2007 7:41 pm
 Subject: [Dspace-tech] Large files and DSpace
 To: dspace-tech@lists.sourceforge.net

 Hello All



 I think I posted a question like this last year but I've just become a
 dad for the first time and have a bit of brain meltdown. I tried
 searching for answers on the annoying sourceforge list archive (should
 I
 start a separate thread about this...?) but didn't find much.



 My question is a general one in that I'm wondering how people are
 handling large files in DSpace (getting them onto the server,
 submissions and publication/access)? Is the SymLink stuff the only
 option at this point? For example, we have (and will be getting lots
 more of) a 12GB video file to be used in one of our collections. I'd
 like to nut out what the possible options are before I try anything.



 Thanks and kind regards

 Gary





 Gary Browne
 Development Programmer
 Library IT Services
 University of Sydney
 Australia
 ph: 61-2-9351 5946





Re: [Dspace-tech] Assetstore physical storage

2007-04-11 Thread Cory Snavely
There's a whole discussion there about what's the right tool for the
job, but integration with Lucene would be my guess as to the practical
reason. I'd be interested to learn if that, in fact, were not a
constraint.

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-11 at 11:30 -0700, Ryan Ordway wrote:
 Is there a reason why only the metadata is stored in the database and not
 the actual assetstore bitstreams? Has anyone considered changing the
 physical storage from the filesystem to the database? I'm working on
 building some redundancy into my infrastructure and it's looking like the
 most efficient way to store the assetstore data in clustered configurations
 would be in the database, especially when your database is already clustered
 across multiple systems. Your database gets much larger, but you don't have
 to worry about keeping your assetstores synchronized, etc.
 
 Any thoughts? Anyone to blame? ;-)
 
 Ryan
 




Re: [Dspace-tech] redirect port 8443 to 80?

2007-04-09 Thread Cory Snavely
Right, and that was my initial approach, but it seemed to have the
effect of blocking traffic to port 80.

As I've said, I'm not seeing it as a real problem, but rather just
letting people know that it is an ugliness associated with this (NAT)
approach.

On Sat, 2007-04-07 at 12:26 -0400, Mark Diggory wrote:
 On Apr 7, 2007, at 12:08 PM, Mark H. Wood wrote:
 
  On Fri, Apr 06, 2007 at 12:07:44PM -0400, Cory Snavely wrote:
  For folks listening in with interest, we also use NAT port  
  forwarding to
  get around the requirement for mod_jk, but FWIW I haven't  
  determined a
  way to close the incoming *actual* Tomcat ports (8080/8443).
 
  Just don't open them.  In [tomcat]conf/server.xml comment out the
  Connector with 'port=8080' and leave commented the one with
  'port=8443'.  You should then only be running AJP 1.3 on 8009 and
  the shutdown port on localhost:8005.  If you want to limit AJP to the
  local host, you can add 'address=127.0.0.1' to the AJP Connector.
 
  -- 
  Mark H. Wood, Lead System Programmer   [EMAIL PROTECTED]
  Typically when a software vendor says that a product is intuitive he
  means the exact opposite.
 
 MarkW,
 
 This would only be the case if they were using mod_jk/Apache. but,  
 they are trying to use NAT/port forwarding and this means those  
 Tomcat ports are what are getting forwarded to. I'd say the quickest  
 solution is to just block those ports from external requests in the  
 NAT/firewall configuration.
 
 -Mark Diggory
 
 ~
 Mark R. Diggory - DSpace Systems Manager
 MIT Libraries, Systems and Technology Services
 Massachusetts Institute of Technology
 
 
 


Re: [Dspace-tech] redirect port 8443 to 80?

2007-04-06 Thread Cory Snavely
For folks listening in with interest, we also use NAT port forwarding to
get around the requirement for mod_jk, but FWIW I haven't determined a
way to close the incoming *actual* Tomcat ports (8080/8443). So, a
potential downside with this approach, in addition to not having any
real logic like mod_rewrite to apply at that intermediary level.

Mind you, it's not really harmful or vulnerable, it's just a little ugly
to have your actual nonstandard ports all hanging out like that.

Cory Snavely
University of Michigan Library IT Core Services

On Fri, 2007-04-06 at 11:56 -0400, Mark Diggory wrote:
 We use Apache, mod_jk and mod_rewrite to deliver the webapplication  
 on port 80 and port 443 as separate VirtualHost entries in Apache  
 httpd. We do not allow direct access to the tomcat server over port  
 8080 or port 8443.  I can send some more detail of our configuration  
 if you decide to go this route.
 
 -Mark
 
 On Apr 6, 2007, at 11:32 AM, James Rutherford wrote:
 
  On Thu, Apr 05, 2007 at 09:39:53AM -0600, Zhiwu Xie wrote:
  bar, but then when I click the DSpace logo from a secured page  
  such as
 
  https://laii-dspace.unm.edu/password-login
 
  all the following pages are through https regardless of which page it
  is, which bothers me.
 
  The links used in DSpace are relative, so if you login via https, you
  will continue with https.
 
  But when I tried to click the dspace logo from the mit dspace page
 
  https://dspace.mit.edu/password-login
 
  the request to the https://dspace.mit.edu/ seems to be rerouted to
  http://dspace.mit.edu/. So what's the trick?
 
  The only reason the MIT site is different is because (I assume) they
  have some custom configuration elsewhere that redirects https requests
  to http for normal use. If you try accessing https://dspace.mit.edu  
  you
  will be redirected to the unsecured version at http://dspace.mit.edu.
 
  cheers,
 
  Jim
 
  -- 
  James Rutherford  |  Hewlett-Packard Limited registered  
  Office:
  Research Engineer |  Cain Road,
  HP Labs   |  Bracknell,
  Bristol, UK   |  Berks
  +44 117 312 7066  |  RG12 1HN.
  [EMAIL PROTECTED]   |  Registered No: 690597 England
 
 
  -- 
 
 ~
 Mark R. Diggory - DSpace Systems Manager
 MIT Libraries, Systems and Technology Services
 Massachusetts Institute of Technology
 Office: E25-131
 Phone: (617) 253-1096
 
 
 


Re: [Dspace-tech] Data integrity/preservation issues and mirroring development-production servers

2007-02-20 Thread Cory Snavely
This illustrates the importance of NOT confusing *replication* for
redundancy, whether that be rsync, LOCKSS, something SAN-based, etc,
with *backups* for version retention, whether that be conventional
weekly-full/daily-incr, snapshots, CDP, etc.

(It also illustrates the importance of validating checksums regularly!)

This is the kind of thing Mark was getting at. SDR guidelines and good
preservation policies should require redundancy for availability and/or
disaster recovery, checksums (and periodic validation!) for integrity
purposes, and backups for protection against human error and/or for
disaster recovery. HOWEVER, implementing those things in a way that
serves their preservation goals requires a sysadmin who understands
those preservation goals. For example, ideally, backup or snapshot
retention would be at least twice as long as the interval at which
checksums are validated, so that if a validation error is detected, you
have at least two previous copies to go back to.

Ultimately there is a level of detail below which local decisions on
implementation are irrelevant--for example, the architecture of the
backup system--but without some understanding of the preservation goals,
a sysadmin is not guaranteed to make the right decision.

Cory Snavely
University of Michigan Library IT Core Services

On Tue, 2007-02-20 at 09:30 +, Philip Adams wrote:
 Hi,
 
  
 
 Checksums may be reassuring for checking that a file still has
 integrity, but they leave open the question of what to do if the
 checksums do not match. 
 
  
 
 There is a growing movement of people interested in trying to ensure
 that digital preservation techniques exist to overcome this problem.
 One of the most interesting applications to come out of this is LOCKSS
 (Lots of Copies Keeps Stuff Safe) see
 http://www.lockss.org/lockss/Home for details.
 
  
 
 Most of the material archived using LOCKSS so far is from electronic
 journals, with some government papers and the odd blog. LOCKSS acts as
 a store, a proxy and a repairer. If applied to DSpace, it could enable
 a kind of co-operative backup network to develop with copies of
 content from repositories mirrored on a number of LOCKSS boxes. If
 your DSpace was unable to deliver content it could be served up from
 LOCKSS acting as a proxy instead. LOCKSS boxes spend much of their
 time contacting each other to take part in integrity checking polls
 and repairing content where required.
 
  
 
 There is a recent survey of the digital preservation strategies
 available at the moment at
 http://www.clir.org/pubs/reports/pub138/pub138.pdf. De Montfort
 University is taking part in the UK LOCKSS Pilot programme:
 http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/programme_lockss.aspx.
 
  
 
 Perhaps repository owners could use LOCKSS in either public or private
 networks to look after the digital preservation aspects of managing
 their content.
 
  
 
 Regards,
 
 Philip Adams
 
 Senior Assistant Librarian (Electronic Services Development)
 
 De Montfort University Library
 
 0116 250 6397
 
  
 
 


Re: [Dspace-tech] How to configure Postfix...??

2007-02-15 Thread Cory Snavely
sendmail is one of the most arcane Unix systems known to exist. It is
also extremely popular and ubiquitous. Choose it if you want to impress
your nerdy friends.

postfix is much simpler to configure. Nobody could possibly disagree
with that.

There are others. Debian systems install with exim, for example.

As others have mentioned, the distro you choose should give you a working
MTA configuration out of the box, and you probably don't even need to
know what it is. Your first order of business should be finding that
feature and employing it.
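
Once something is listening on localhost:25, pointing DSpace at it is
just a matter of the mail settings in [dspace]/config/dspace.cfg, along
these lines (addresses are placeholders; check the key names against
your release's dspace.cfg):

    mail.server = localhost
    mail.from.address = dspace-noreply@example.edu
    mail.feedback.recipient = dspace-help@example.edu
    mail.admin = dspace-admin@example.edu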

Cory Snavely
University of Michigan Library IT Core Services

On Fri, 2007-02-16 at 00:20 +0530, Sahil Dave wrote:
 Well, I have never configured any MTA before, so I need some good
 info. Which do you think is better supported: sendmail or postfix?
 
 
 On 2/15/07, James Rutherford [EMAIL PROTECTED] wrote:
 apologies for sending this twice. in future, make sure you
 'reply-all'
 on the mailing list emails so that your responses go back to
 the list.
 
 cheers,
 
 jim.
 
 On 15/02/07, James Rutherford  [EMAIL PROTECTED] wrote:
  On 14/02/07, Sahil Dave [EMAIL PROTECTED] wrote:
   yes i am running Mandriva 2007.. but i need to deploy
 DSpace on RHEL 4 ES
   in my Library...
   what all changes do i need to make to the postfix & DSpace
 config
   files??
 
  RHEL4 will probably have sendmail set up and configured
 already. You 
  can check to see if it is by running (as root) lsof -i
 tcp:25
 
  you should see something like the following if it is
 running:
 
  [EMAIL PROTECTED] ~]# lsof -i tcp:25
  COMMAND   PID USER   FD   TYPE DEVICE SIZE NODE NAME 
  sendmail 2995 root3u  IPv4   6365   TCP
  localhost.localdomain:smtp (LISTEN)
 
  If this is the case, you just need to configure the mail
 server in
  your dspace.cfg to be localhost, and add the username and
 password as 
  required for the sendmail configuration. Note that if you're
 running
  sendmail purely for your DSpace repository, you should
 configure your
  firewall to block external connections to port 25 to avoid
 being used 
  as a relay.
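 
  (As a quick sanity check before pointing DSpace at the local MTA, a
  rough Python sketch like the following can confirm that something on
  localhost:25 will accept mail. The addresses are placeholders; adjust
  to taste.)
 
      import smtplib
      from email.mime.text import MIMEText
 
      # Build a trivial test message (placeholder addresses).
      msg = MIMEText("test message from the DSpace host")
      msg["Subject"] = "MTA smoke test"
      msg["From"] = "dspace@localhost"
      msg["To"] = "postmaster@localhost"
 
      # Hand the message to whatever MTA is listening on localhost:25.
      s = smtplib.SMTP("localhost", 25)
      s.sendmail(msg["From"], [msg["To"]], msg.as_string())
      s.quit()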
 
  There is nothing special about DSpace SMTP requirements, so
 for
  whichever software you use, you should be able to find ample
  documentation and sample configuration files. I'm afraid I
 don't 
  really know much about postfix, but I do know that it is a
  well-documented project, so you should have no problems
 using it if
  you really want to.
 
  Jim.
 
 
 
 -- 
 Sahil
 MCA(SE)
 USIT 


Re: [Dspace-tech] need some suggestions plzzzzz

2007-02-13 Thread Cory Snavely
We looked into using a single naming authority for items both in DSpace
and outside it, and it's problematic because DSpace essentially acts as
the naming authority for submitted items. It would be difficult to
predict its naming and work around it.

So we have a main naming authority and a DSpace sub-naming authority off
that. It's no big deal.

If you were really tied to having one, you could in theory create
handles that act as pointers into DSpace, either through the DSpace
handle resolution mechanism or outside it. Note that you would have to
customize the
link generation in DSpace where it provides a bookmarkable URL to the
user. I'm not sure how you would tell DSpace what the externally-created
identifier is, though. It sounds messy.

In my estimation, it's much easier to accept the fact that DSpace is a
relatively self-contained system that creates and resolves its own
identifiers.

Cory Snavely
University of Michigan Library IT Core Services

On Tue, 2007-02-13 at 10:05 -0600, Krishna wrote:
 Hello everyone,
 
 I need some suggestions. We are trying to integrate DSpace with a
 system which already uses the handle system. If we want to use DSpace
 to store data from a system that has its own internal handles, how do
 we do it? We would like to use only the handles which we already have
 and not the handles that DSpace creates. Is there any place in DSpace
 (maybe metadata) to store the handle identifier generated by our
 system, and can we use these handles to retrieve the data from the
 DSpace repository?
 Thanking you all.
 
 Krishna
 


Re: [Dspace-tech] connections to db seem to be getting stuck

2007-01-19 Thread Cory Snavely
Note that this error is not referring to the Postgres connections
themselves, but to the connection pool within DSpace from which the
database connections are allocated. Postgres is blissfully ignorant of
the problem, and I believe we'd see this problem even if we tripled the
number of connections.

At one point we did see the number of Postgres connections being
exhausted, because I hadn't done the math for how many DSpace instances
we're running and configured Postgres accordingly; as soon as I
increased the Postgres limit to account for that, the problem went away.

What we are observing now is much more like a database connection pool
leak of some kind. Little by little, apparently after aggressive hits,
Postgres connections go into a permanent "idle in transaction" state,
and eventually all of the pool is used up. A restart of Tomcat or
Postgres will free the connections.

Apparently "idle in transaction" means Postgres is waiting on the client
mid-transaction. We don't seem to see hangs on database activity
manifested in the web interface, which makes me suspect the problem is
not queries failing to complete but rather something more insidious in
how the pool is managed--maybe the "idle in transaction" state is caused
by some sort of race condition as an active connection in the pool is
assigned to another running thread.

For the moment, I have installed a dirty little crontab entry that runs
this on the minute:

/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
  /usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'

In English: every minute, if there are more than 20 "idle in
transaction" Postgres processes, it kills the oldest one.
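
(If you would rather count the offenders inside Postgres than with
pgrep, a rough sketch along these lines could work instead. It assumes
the psycopg2 driver and placeholder connection details; the exact
pg_stat_activity column names vary between Postgres releases, so check
yours--on ours the stuck backends show up with
current_query = '<IDLE> in transaction'.)

    import psycopg2

    # Connection details are placeholders; point this at the DSpace database.
    conn = psycopg2.connect("dbname=dspace user=dspace host=localhost")
    cur = conn.cursor()

    # Count backends that are sitting idle in the middle of a transaction.
    cur.execute("SELECT count(*) FROM pg_stat_activity "
                "WHERE current_query = '<IDLE> in transaction'")
    (stuck,) = cur.fetchone()
    print("idle-in-transaction backends: %d" % stuck)

    cur.close()
    conn.close()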

Cory Snavely
University of Michigan Library IT Core Services

On Fri, 2007-01-19 at 11:58 -0500, Mark Diggory wrote:
 What about postgres? How many connections is it making available?
 You'll want to roughly multiply the pool size by the number of web
 applications you're running, so for instance with
 
 db.maxconnections = 50
 db.maxwait = 5000
 db.maxidle = 5
 
 running dspace.war, dspace-oai.war and dspace-srw.war, postgres needs
 about 150 connections in its postgres.conf.  I usually increment that
 multiplier by one for cron jobs as well:
 
 
 for instance, in my current config we run two virtual hosts with 3
 webapps each, plus 1 pool for crons:
 
 2 vhosts * (3 webapps + 1 cron) * 50 in pool = 400
 
  #---------------------------------------------------------------------
  # CONNECTIONS AND AUTHENTICATION
  #---------------------------------------------------------------------
 
  max_connections = 400
  # note: increasing max_connections costs ~400 bytes of shared memory per
  # connection slot, plus lock space (see max_locks_per_transaction).  You
  # might also need to raise shared_buffers to support more connections.
 
 It's not a hard-and-fast rule; we never really exhaust that many
 connections in one instance, but somewhere between that and the
 default of 100 there is a sweet spot.
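 
 (Just to make that arithmetic explicit, a throwaway Python sketch; the
 numbers are the ones from the setup above.)
 
     def pg_max_connections(vhosts, webapps_per_vhost, pool_size, cron_pools=1):
         # One DSpace connection pool per webapp, plus one per vhost for crons.
         return vhosts * (webapps_per_vhost + cron_pools) * pool_size
 
     print(pg_max_connections(2, 3, 50))   # 400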
 
 -Mark
 
 On Jan 19, 2007, at 11:43 AM, Jose Blanco wrote:
 
  Actually I mean, more frequently today.  Sorry about that.
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of  
  Jose Blanco
  Sent: Friday, January 19, 2007 11:42 AM
  To: 'Dorothea Salo'
  Cc: dspace-tech@lists.sourceforge.net
  Subject: Re: [Dspace-tech] connections to db seem to be getting stuck
 
  It was dying on us a couple of times a week, but for some reason,  
  it's dying
  more frequently this week.  Could you share your db config parameters?
  Right now I have the default settings.
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of  
  Dorothea
  Salo
  Sent: Friday, January 19, 2007 11:28 AM
  Cc: dspace-tech@lists.sourceforge.net
  Subject: Re: [Dspace-tech] connections to db seem to be getting stuck
 
  Jose Blanco wrote:
  So what do you do?  Restart tomcat all day long?  For some reason,  
  it is
  happening very frequently today.  It's making the system kind of  
  unusable
  when every 30 minutes to an hour tomcat has to be restarted.
 
  That often? Wow. It dies on us a couple of times a week, and not
  always for
  this reason as best I can tell.
 
  It's a bit comforting to know it's not just my problem.  Will you  
  be at
  the
  Open Repositories conference in San Antonio next week?  I'll be
  there, and
  hope we can get some help on this.
 
  Agreed! And yes, I will be there.
 
  Dorothea
 
  -- 
  Dorothea Salo, Digital Repository Services Librarian
  (703)993-3742 [EMAIL PROTECTED] AIM: gmumars
  MSN 2FL, Fenwick Library
  George Mason University
  4400 University Drive, Fairfax VA 22031
 
  -- 
  ---
  Take Surveys. Earn Cash. Influence the Future of IT
  Join SourceForge.net's Techsay panel and you'll get the chance to  
  share your
  opinions on IT  business topics through brief surveys