Re: [Dspace-tech] Publish Faculty Articles in DSPACE instance
Hi Monika:

A few remarks inline.

Thanks,
Richard

On May 14, 2015, at 5:30 PM, Monika C. Mevenkamp moni...@princeton.edu wrote:

> Princeton University has had an Open Access mandate in place since fall of 2011. Slowly but surely we are getting to the point of following through by collecting articles in the form of citation metadata and article documents. At the moment we hope to work with Symplectic for citation discovery. Staff will upload documents on behalf of authors. Citation metadata and documents will be pushed through to a DSPACE repository, where the articles will become accessible to everybody. We have to plan our process such that we ask as little as possible from faculty authors.
>
> So first off - is there anybody out there using Symplectic? If so, please tell me about your experience.
>
> Second: Do you know of a DSPACE instance dedicated to scholarly publications?

Harvard’s DASH (http://dash.harvard.edu/) is a DSpace instance dedicated to scholarly publications, and I’m sure there are others. At MIT (http://dspace.mit.edu/), our OA articles are a collection in a larger repository.

> In addition, there are several questions we need to answer, and I would like to learn from other people / institutions who went through this already.
>
> Content organization
>
> At one extreme we could simply have a giant community with one giant collection, where everything goes. Personally, I lean towards organizing articles by department, which we expect to have available as one of the metadata fields. Articles with authors from different departments would be listed in two collections.

What are your goals/needs for discovery, branding, etc? That would help settle how content is modeled in the hierarchy.

> Workflow
>
> Once article metadata and a document are ingested, the item enters a workflow.
>
> Where necessary, a staff member adds wording requested by publishers to accompany the publication on the web.
>
> If the document is deemed unacceptable (see Document formats), a new, better document needs to be found and uploaded.
>
> Once all is ready, we plan to send an email to inform author/s of the pending publication in our repository. Preferably that email should contain a link to the item, as it will look once published. At a minimum it should include a link to the article bitstream, so authors can properly review.

This will pose an interesting challenge, since items in workflow are not generally visible to those without specific permissions to edit, etc. Since yours is a ‘proxied’/mediated model (“Staff will upload documents...”), the authors would not be so empowered by default. If the data is coming from Symplectic, why not have that be the review site? (Sending the email is fairly straightforward.)

> The item becomes public, unless the author replies within a specified time indicating that publication should be ‘aborted’.
>
> I have not really worked with the DSPACE workflow system; my guess is that I’ll have to do some custom coding. Right?

Not necessarily - it really depends on the nature of the workflow customizations. For certain versions/configurations of workflow, one can assign curation tasks to operate at any step. For instance, we have OA collection tasks that call CrossRef web services to do enhanced cataloging (authors in order, as preferred by Google Scholar, e.g.), check for duplicates by comparing the DOI with all those already in the repository, copy MIT authors (cataloged separately) into the ‘regular’ author fields, do virus checking, etc. If these tasks are already written, you need only ‘wire them in’.

> Document formats
>
> Which document formats should we allow? PDF, PDF/A, others?

This might depend on the specifics of the OA policy - ours allows the ‘final published version’ in some cases (which is generally PDF or PDF/A), but often the author’s final manuscript, which in some disciplines might be something other than PDF.

> How can we validate formats?
>
> How to virus-scan documents? Is this done in a cron job, or integrated in the workflow?
>
> The word on the street seems to be: ‘do not do automatic format migration’. Is that the consensus?
>
> Once we have content in the repository, it will make sense to offer a couple of XML and JSON access points, so it is easy to list articles by department, by author, ... or access individual articles. I expect that referrers, for example department websites, will often want to take the query result and format it as HTML themselves, as opposed to linking to DSPACE item pages. Will the REST interface do the trick?
>
> Many questions - so thank you in advance for any and all answers.
>
> Monika
>
> Monika Mevenkamp
> phone: 609-258-4161
> Princeton University, Princeton, NJ 08544
Re: [Dspace-tech] Memory Usage Traversing A Very Large Collection via Curation Task
Hi Terry:

I’m away from the office with limited network, but based on a quick look at the code, I have a few suggestions.

You don’t indicate how you run this task, but I’m assuming that for the large collections at least, you are not doing it in the admin UI but with the command-line tool. It has a switch ‘-l’ for limiting the number of objects in a context. I’d recommend using a value somewhere between -l 100 and -l 1000, which should ensure that the context is cleared before getting too large. (Check the doc for command-line usage.)

I would also remove all the Context management in the task code itself (READ_ONLY or otherwise) - I’m not sure it’s needed, and indeed it may be a source of memory problems. Wherever you need a context reference (e.g. authorizeActionBoolean(context, item, Constants.READ, false)), use the API call Curator.curationContext() instead. This will ensure that the same context object is reused each time. You should not have to allocate any contexts yourself.

Please report if you still encounter problems, and I’ll look more thoroughly.

Thanks,
Richard

On Dec 16, 2014, at 2:49 PM, Terry Brady terry.br...@georgetown.edu wrote:

> I am experimenting with the Curation System. I have written a task to crawl a collection/community and identify specific items that are exception cases (restricted items, multiple bitstreams, non-standard bitstream type).
>
> https://gist.github.com/terrywbrady/24f6ddf24d9026149aff
>
> The process is working well for me, but I encounter memory/heap/garbage collection exceptions when I attempt to process my largest collection. That collection contains 150,000 items.
>
> I have discovered that I need to crank up the memory and turn on incremental garbage collection in order to get the process to complete.
>
>     export JAVA_OPTS="-Xmx3000m -Xincgc"
>
> Since I am simply processing items in a read-only fashion, I am surprised that I have needed these settings in order to process my collection. Can you recommend a more efficient way to traverse the collection and the items?
>
> CONTEXT INITIALIZATION
>
> I attempted to set the READ_ONLY option to prevent result caching.
>
>     Context context = new Context(Context.READ_ONLY);
>
> ITEM TRAVERSAL
>
>     ItemIterator iter = ((Collection)dso).getAllItems();
>     while (iter.hasNext()) {
>         performObject(iter.next());
>     }
>     iter.close();
>
> ITEM ACCESS CHECK
>
>     if (!AuthorizeManager.authorizeActionBoolean(context, item, Constants.READ, false)) {
>
> BUNDLE/BITSTREAM TRAVERSAL
>
>     boolean hasAnon = true;
>     for (Bundle bundle : item.getBundles(ORIGINAL)) {
>         for (Bitstream bs : bundle.getBitstreams()) {
>             count++;
>             String type = bs.getFormat().getMIMEType();
>             if (isStandardMimeType(type)) {
>             } else {
>                 errtype = type;
>                 unsuppType++;
>             }
>             hasAnon = hasAnon && AuthorizeManager.authorizeActionBoolean(context, bs, Constants.READ, false);
>         }
>     }
>
> Does this sound like an issue that should be submitted as a bug report? I am running DSpace 4.2.
>
> Thanks, Terry
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://www.library.georgetown.edu/lit/code
> 425-298-5498

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
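The effect of the ‘-l’ switch described in this thread can be illustrated with a small model. The following is only a sketch in Python, not DSpace code: `FakeContext`, `traverse`, and the `limit` parameter are hypothetical names invented here, and the real curation runner may clear its context differently. It shows why emptying an object cache every N objects keeps peak memory bounded no matter how many items a collection holds.

```python
class FakeContext:
    """Hypothetical stand-in for a context that caches every object it loads."""

    def __init__(self):
        self.cache = {}
        self.peak = 0  # largest cache size observed during the run

    def load(self, item_id):
        item = f"item-{item_id}"
        self.cache[item_id] = item
        self.peak = max(self.peak, len(self.cache))
        return item

    def clear_cache(self):
        self.cache.clear()


def traverse(context, item_ids, visit, limit):
    """Visit every item, emptying the cache whenever it holds `limit` objects."""
    visited = 0
    for item_id in item_ids:
        visit(context.load(item_id))
        visited += 1
        if len(context.cache) >= limit:
            context.clear_cache()
    return visited


if __name__ == "__main__":
    ctx = FakeContext()
    total = traverse(ctx, range(150000), lambda item: None, limit=100)
    print(total, ctx.peak)
```

In this model the peak cache size equals the limit rather than the collection size, which is the behaviour the suggested range of -l 100 to -l 1000 is meant to achieve.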
Re: [Dspace-tech] File Dates and Replication Task Suite
Hi Nathan:

I’m not sure exactly what your process is, but the suppression of certain time-stamps (in the .zip archive itself, e.g.) is deliberate: if the archive time were embedded, then 2 AIPs of the same content would always differ (just by that time-stamp), so their checksums would be different. That would mean that one could not easily decide that 2 AIPs represent unchanged content. The task suite does a lot of checksum comparisons for this purpose.

Why do you need the file dates?

Thanks,
Richard

On Aug 27, 2014, at 12:58 PM, Nathan Tallman wrote:

> DSpace users,
>
> My institution is on DSpace 1.8.2 and has installed the replication task suite to create bags. However, an odd thing occurs when creating bags: all files get a date of January 1, 1980. When exporting AIPs using packager, original file dates are retained.
>
> Has anyone run into this before? Any suggestions?
>
> Thanks,
> Nathan
>
> Nathan Tallman
> University of Cincinnati Libraries
> Digital Collections and Repositories
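The checksum argument above can be demonstrated outside DSpace. This is a minimal sketch using Python's standard `zipfile` module, not the replication task suite's own code; `build_archive`, `checksum`, and `FIXED_TIME` are names made up for the illustration. It pins every entry to the earliest time-stamp the zip format can store, which may be why the bag files appear dated January 1, 1980.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest date the zip format can store


def build_archive(files, date_time=FIXED_TIME):
    """Return zip bytes for {name: bytes}, stamping every entry with `date_time`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name in sorted(files):  # fixed order, so entry order cannot vary
            info = zipfile.ZipInfo(name, date_time=date_time)
            archive.writestr(info, files[name])
    return buffer.getvalue()


def checksum(data):
    """Hex SHA-256 digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    content = {"metadata.xml": b"<metadata/>", "1": b"bitstream bytes"}
    same = checksum(build_archive(content)) == checksum(build_archive(content))
    stamped = checksum(build_archive(content, (2014, 8, 27, 12, 58, 0)))
    print(same, stamped == checksum(build_archive(content)))
```

Two archives of identical content built with the fixed time-stamp hash identically; embedding a real time-stamp changes the digest even though the content is unchanged.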
Re: [Dspace-tech] OpenSearch strange results...
Hi Charlene:

It's not a bug, just a confusing name coincidence: you might assume that the query variable 'scope' expects the same values in discovery and opensearch, but it doesn't. In the opensearch case, use:

    scope=1721.1/123

i.e. the handle of the community or collection you wish to restrict the search results to.

Hope this helps,
Richard R

On Apr 15, 2014, at 1:06 PM, Charlene Chinda Barina wrote:

> Hi all,
>
> Wanted to see if anyone else had this issue, where if you have a query that has a scope variable, like so:
>
> http://impactsurvey.org/ccn/discover?scope=%2F&query=author%3A%22Casa+Latina%22+AND+language%3A%22Spanish%22&submit=Go
>
> and add the opensearch option:
>
> http://impactsurvey.org/ccn/opensearch/discover?scope=%2F&query=author%3A%22Casa+Latina%22+AND+language%3A%22Spanish%22&submit=Go
>
> You get:
>
>     java.lang.IllegalArgumentException: Scope handle / should point to a valid Community or Collection
>
> Without the scope=%2F argument, it works as expected. Is this a bug or some such I should report? I'm trying to give people an option to grab a saved query as a feed, and would want to minimize steps for them to do so.
>
> --
> Charlene Barina, MPH
> Research Analyst 2, U.S. IMPACT Study
> The Information School
> 303-359-6347 | Skype: cbarina
> facebook.com/ImpactSurvey | twitter.com/impactsurvey
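To make the scope advice in this thread concrete, here is a small Python sketch that builds an opensearch URL with a handle as the scope. The path `/opensearch/discover` and the `scope` and `query` parameters come from the messages above; the helper name `opensearch_url` is invented for the example, and omitting `scope` entirely for a site-wide search follows Charlene's observation that the query works without it.

```python
from urllib.parse import urlencode


def opensearch_url(base_url, query, scope_handle=None):
    """Build an opensearch URL; pass a community/collection handle to restrict it."""
    params = {}
    if scope_handle:
        params["scope"] = scope_handle  # a handle such as 1721.1/123, never "/"
    params["query"] = query
    return f"{base_url.rstrip('/')}/opensearch/discover?{urlencode(params)}"


if __name__ == "__main__":
    print(opensearch_url(
        "http://impactsurvey.org/ccn",
        'author:"Casa Latina" AND language:"Spanish"',
        scope_handle="1721.1/123",
    ))
```

The handle 1721.1/123 is only the placeholder from the reply; a real link would use the handle of an existing community or collection in the target repository.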
Re: [Dspace-tech] Exporting Bags
Hi Nathan:

Not sure I have one lying around, but the bag contents are pretty simple:

All bags have a payload file 'object.properties' that contains the object type (item, collection, community), the handle, and the parent handle. They also have a payload file called 'metadata.xml' that has whatever metadata the object has (for communities, that would be the name, short description, copyright text, etc.; for items, it would be the DC (or other) metadata).

For items, under the 'data' directory in the bag are subdirectories for each DSpace bundle (name = bundle name). In these, there is a pair of files for each bitstream: the bitstream itself (named by sequence number) and the bitstream metadata file. Thus, for 2 bitstreams, the directory contents would be:

    1
    1-metadata.xml
    2
    2-metadata.xml

I think the big question is what the preservation repository requires (beyond BagIt-formatted packages). It could expect specifics, or not.

Hope this helps,
Richard R

On Mar 22, 2014, at 3:19 PM, Nathan Tallman wrote:

> We're running DSpace 1.8.x and need to export collections as bags (per the BagIt spec) to ingest into a preservation repository. From reading the Replication Task Suite documentation, this should be possible. However, before we install and configure RTS, I'd really like to see an example output bag. Would anyone be willing to share?
>
> Another option to get content into a bag is to export as SAF or AIP and use scripts to convert into bags. If anyone has done this before, I would very much appreciate learning more about your workflow.
>
> Many thanks,
> Nathan
>
> --
> Nathan Tallman
> Digital Content Strategist
> University of Cincinnati Libraries
> PO Box 210033
> Cincinnati, Ohio 45221-0033
> (513) 556-5740
> nathan.tall...@uc.edu
> http://digitalprojects.libraries.uc.edu/
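For anyone wanting to check an exported item bag against the description in this thread, the layout can be written down as a short Python sketch. It assumes that 'object.properties' and 'metadata.xml' sit directly under 'data' alongside the bundle directories, which the message implies but does not state outright; `item_bag_payload` is a hypothetical helper, not part of the Replication Task Suite.

```python
def item_bag_payload(bundles):
    """List expected payload paths for an item bag.

    `bundles` maps a bundle name to the sequence numbers of its bitstreams,
    for example {"ORIGINAL": [1, 2]}.
    """
    paths = ["data/object.properties", "data/metadata.xml"]
    for bundle_name in sorted(bundles):
        for seq in sorted(bundles[bundle_name]):
            paths.append(f"data/{bundle_name}/{seq}")               # the bitstream
            paths.append(f"data/{bundle_name}/{seq}-metadata.xml")  # its metadata
    return paths


if __name__ == "__main__":
    for path in item_bag_payload({"ORIGINAL": [1, 2]}):
        print(path)
```

Comparing such a list with the actual bag contents would be one quick way to see whether a preservation repository's expectations go beyond this structure.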
Re: [Dspace-tech] DSpace and ePub support
Hi Rodrigo:

Question 1: Yes, you can upload EPUB files. Whether DSpace will automatically recognize the format depends on:

* whether you have created a new EPUB format in your bitstream format registry (it is not present by default). This is a standard administrative operation (see docs). I believe the accepted MIME type associated with EPUB container files is 'application/epub+zip'.

* whether the EPUB files you want to upload have extensions (like '.ePub') that match those you defined in the format. DSpace does simple file extension mapping to determine format type. For example, if you had defined the extensions 'EPUB' and 'ePub', and no other formats had those extensions, then all content bearing those file extensions would be assigned to the EPUB format.

Question 2: Yes, users should be able to download and view these files - *provided* they have client-side rendering tools for EPUBs. DSpace is no different in this regard from Barnes & Noble or any other EPUB provider - it merely supplies the file. Support varies in tablets for commercial reasons, but generally is pretty good.

Hope this helps,
Richard R

On Dec 20, 2013, at 11:31 AM, Calloni, Rodrigo wrote:

> Hello
>
> We are using DSpace 1.8 XMLUI. I want to know about the DSpace support for ePub files. Is there any news about this?
>
> While searching this topic in Google I found the following thread posted last year: http://dspace.2283337.n4.nabble.com/epub-td4655710.html
>
> So I have 2 basic questions:
>
> 1) Can we upload ePubs in DSpace? Will it recognize the file format?
>
> 2) Are users capable of downloading and opening an ePub that was added to DSpace (for example, adding it to their iBooks shelf)?
>
> Thanks in advance
> Rodrigo
>
> Rodrigo Calloni
> System Librarian
> Felipe Herrera Library
> Knowledge and Learning Sector
> Tel: 202-623-2952
> Fax: 202-623-3183
> 1300 New York Avenue, N.W.
> Washington, D.C. 20577 USA
> www.iadb.org
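The extension-mapping behaviour described in the answer to Question 1 can be pictured with a toy registry. This Python sketch is not DSpace's implementation: `FORMAT_REGISTRY` and `guess_format` are hypothetical, and treating the match as exact (case-sensitive) is an assumption drawn from the advice to list variants such as 'EPUB' and 'ePub' separately.

```python
FORMAT_REGISTRY = {
    # MIME type -> file extensions registered for that format (illustrative values)
    "application/epub+zip": {"epub", "EPUB", "ePub"},
    "application/pdf": {"pdf"},
}


def guess_format(filename, registry=FORMAT_REGISTRY):
    """Return the MIME type whose extension list contains the file's extension.

    Returns None when the file has no extension, when no format claims the
    extension, or when more than one format claims it.
    """
    if "." not in filename:
        return None
    extension = filename.rsplit(".", 1)[1]
    matches = [mime for mime, exts in registry.items() if extension in exts]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(guess_format("report.ePub"))
    print(guess_format("report.Epub"))
```

Under the exact-match assumption, an upload named 'report.Epub' would go unrecognized until that spelling is added to the format's extension list.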
Re: [Dspace-tech] Using Curation Tools
Hi Matt:

A very cursory glance at the code suggests that it really is happy only with 'http', not 'https', URLs. Can you restrict your tests to the former and confirm?

(BTW, it would be simple to extend the behavior to SSL use cases.)

Thanks,
Richard R

On Sep 13, 2013, at 9:01 AM, Matthew Sherman wrote:

> Hello,
>
> Earlier in the week I had been asking about the link checker, as we ran it but were getting the following result:
>
>     The task was completed successfully. STATUS: Fail, RESULT: Item: 123456789/473 - https://repository.bridgeport.edu/xmlui/handle/123456789/473 = 0 - FAILED
>
> Now we are wondering whether we have it set up properly or are simply using it incorrectly. The documentation on how to use the curation tools is less clear on how to make everything work than we would like. Is there a more in-depth resource that explains how to configure and use the curation tools? Or can someone provide a primer to make sure we have done everything correctly? We are currently using DSpace 1.8, but looking to upgrade at the end of the semester when classes finish. Any help is appreciated.
>
> Matt Sherman
> Digital Content Librarian
> University of Bridgeport
Re: [Dspace-tech] Define embargo settings in import process (SAF, AIP, or CSV) - DSpace 3.2
Hi Jacob:

Not sure I completely understand your needs, but if I do, it is fairly simple to do what you want using the 1.6 embargo system. Steps:

(1) Decide on a metadata field to store embargo dates: you will have to add it to the metadata registry in the admin UI. I'll call it 'dc.embargo' in this explanation, but note that this is an abuse of the Dublin Core namespace to some extent.

(2) Configure embargo in dspace.cfg:

    embargo.field.terms = dc.embargo
    embargo.field.lift = dc.embargo

(3) Then all you have to do is enter the embargo dates (in ISO date format: YYYY-MM-DD) in the 'dc.embargo' metadata field of the items you are importing (in SAF for ItemImport, etc.). When the import is run, all your content bearing an embargo date will be placed under embargo. If you also want to enter embargo dates in the submission Web UI, just add the 'dc.embargo' field to input_forms.xml. If all your content is batch-loaded, don't bother.

(4) Since it's ordinary metadata, you should be able to export it in all the forms you mention - AIP, SAF, CSV, etc. - just fine.

(5) When you feel like it, run the Embargo Lifter script to remove the embargo restrictions of eligible items. If this is a pain because there are a lot of dates, just add the script to a 'cron' job that will run automatically on a schedule that makes sense for you (nightly, weekly, etc.).

Hope this helps,
Richard Rodgers

On Sep 10, 2013, at 5:07 PM, Brown, Jacob wrote:

> This is my first email to this list, so I apologize if this has been covered previously or if I’m using the list incorrectly.
>
> Is it possible to define embargo settings (e.g., a “lift date”) as part of a batch import process (e.g., using the Simple Archive Format ingest process or the AIP or CSV ingest processes)?
>
> I searched for this issue online, and came across this thread: http://dspace.2283337.n4.nabble.com/KE1019161-Embargo-settings-on-item-import-td4660719.html. Helix84’s post at Jan 03, 2013; 7:38am touches directly on this issue, but seems incorrect as far as I can tell (AIP mets.xml doesn’t seem to describe embargo settings).
>
> I’ve created an item in my test repository (DSpace 3.2) using the xmlui interface and defined an embargo for a bitstream using the “Simple Embargo / UploadWithEmbargoStep” process. I’ve confirmed that the item was added successfully, and that the policy restrictions are enforced for anonymous users. I then exported the item as an AIP, SAF, and CSV (using the various export mechanisms). None of the exported data (AIP’s mets.xml, the SAF files, or the CSV) seemed to have any information about my embargo.
>
> Is there a way to include this information in the import process, or would I have to set this manually in xmlui or write a SQL script to add these embargo policies?
>
> Thanks,
> Jacob Brown
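Step (3) of the reply can be sketched for a Simple Archive Format batch. The Python below writes the `dublin_core.xml` content for one item with an embargo date in the 'dc.embargo' field used in the explanation; that field name is Richard's illustrative choice, `dublin_core_xml` is a hypothetical helper, and the sketch assumes the field has already been added to the metadata registry and configured as in steps (1) and (2).

```python
import datetime
import xml.etree.ElementTree as ET


def dublin_core_xml(title, embargo_date):
    """Return SAF dublin_core.xml content with a title and an embargo date."""
    # Raises ValueError for anything that is not a valid ISO calendar date.
    datetime.date.fromisoformat(embargo_date)
    root = ET.Element("dublin_core")
    for element, value in (("title", title), ("embargo", embargo_date)):
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier="none")
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_xml("A Sample Article", "2015-12-31"))
```

Validating the date before writing the file is a deliberate choice here: a malformed date in a batch is easier to catch at generation time than after the import has run.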
Re: [Dspace-tech] RSS for specific search
This is already possible with OpenSearch - simply define and expose an OpenSearch URL with the desired query. Dereferencing the URL will return search results in RSS or Atom format.

Hope this helps,
Richard R

On Aug 14, 2013, at 1:43 PM, helix84 heli...@centrum.sk wrote:

> On Wed, Aug 14, 2013 at 4:39 PM, Calloni, Rodrigo rcall...@iadb.org wrote:
>
> > I wonder if there are plans to add an RSS feed for the results from searches? For example, I can search my repository for “global warming” and then subscribe to the RSS feed. Then every time there is a new item with “global warming” in it, it will be sent to my feed.
>
> Hi Rodrigo,
>
> I agree it would be a nice feature. In any case, please make sure to file your feature request in Jira (the more complete the feature specification, the better). While it's fine to ask here, we don't keep track of requests on the mailing lists. Thanks.
>
> Regards,
> ~~helix84
>
> Compulsory reading: DSpace Mailing List Etiquette
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
Re: [Dspace-tech] creative commons not creating license_text or license_url file
Hi Jose:

The CC code was rewritten to use a more recent and better supported web service API from Creative Commons. As part of this, storing the CC license URL as a bitstream was dropped in favor of storing this value in a (configurable) metadata field. There are a few other changes (see the documentation), but the behavior you see isn't a bug.

Thanks,
Richard

On Jun 10, 2013, at 3:12 PM, Jose Blanco wrote:

> I believe this used to work, but using xmlui and selecting a CC license during submission does not seem to create the text and URL files - the RDF file gets created. Does anything come to mind?
>
> -Jose
Re: [Dspace-tech] Creative commons in edit item tool
Hi folks:

Just to add to this thread, I think the best approach is to directly edit these related CC fields (or add/remove bitstreams) as seldom as possible, since it is too easy to introduce inconsistencies among them (e.g. the name doesn't match the URI, or the bitstream, etc.).

To address this need in a 'safer' way, I just put up some curation task code:

https://github.com/richardrodgers/ctask/blob/master/general/src/main/java/org/dspace/ctask/general/CCLicenseLookup.java

that does the following: it uses the license URI as the 'key' and then fetches the values (license name and license RDF) from CC and assigns them to the item. That way one can be assured of having a consistent set of values (2 metadata fields and a bitstream). In fact, I think the submission UI can be configured to set only the URI, so that you don't need to redo anything but that field, and can leave the rest to the task.

It's untested as yet, but since we might need something like this for local MIT 1.8 upgrade work, I thought I'd share early.

Thanks,
Richard R.

On Jan 21, 2013, at 3:20 PM, Andrea Schweer wrote:

> Hi,
>
> On 22/01/13 02:11, helix84 wrote:
>
> > Anyway, there's no direct support for adding a CC license in XMLUI or JSPUI that we know of AFTER the original submission. I also tried looking at how you can do it manually by editing bitstreams/metadata, but the fact that the bitstreams are stored in the CC-LICENSE bundle, together with the fact that you can't add or remove bundles from the UIs, means that it can't be done at this moment.
>
> You can add to the CC-LICENSE bundle via edit bitstreams in XMLUI if you add the bundle name to xmlui.bundle.upload in dspace.cfg -- in fact, just un-commenting the default should do it:
>
> https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L1749
>
> cheers,
> Andrea
>
> --
> Dr Andrea Schweer
> IRR Technical Specialist, ITS Information Systems
> The University of Waikato, Hamilton, New Zealand
Re: [Dspace-tech] Google Scholar citation_pdf_url for multifile items
Hi Andrea: The case you cite is not as obvious to me: how can we assume that the single PDF is the primary artifact (i.e. the one that the rest of the GS tags describe)? We have cases where (in an Item) the article is in Word, or LaTeX, and a supplementary file is a PDF. In those cases the rule you propose would ask GS to index the wrong bitstream. Because of cases like these, we deliberately enshrined the most conservative rule possible (if there is only one bitstream *and* it's a PDF) - since scholar asked us to value accuracy over completeness. But it is absolutely right that the rule can be too restrictive in many ways. We kicked around (but didn't have time to implement for the 1st release) the notion of a site-specific, user-configurable 'map' function or functions, that would yield 0 or 1 bitstreams per item. The idea is that if there *is* a consistent 'pattern' (like the one you mention), the page could dynamically determine the value of the citation_pdf_url by calling the function. Design questions include: * should there be a site-wide mapping rule, or one per collection (per format type, etc)? * probably should be be a default (maybe just the current hard-coded one) - so that we don't force additional configuration * how should the rule be expressed? * how to limit runtime penalties etc. I can probably dig up some notes on this if there is interest in that approach. 
My 2 cents, Richard

On Jan 13, 2013, at 11:38 PM, Andrea Schweer wrote:

Hi all,

I just discovered that DSpace (XMLUI, 1.8.2 but 3.0 has the same behaviour) generates the citation_pdf_url header for Google Scholar on an item page if and only if

* the item has exactly one bitstream in the ORIGINAL bundle (or the first such bundle, to be precise); and
* this bitstream is of type application/pdf

Code in master here: https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/app/util/GoogleMetadata.java#L1007

I found old discussion around this in Jira here: https://jira.duraspace.org/browse/DS-396?focusedCommentId=17461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17461 that explains what I assume is still the reasoning behind the current behaviour:

How does one choose, for instance, a.) which PDF in an item with multiple PDF bitstreams, b.) what is specified for a URL when there is no PDF for an item, c.) whether or not to specify a PDF if the only PDF available is not the main representative bitstream of the item. Google Scholar has said they are not interested in having citation tags for an item if this field is not provided for.

I find this a bit counter-intuitive, especially in the case of items with one PDF file plus one or more files in a different format -- surely there it should be fine to use the single PDF file in the citation_pdf_url?

Are there any other opinions around this?

cheers, Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
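The two rules discussed in this thread, and the 'map' function idea, can be sketched in a few lines. This is only an illustration under stated assumptions: it is Python rather than DSpace's Java, and the `Item` and `Bitstream` classes and the names `conservative_rule`, `single_pdf_rule` and `select_pdf` are hypothetical and do not correspond to anything in GoogleMetadata.java.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Bitstream:
    name: str
    mime_type: str


@dataclass
class Item:
    collection: str
    original_bundle: List[Bitstream]  # bitstreams of the first ORIGINAL bundle


# A rule maps an item to 0 or 1 bitstreams (hypothetical type, for illustration).
Rule = Callable[[Item], Optional[Bitstream]]


def conservative_rule(item: Item) -> Optional[Bitstream]:
    """Rule described in the thread: exactly one bitstream, and it is a PDF."""
    if len(item.original_bundle) == 1:
        only = item.original_bundle[0]
        if only.mime_type == "application/pdf":
            return only
    return None


def single_pdf_rule(item: Item) -> Optional[Bitstream]:
    """Rule proposed in the thread: use the PDF when there is exactly one PDF."""
    pdfs = [b for b in item.original_bundle if b.mime_type == "application/pdf"]
    return pdfs[0] if len(pdfs) == 1 else None


def select_pdf(item: Item,
               per_collection: Dict[str, Rule],
               default: Rule = conservative_rule) -> Optional[Bitstream]:
    """Pick the rule configured for the item's collection, else the default."""
    rule = per_collection.get(item.collection, default)
    return rule(item)
```

With an item holding a Word article and a supplementary PDF, `conservative_rule` yields nothing while `single_pdf_rule` yields the supplement, which is exactly the wrong-bitstream case raised in the reply. Keeping the conservative rule as the default means a site that configures nothing sees no change in behaviour.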
Re: [Dspace-tech] vscan and upload max in cocoon
Hi Jose:

With respect to question #1, there are actually 2 different ways to achieve checking in submission:

(1) You can configure virus scan to be invoked directly in submission - see config file (I think this is XMLUI only): https://github.com/DSpace/DSpace/blob/master/dspace/config/modules/submission-curation.cfg
(2) You can attach a virus scan to any step in workflow: https://wiki.duraspace.org/display/DSDOC18/Curation+System#CurationSystem-Inworkflow so that the bitstreams get checked before entering the repository.

Then, as you observe, you are also free to check via cmd-line or in the admin UI.

Thanks, Richard R

On Jan 11, 2013, at 11:34 AM, Jose Blanco wrote:

I have two questions.

1. If virus checking is enabled as described here: https://wiki.duraspace.org/display/DSPACE/Virus+Scan+Curation+Task virus checking can be done via the UI as admin, and command line, but when an item is submitted it is NOT virus checked, or is it?
2. The max file upload size available via cocoon remains at 2G, right? So if a user has a file larger than that we have to use item importer to get it into DSpace?

Many thanks! Jose
Re: [Dspace-tech] vscan and upload max in cocoon
Yes - the shipping default is no checking, so you need to enable it.

Richard

On Jan 11, 2013, at 12:07 PM, Jose Blanco wrote:

So to have it checked during submission, according to this https://github.com/DSpace/DSpace/blob/master/dspace/config/modules/submission-curation.cfg it is set like this: virus-scan = false. Seems like it should be true?

Thank you!
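For reference, the change confirmed in this exchange amounts to one line in the module config file. The fragment below is a sketch based only on the property name and default quoted above; it assumes the scanner itself has already been set up as described on the Virus Scan Curation Task page linked in the thread.

```
# [dspace]/config/modules/submission-curation.cfg
# Shipping default is false (no virus check during submission).
virus-scan = true
```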
Re: [Dspace-tech] Create collection from command line
Hi Mark:

I'd second helix's suggestion - I think those are quite reasonable and generally useful features. In fact, I've already implemented them (except the export) in a rewrite of 'ItemImport'. The problem is that the code (https://github.com/richardrodgers/mds/blob/master/admin/src/main/java/org/dspace/app/admin/ContentImport.java) works only for an experimental branch of DSpace, not what you are running. If someone is interested though, a 'back-port' would not be that difficult.

Richard

On Jan 8, 2013, at 6:53 AM, helix84 wrote:

On Sat, Jan 5, 2013 at 8:04 PM, Mark Ehle marke...@gmail.com wrote:

Thanks, and I am in no way complaining about it. Maybe no one but me would ever need these features.

Hi Mark, can you file a Jira issue describing the proposed functionality? Maybe someone will implement it at some point.

Regards, ~~helix84

Compulsory reading: DSpace Mailing List Etiquette https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
Re: [Dspace-tech] Create collection from command line
Mark:

Yes, I think structure-builder is guilty as charged on all counts, just wanted to make sure you were aware of it.

Richard

On Jan 5, 2013, at 7:28 AM, Mark Ehle wrote:

Oops - make that 3. It would be handy to export an existing structure.

On Sat, Jan 5, 2013 at 7:18 AM, Mark Ehle marke...@gmail.com wrote:

Richard - yup, I am using structure-builder. I might be wrong, but I think it has two shortcomings:

1) It won't allow you to build a structure under an existing collection. I am having to build a top-level community, and then use community-filiator to move it under where I want.
2) It won't let you build a collection by itself. It would be handy (for me anyway) to make a structure XML file of just a collection and then use a command to slot it under an existing community.

Like I posted before, I am able to use an empty METS package file (a zip with just mets.xml in it) and by using the packager command, I can create an empty collection under where I want it to go. It's a few extra steps, but it works. However, I have not found a way to script the creation of a sub-community without having to move it after its creation.

Thanks, Mark

On Fri, Jan 4, 2013 at 11:12 PM, Richard Rodgers rrodg...@mit.edu wrote:

Hi Mark:

Have you looked at StructBuilder, a command-line tool used to create Communities and Collections from XML files: https://wiki.duraspace.org/display/DSDOC3x/Importing+Community+and+Collection+Hierarchy

Not sure if this fits your needs, but might be worth examining.

Hope this helps, Richard R

On Jan 4, 2013, at 8:00 PM, Mark Ehle wrote:

Folks - I need a way to create an empty collection from the command line. We are automating the ingestion of newspaper PDFs and the structure-build command will allow me to do everything I need except create a collection by itself in a separate step. The community that the collection belongs to will already be established. I'm using DSpace 3.0 on Ubuntu 12.10 server.

Any ideas? Thanks again!

Mark Ehle
Computer Support Librarian
Willard Library
Battle Creek, MI
Re: [Dspace-tech] Create collection from command line
Hi Mark:

Have you looked at StructBuilder, a command-line tool used to create Communities and Collections from XML files: https://wiki.duraspace.org/display/DSDOC3x/Importing+Community+and+Collection+Hierarchy

Not sure if this fits your needs, but might be worth examining.

Hope this helps, Richard R

On Jan 4, 2013, at 8:00 PM, Mark Ehle wrote:

Folks - I need a way to create an empty collection from the command line. We are automating the ingestion of newspaper PDFs and the structure-build command will allow me to do everything I need except create a collection by itself in a separate step. The community that the collection belongs to will already be established. I'm using DSpace 3.0 on Ubuntu 12.10 server.

Any ideas? Thanks again!

Mark Ehle
Computer Support Librarian
Willard Library
Battle Creek, MI
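For anyone scripting an ingest like the one described, the input file for StructBuilder can be generated rather than written by hand. The sketch below is an assumption-laden illustration: the helper name `build_structure` is hypothetical, and only the `import_structure`, `community`, `collection` and `name` elements are used; check the documentation page linked above for the full set of supported elements and for the exact command-line invocation.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_structure(community_name: str, collection_names: List[str]) -> str:
    """Return StructBuilder-style XML for one community and its collections."""
    root = ET.Element("import_structure")
    community = ET.SubElement(root, "community")
    ET.SubElement(community, "name").text = community_name
    for collection_name in collection_names:
        collection = ET.SubElement(community, "collection")
        ET.SubElement(collection, "name").text = collection_name
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_structure("Newspapers", ["1901", "1902"]))
```

As noted elsewhere in this thread, the tool creates a new top-level community, so this does not by itself solve the "collection under an existing community" case.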
Re: [Dspace-tech] (no subject)
Hi Peter:

This looks great and we'll have a closer look, since we are likewise in a 1.8 local upgrade process and would like to 'normalize' the CC License representations. A few cursory observations in the meantime:

I see the task sets the metadata fields, but does not delete the (now presumably unneeded) CC bitstreams. They are tiny, but I suppose there is a remote possibility that the metadata field and the bitstream could come to differ, and there would be some head-scratching.

I like the optimization for running over entire sites. In fact, more recent curation work has generalized your approach so that any task can be run against a DB-query (or index search): see https://github.com/richardrodgers/mds/blob/master/kernel/src/main/java/org/dspace/curate/ObjectSelector.java and https://github.com/richardrodgers/mds/blob/master/kernel/src/main/java/org/dspace/curate/selector/QuerySelector.java

Thanks again, Richard R

On Dec 27, 2012, at 2:22 PM, Peter Dietz wrote:

Hi All,

I was wondering if I could get some feedback from people who have been using Creative-Commons licensing before-and-after DSpace 1.8 (when the REST API licensing rewrite was added to DSpace).

In 1.7 and below, when viewing an Item, it looks for a bitstream in the CC-LICENSE bundle, and tries to grab the URI from the appropriate bitstream, upon page load.

In 1.8 and later, when viewing an Item, the license metadata gets added to a CC-Name and a CC-URI metadata field. The UI just points to those metadata fields to indicate the proper license.

This leaves one with a legacy problem, as mentioned in the 1.8 release notes:

DS-964 (https://jira.duraspace.org/browse/DS-964) Rewrite of Creative Commons licensing (https://wiki.duraspace.org/display/DSDOC18/Configuration#Configuration-ConfiguringCreativeCommonsLicense) for XMLUI
* Better integrates the Creative Commons licence selection into the submission process
* Legacy problem – do we update old license to new or not? Currently MIT runs 'split version' with old licenses looking like old, and new look like new.
Provided by MIT.

So, we have just started to roll out our upgrade to 1.8 locally (we're a little late to the party), and we're not happy with being stuck with a legacy problem. So I was wondering if anyone has done anything to their items to address this situation.

For us, we've built a Curation-Task to migrate the old Items with CC-License to fill in their CC-Name and CC-URI metadata fields. You can take a look at the code for it if you are interested. https://github.com/osulibraries/DSpace/blob/osukb/dspace-api/src/main/java/org/dspace/ctask/general/CreativeCommonsMetadataMigration.java

It essentially gets run once during the upgrade, goes through all Items in your repository with a CC-LICENSE bitstream, extracts the LICENSE-URI from a bitstream, compares that with a locally maintained table of license options, and fills in CC-NAME and CC-URI metadata fields. (We've chosen dc.rights.cc and dc.rights.ccuri).

I've run into performance issues on running a curation task recursively over the entire site-level, and had to make some optimizations to the distribute method, but it runs pretty well now.

Peter Dietz
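The migration step described above (read the license URI, look it up in a local table, fill the two metadata fields) can be sketched as follows. This is a standalone Python illustration, not the linked Java curation task: `migrate_cc_metadata`, the dictionary-based metadata and the two-entry `LICENSE_NAMES` table are all hypothetical. Only the field names dc.rights.cc and dc.rights.ccuri come from the message.

```python
from typing import Dict, List, Optional

# Hypothetical locally maintained table: license URI -> license name.
LICENSE_NAMES: Dict[str, str] = {
    "http://creativecommons.org/licenses/by/3.0/": "Attribution 3.0 Unported",
    "http://creativecommons.org/licenses/by-nc/3.0/":
        "Attribution-NonCommercial 3.0 Unported",
}

CC_NAME_FIELD = "dc.rights.cc"
CC_URI_FIELD = "dc.rights.ccuri"


def migrate_cc_metadata(metadata: Dict[str, List[str]],
                        license_uri: Optional[str],
                        table: Dict[str, str] = LICENSE_NAMES) -> bool:
    """Fill the CC name/URI fields from a legacy license URI.

    Returns True if the fields were written, False if the item was skipped
    (no URI, URI not in the table, or fields already populated).
    """
    if license_uri is None:
        return False
    uri = license_uri.strip()  # bitstream text may carry stray whitespace
    name = table.get(uri)
    if name is None:
        return False  # unknown license: leave for manual review
    if metadata.get(CC_URI_FIELD):
        return False  # already migrated, so a second run changes nothing
    metadata[CC_NAME_FIELD] = [name]
    metadata[CC_URI_FIELD] = [uri]
    return True
```

Skipping items that already carry the URI field keeps the task safe to re-run. It deliberately leaves the old CC bitstream untouched, which is the point raised in the reply about the two representations possibly drifting apart.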
Re: [Dspace-tech] Seemingly random file not found errors for bitstreams
Just FYI: SRB classes are always in the code path (even if not using SRB storage) - just a peculiarity of the implementation.

Thanks, Richard

On Dec 18, 2012, at 12:46 PM, Chris King wrote:

Nope. We shouldn't be using SRB. I just checked dspace.cfg and all the SRB properties are commented out.

Thanks for the quick reply, Chris

On Tue, Dec 18, 2012 at 11:18 AM, helix84 heli...@centrum.sk wrote:

Hi Chris, are you using SRB [1]? According to the edu.sdsc.grid.io classes appearing in the stacktrace it seems to me you do, but you didn't mention it, so I'm asking if that's intentional. Do you have any srb.* properties uncommented in dspace.cfg?

[1] https://wiki.duraspace.org/display/DSDOC3x/Storage+Layer#StorageLayer-ConfiguringSRBStorage

Regards, ~~helix84
Re: [Dspace-tech] Embargo does not actually embargo
Hi Tim:

Is the bitstream retrievable? If not, I think there may be just a quite reasonable misunderstanding: 1.7 Embargo does not hide the metadata (or Item page) from view; it only restricts access to the bitstreams. This is the intended design. The 3.0 behavior is different, where (I believe) you can also hide the metadata. If, however, you can access the bitstreams, there is indeed something odd going on.

Hope this helps, Richard R

On Dec 7, 2012, at 3:31 PM, Tim Au Yeung wrote:

Hi all,

I have a bit of an odd scenario going on here and could use some suggestions. We have a 1.7.2 instance running using the default embargo settings. It sets the embargo lift date and removes the policies for the item but the embargoed bitstream is viewable despite the fact that there are no read policies on the bitstream in the item's authorizations. Anyone encounter a similar situation/have suggestions as to what's going on here?

Thanks, Tim

- 
Tim Au Yeung
Manager, Repository Technology
Libraries and Cultural Resources
University of Calgary
ytau(at)ucalgary.ca
403.220.8975
Re: [Dspace-tech] Item batch import - replace problem
Hi Ying:

There is a companion to ItemImport called ItemUpdate that seems to match your use-case better. See: https://wiki.duraspace.org/display/DSDOC3x/Updating+Items+via+Simple+Archive+Format

If memory serves, it has been available since 1.6 or so.

Hope this helps, Richard R

On Oct 24, 2012, at 1:57 PM, Ying Jin wrote:

Hi All,

We use DSpace batch import a lot and sometimes have to replace the items to fix the metadata or content issues. However, we found the replacement in itemimport is actually deleting the old item, adding a new one and then assigning the old handle back to the item. It generates several problems. First, it will delete all relationships; for example, if the item is mapped to another collection, the mapping will be gone after replacement. Also, when deleting the item, the item's internal ID changes and it loses its connection with statistical history. Some of our statistics will show the replaced item's internal ID number, since the ID changed and it can't be matched to the item title.

DSpace 3.0 is being released next month and we would like to know if it is possible to have the problem fixed in the new release.

Thanks, Ying @ Rice University
Re: [Dspace-tech] Adding RSS feed to DSpace Search Results
Hi Rodrigo:

No - OpenSearch has been in DSpace since 1.6 for all UIs (JSP and XMLUI). There is a discovery-related issue in XMLUI 1.7+, but:

(1) If you don't run discovery (only available in XMLUI), there should be no problem on any version 1.6 or later.
(2) There is a fix posted for 1.8+ (https://jira.duraspace.org/browse/DS-1244), so if you want to upgrade to 1.8 *and* use discovery, applying the patch should address the issue.

So you don't need to wait for 3.0 to enable OpenSearch, although it will of course also work.

Thanks, Richard R

On Sep 10, 2012, at 12:44 PM, Calloni, Rodrigo wrote:

Thanks Andrea

So I assume that OpenSearch is only available after 1.7 and for JSPUI. Is that correct? We are planning to upgrade to the most recent DSpace version soon so we can see this as a possibility. When is 3.0 release date?

Best regards
Rodrigo

Rodrigo Calloni
System Librarian
Felipe Herrera Library
Knowledge and Learning Sector
Tel: 202-623-2952 Fax: 202-623-3183
1300 New York Avenue, N.W. Washington, D.C. 20577 USA
www.iadb.org
Knowledge for Development Challenges

-----Original Message-----
From: Andrea Bollini [mailto:boll...@cilea.it]
Sent: Monday, September 10, 2012 11:29 AM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Adding RSS feed to DSpace Search Results

The same result can be obtained using OpenSearch, see an example here: http://eprints.rclis.org/simple-search?query=&submit=Go

In jspui it works well with both legacy lucene search engine ( dspace 3.0) and the new discovery search engine (from the next release, dspace 3.0). In XMLUI as far as I know there was an issue (that should be solved in the next 3.0) so open search doesn't work with discovery.

Andrea

On 10/09/2012 16:27, helix84 wrote:

On Mon, Sep 10, 2012 at 4:14 PM, Calloni, Rodrigo rcall...@iadb.org wrote:

Is there some local development that you could share with us to achieve this functionality?

Hi Rodrigo,

I can confirm that DSpace currently doesn't know how to do that. It can do subscriptions only for collections. I don't know of anyone who has done this.

In case you decide to do this kind of development, I recommend against doing it on the SearchArtifacts, which is default in 1.6. Instead, develop it for Discovery (present since 1.7, improved in 1.8 and 3.0), which will be the default XMLUI aspect in 3.0. It should also be easier to do in Discovery, not to mention performance. Remember that SearchArtifacts will probably be obsoleted in future DSpace versions.

Regards, ~~helix84

-- 
Dott. Andrea Bollini
boll...@cilea.it
ph. +39 06 59292853 - mob. +39 348 8277525 - fax +39 06 5913770
CILEA - Consorzio Interuniversitario
http://www.cilea.it/disclaimer
Re: [Dspace-tech] Adding RSS feed to DSpace Search Results
Hi Rodrigo:

I'd start here:

http://www.dspace.org/1_6_2Documentation/ch13.html#N182C9
http://www.dspace.org/1_6_2Documentation/ch05.html#N140AC

and then post any further questions to the tech-list.

Hope this helps, Richard

On Sep 10, 2012, at 1:02 PM, Calloni, Rodrigo wrote:

Thanks Richard

I wonder, how can I activate OpenSearch in my current 1.6.2? Is there any documentation available?

Rodrigo
Re: [Dspace-tech] Ingesting large data set
Yes, as has been remarked, the bigger questions revolve around access and usage, rather than ingest. We recently did a pilot with large video files where we ingested them as preservation masters (via ItemImport), suppressed the download link, but offered in its place a link to a much smaller transcoded access copy on YouTube. The thinking was that as formats change, we could reuse the master, thereby guaranteeing access in a mediated way…

On Aug 30, 2012, at 12:42 PM, Mark H. Wood wrote:

We are just setting up a data repository and will probably soon be facing similar challenges. This also has some relationship to longer videos and the like.

--
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Asking whether markets are efficient is like asking whether people are smart.
Re: [Dspace-tech] Thumbnail Preview in 1.6.2
Hi Kirti:

Not sure if you are having other problems, but I did want to clarify how MediaFilter works. It is a general set of tools for operating on your bitstream content, and the primary use for most people is to extract text (for indexing) from PDFs, Word files, etc., not to produce thumbnails of those formats. These functions are configured in dspace.cfg (search for 'mediafilter' properties), and each 'filter' is given a list of formats to process. It further optimizes its work by not recreating a derivative (i.e. the text file, or thumbnail, etc.) if it already exists - that is the message you are seeing below (SKIPPED).

Hope this helps,
Richard R

On Jun 22, 2012, at 6:34 AM, Kirti Bodhmage wrote:

Hi,

We have DSpace 1.6.2. I am trying to enable thumbnail creation. I ran ./filter-media in the dspace/bin directory and got the following errors while executing the script. After the execution I could see a thumbnail image for a PNG item, but couldn't see anything for the PDF and other text items. I was expecting filter-media to create an image file from PDF and Word documents, but it is creating a .txt file instead. I saw previous emails on thumbnail creation in this mailing list. Is the xpdf filter a better choice for PDFs and docs?

---
SKIPPED: bitstream 401 (item: 123456789/131) because 'thesisJPWoodcock1997-1.pdf.txt' already exists
SKIPPED: bitstream 415 (item: 123456789/300) because 'SHAHTransnationalHindu2009FINAL.pdf.txt' already exists
SKIPPED: bitstream 439 (item: 123456789/135) because 'CARBONI_D_FINAL.pdf.txt' already exists
ERROR filtering, skipping bitstream:
Item Handle: 123456789/136
Bundle Name: ORIGINAL
File Size: 110763816
Checksum: 044ce0fc33dbaf9299248cd17cf24828 (MD5)
Asset Store: 0
java.io.FileNotFoundException: /opt/dspace/assetstore-ad1/16/32/42/163242047713616019554023286382622639427 (No such file or directory)
SKIPPED: bitstream 445 (item: 123456789/137) because 'RAMSDEN_PhD_FINAL.pdf.txt' already exists
SKIPPED: bitstream 447 (item: 123456789/138) because 'WU_T_FINAL.pdf.txt' already exists
SKIPPED: bitstream 578 (item: 123456789/1113) because 'ADETORONumericalAndExperimental2009FINAL.pdf.txt' already exists
ERROR filtering, skipping bitstream:
Item Handle: 123456789/163
Bundle Name: ORIGINAL
Bundle Name: ORIGINAL
Bundle Name: ORIGINAL
File Size: 270417
Checksum: 8360d7bd72fe23ead9de220a78b047e3 (MD5)
Asset Store: 0
java.io.FileNotFoundException: /opt/dspace/assetstore-ad1/10/21/49/102149325471428675497688294414612838518 (No such file or directory)
ERROR filtering, skipping bitstream:
---

Here are my settings in dspace.cfg:

---
webui.itemlist.columns = thumbnail, dc.date.issued(date), dc.title, dc.contributor.*
webui.itemlist.widths = *, 130, 60%, 40%
webui.itemlist.dateaccessioned.columns = thumbnail, dc.date.accessioned(date), dc.title, dc.contributor.*
publications.bundles.allowed = ORIGINAL, DELETED, LICENSE, THUMBNAILS
webui.browse.thumbnail.show = true
webui.browse.thumbnail.maxheight = 80
webui.browse.thumbnail.maxwidth = 80
webui.item.thumbnail.show = true
webui.browse.thumbnail.linkbehaviour = item
# maximum width and height of generated thumbnails
thumbnail.maxwidth 80
thumbnail.maxheight 80
---

Thanks,
Kirti
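The skip behaviour Richard describes can be pictured with a small sketch. This is not DSpace code: `plan_filter_run` and the file name `new-upload.pdf` are hypothetical, and the rule "derivative name = original name plus `.txt`" is only inferred from the names in the log above.

```python
def plan_filter_run(originals, existing_derivatives, suffix=".txt"):
    """Split originals into (to_process, skipped).

    An original is skipped when a derivative named <original><suffix>
    already exists; otherwise it is queued for filtering.
    """
    to_process, skipped = [], []
    for name in originals:
        if name + suffix in existing_derivatives:
            skipped.append(name)
        else:
            to_process.append(name)
    return to_process, skipped


# One file already has extracted text, the other (hypothetical) does not.
originals = ["WU_T_FINAL.pdf", "new-upload.pdf"]
derivatives = {"WU_T_FINAL.pdf.txt"}
to_process, skipped = plan_filter_run(originals, derivatives)
print("to process:", to_process)
print("skipped:", skipped)
```

Read this way, a SKIPPED line means the derivative is already in place rather than that something went wrong. The FileNotFoundException entries are a separate matter: they report that the original file could not be read from the asset store.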
Re: [Dspace-tech] Sending an embargoed item via sword
Hi Mark:

We may, but we should first determine whether the METSPackager is functioning correctly now. Tim D did the METSPackager work in support of the AIP backup/restore, and I'm pretty sure restoring embargo state was a use-case. At least, it *should* be.

Richard

On May 3, 2012, at 3:55 PM, Mark Diggory wrote:

I'm determining if we would want to add some of these details to our Advanced Embargo Support business/technical requirements for DSpace 3.0.
https://wiki.duraspace.org/display/DSPACE/Advanced+Embargo+Support

Best,
Mark

On Tue, Apr 3, 2012 at 2:52 PM, Richard Rodgers rrodg...@mit.edu wrote:

Ignasi:

A little more detail on Tim's point: the code paths are a little different when submissions go through workflow:

Case 1 (which works) - SWORD deposit goes into workflow. When it exits workflow, 'InstallItem.installItem()' gets called, which in turn calls EmbargoManager to set the embargo if necessary.

Case 2 (fails) - SWORD goes straight in (no workflow) - this uses some code in the METSPackager to create and install the item, not InstallItem (I think, have to check the code), so it misses the embargo checking.

A lot of work has been done on packager code since 1.6, and I'm guessing it would work OK on later releases, at least 1.8. So it does appear that it's a 1.6 (packager) bug. Your options are to patch 1.6 (possible, but will take some analysis and testing), or live with putting SWORD deposits through workflow until you upgrade...

Richard R.

On Apr 3, 2012, at 5:01 PM, Tim Donohue wrote:

My apologies, Ignasi,

It was pointed out to me that this *should* be possible, as per our DSpace 1.8 Documentation:
https://wiki.duraspace.org/display/DSDOC18/Embargo#Embargo-Termsassignment

It explicitly states:

The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace.

My previous message was incorrect. I forgot that DSpace Embargo can be set by merely passing in a DC metadata field. So, it *should* be possible to set an embargo via the DSpace SWORD field (assuming you just pass in the embargo terms as the configured metadata field). I now wonder if this is a bug in DSpace 1.6.x. I'm not sure if anyone has tried this in more recent versions of DSpace?

- Tim

On 4/3/2012 3:10 PM, Tim Donohue wrote:

Hi Ignasi,

Unfortunately, I don't believe this is possible, as SWORD doesn't support the idea of an item-level embargo. The Item Embargo feature you are talking about is specific to DSpace. The only way I can think of doing this is to submit the item via SWORD into a collection that has a workflow approval process set up (with the edit metadata step). Then you'd have to have someone go into the edit metadata step and manually add the embargo. I've never tried this myself, but I think it may work.

- Tim

On 3/29/2012 4:03 PM, Ignasi Labastida i Juan wrote:

Hi,

We are implementing the embargo feature in our institutional repository. It seems to work fine. We are also connecting our Research Information System with the IR using the SWORD protocol. Now we have found a problem when we send a document with an embargo to a collection without workflow. The item sent via SWORD goes to the collection with all the metadata but without the embargo. The embargo terms and metadata arrive fine. When the same item is sent via SWORD to a collection with workflow, using the same METS file, the embargo is applied once the item is accepted by the administrator.

Any hint? Is it possible to send the embargoed item to a collection without workflow? We have DSpace 1.6.2 and SWORD 1.3.

Thanks,
Ignasi Labastida
Universitat de Barcelona
Re: [Dspace-tech] DSpace index Zip files. Searchable?
Hi Magnus:

Another direction you may want to look into is the mediafilter curation work: https://github.com/richardrodgers/ctask
This approach uses the Apache Tika text extraction library, which I am fairly certain supports reading zip files. The extracted text can then be indexed by normal DSpace processes. If you have a 1.8 DSpace, you may wish to evaluate it.

Thanks,
Richard R

On Apr 10, 2012, at 11:33 AM, Tim Donohue wrote:

Magnus,

Currently, DSpace does not support indexing within Zip files. So, they are not searchable by default. However, it may be possible to build a custom indexer (DSpace calls them MediaFilters) that can extract the contents of a Zip file. This would take some Java programming, but if you decided to build one it may be something we could include in a future version of DSpace.

More information on MediaFilters and how to create custom MediaFilters can be found in the DSpace Documentation at:
https://wiki.duraspace.org/display/DSDOC18/Transforming+DSpace+Content+%28MediaFilters%29

If you decide to try and build a MediaFilter that can extract and index Zip files, you are more than welcome to discuss your ideas on this listserv and submit your final code to our Ticket System (https://jira.duraspace.org/browse/DS).

- Tim

On 4/8/2012 5:38 AM, mango_pa...@hotmail.com wrote:

Does anyone know if DSpace can index and/or make zip files searchable after ingesting them into the repository?

Greetings from Magnus
Re: [Dspace-tech] Sending an embargoed item via sword
Ignasi:

A little more detail on Tim's point: the code paths are a little different when submissions go through workflow:

Case 1 (which works) - SWORD deposit goes into workflow. When it exits workflow, 'InstallItem.installItem()' gets called, which in turn calls EmbargoManager to set the embargo if necessary.

Case 2 (fails) - SWORD goes straight in (no workflow) - this uses some code in the METSPackager to create and install the item, not InstallItem (I think, have to check the code), so it misses the embargo checking.

A lot of work has been done on packager code since 1.6, and I'm guessing it would work OK on later releases, at least 1.8. So it does appear that it's a 1.6 (packager) bug. Your options are to patch 1.6 (possible, but will take some analysis and testing), or live with putting SWORD deposits through workflow until you upgrade...

Richard R.

On Apr 3, 2012, at 5:01 PM, Tim Donohue wrote:

My apologies, Ignasi,

It was pointed out to me that this *should* be possible, as per our DSpace 1.8 Documentation:
https://wiki.duraspace.org/display/DSDOC18/Embargo#Embargo-Termsassignment

It explicitly states:

The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace.

My previous message was incorrect. I forgot that DSpace Embargo can be set by merely passing in a DC metadata field. So, it *should* be possible to set an embargo via the DSpace SWORD field (assuming you just pass in the embargo terms as the configured metadata field). I now wonder if this is a bug in DSpace 1.6.x. I'm not sure if anyone has tried this in more recent versions of DSpace?

- Tim

On 4/3/2012 3:10 PM, Tim Donohue wrote:

Hi Ignasi,

Unfortunately, I don't believe this is possible, as SWORD doesn't support the idea of an item-level embargo. The Item Embargo feature you are talking about is specific to DSpace. The only way I can think of doing this is to submit the item via SWORD into a collection that has a workflow approval process set up (with the edit metadata step). Then you'd have to have someone go into the edit metadata step and manually add the embargo. I've never tried this myself, but I think it may work.

- Tim

On 3/29/2012 4:03 PM, Ignasi Labastida i Juan wrote:

Hi,

We are implementing the embargo feature in our institutional repository. It seems to work fine. We are also connecting our Research Information System with the IR using the SWORD protocol. Now we have found a problem when we send a document with an embargo to a collection without workflow. The item sent via SWORD goes to the collection with all the metadata but without the embargo. The embargo terms and metadata arrive fine. When the same item is sent via SWORD to a collection with workflow, using the same METS file, the embargo is applied once the item is accepted by the administrator.

Any hint? Is it possible to send the embargoed item to a collection without workflow? We have DSpace 1.6.2 and SWORD 1.3.

Thanks,
Ignasi Labastida
Universitat de Barcelona
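The difference between the two cases Richard describes can be shown with a toy model. Everything in it is hypothetical: the function names and the dictionary standing in for an item are invented for illustration, and Richard himself notes he had not yet checked the real code, so this only mirrors his description and not the actual DSpace classes.

```python
def set_embargo_if_needed(item):
    """Stand-in for the embargo check: acts only when terms are present."""
    if item.get("embargo_terms"):
        item["embargoed"] = True


def install_via_workflow(item):
    """Case 1: the item leaves workflow and passes through the install hook."""
    item["archived"] = True
    set_embargo_if_needed(item)  # the hook runs on this path
    return item


def install_direct(item):
    """Case 2: the packager installs the item itself; the hook is never called."""
    item["archived"] = True
    return item


with_workflow = install_via_workflow({"embargo_terms": "6 months"})
without_workflow = install_direct({"embargo_terms": "6 months"})
print(with_workflow)
print(without_workflow)
```

Both items end up archived with their terms metadata intact, but only the first is marked as embargoed, which matches what Ignasi reported: the terms arrive, the embargo does not.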
Re: [Dspace-tech] Localization inside config files?
I think Mark makes a number of good points here - esp. regarding modularity - and it's worth emphasizing that the net effect should be *less* localization effort, even if there are potentially more files, since one would only need to worry about the locally deployed modules - but I'm a bit puzzled about the 'single catalog scheme' as a desired future state. Without much thought, I can come up with 4-5 quite distinct sites (places, files, ways) where localization occurs in DSpace:

* in email templates (config/email)
* in dspace.cfg and many other config files (starting with the 'dspace.name' property)
* in input_forms.xml
* messages.xml and that ilk

and I'm sure there are others; the curation stuff does not introduce a new locus of localization: localizability permeates the application already. It's also worth noting that localized strings occur not just in the UI proper - they can appear in RSS feeds, OAI-PMH harvests, etc. So I'd be leery of a plan to shoehorn all localization into any single 'catalog scheme', esp. one that is explicitly tied to a UI presentation layer.

Having said all this, I sympathize with Christian's plight, and affirm with Mark that we can do a better job of managing it.

Richard R.

On Apr 2, 2012, at 9:21 AM, Mark H. Wood wrote:

On Sat, Mar 31, 2012 at 02:05:34PM +0200, Christian Völker wrote:

[snip]

Now I just found a new flavour of localization in the dspace/config/modules/curate.cfg file:

#ui.tasknames = \
# profileformats = Profile Bitstream Formats, \
# requiredmetadata = Check for Required Metadata, \
# checklinks = Check Links in Metadata
ui.tasknames = \
 profileformats = Dateityp angehängter Dateien untersuchen, \
 requiredmetadata = Pflichtfelder auf Inhalt überprüfen, \
 checklinks = Links in Metadaten überprüfen
# general = General Purpose Tasks,
 general = Allgemeine Aufgaben,
#ui.statusmessages = \
#-3 = Unknown Task, \
#-2 = No Status Set, \
#-1 = Error, \
# 0 = Success, \
# 1 = Fail, \
# 2 = Skip, \
# other = Invalid Status
ui.statusmessages = \
 -3 = Unbekannte Aufgabe, \
 -2 = Kein Zustand definiert, \
 -1 = Fehlerhaft, \
 0 = Erfolgreich, \
 1 = Fehlgeschlagen, \
 2 = Übersprungen, \
 other = Ungültiger Zustand

Honestly, is this the way to go?

Clearly not. We already have two different message catalog schemes, which IMHO is one too many. Configurable message texts should at least be confined to those two. It would be good to get every component to use a single scheme.

Besides the monstrous messages.xml file in modules/xmlui/src/main/webapp/i18n/, now past the 2,100 mark, we already have numerous other places where messages.xml files have to be kept updated, such as dspace-xmlui/dspace-xmlui-api/src/main/resources/aspects/XMLWorkflow/i18n or dspace-discovery/dspace-discovery-xmlui-api/src/main/resources/aspects/Discovery/i18n

Message catalogs will proliferate, because DSpace is becoming modular. Each module needs its own catalog, because it might be released on a different schedule, and because separable components shouldn't depend on each others' catalogs. Indeed it might be good to break up the modules/xmlui/src/main/webapp/i18n/messages.xml into more manageable chunks kept closer to the code that uses them, if there is a good and sensible way to do it. We do need to be careful to maintain consistency across modules, and to document as well as we can where to find localizable texts.

[snip]

I am sorely missing tool support I know of in other programming environments, such as AppleGlot or QTLinguist. QTLinguist supports loading several files to compare and copy from and to each other, enabling something like a visual diff. Then, you can create your own dictionaries and load them. You get suggestions based on translations already finished, which helps keep consistency. The file structure cannot be damaged accidentally. Comments with alternative translations or reminders can be added for each message string. And you get an overview of the progress made, by checkmarks in the sidebar for translations entered and translations reviewed. The only thing worse in this tool as compared to our files is that it is a bit more complicated to find the place where the translation appears on the finished site, depending on the way programmers structured their work.

I agree that good tooling would help. Localization requires a lot of comparison and systematic record-keeping, which are hard for humans but easy for machines. There is a proposal right now over on dspace-devel to use web-based localization tooling and services. I would invite anyone interested in localization to look it over and discuss. See the thread starting at Message-ID: CAGO4j2mtQ8Zp4fXA2WYJLinEi_aJDP17UU_hgDUMnw6=rqg...@mail.gmail.com, 24-Mar-2012, Chandan Kumar, Introduction. I would really like to
Re: [Dspace-tech] moving community
Hi Sisay:

There is an administrative tool for this. See the documentation here:
https://wiki.duraspace.org/display/DSDOC18/Managing+Community+Hierarchy

Hope this helps,
Richard

On Jan 13, 2012, at 12:22 AM, Webshet, Sisay (ILRI) wrote:

Hello,

I want to move a DSpace sub-community to another sub-community. What would be the right way of doing this? So far I have moved collections from one community to another using SQL commands.

Thanks,
Sisay.
Re: [Dspace-tech] (no subject)
Hi Henry:

DSpace generally provides access to its assets only by returning the files. However, modern browsers are generally quite good at presenting/rendering PDFs, so if this is not happening, you should make sure that the Bitstream format (and associated mime-type) are correct for your files.

Hope this helps,
Richard

On Jan 11, 2012, at 6:12 AM, Henry Atsu Agbodza wrote:

I have realised that the files (bitstreams) in my repository can only be downloaded but cannot be viewed within the repository. How can I enable this? Did I miss it during the initial configuration stage? Any help is appreciated. The files are PDFs.

--
Webmaster
Deputy System Administrator (University of Ghana Library System)
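As a quick sanity check on Richard's point about mime-types, the sketch below shows what type a file name is conventionally expected to map to, using Python's standard `mimetypes` module. It says nothing about what the DSpace bitstream format registry actually holds for a given file; the file names are made up, and the comparison with the repository still has to be done by hand.

```python
import mimetypes

# Hypothetical file names; only the extension matters to guess_type().
for name in ["thesis.pdf", "thesis.pdf.txt", "thesis"]:
    mime, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime)
```

If a PDF is being served with a generic type rather than `application/pdf`, a browser will usually offer it as a download instead of rendering it, so a mismatch here is a plausible first thing to look for.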
Re: [Dspace-tech] Issues with First DSpace Repository
Hi Henry:

Most emails generated by DSpace are controlled by 'templates' found in the [dspace]/config/emails directory. You can freely change any of the text in these templates - they are essentially simple text files. Be sure to keep a copy of your modifications, so that you won't lose them after upgrades, etc. Email templates are mentioned in the manual:
https://wiki.duraspace.org/display/DSDOC18/Configuration#Configuration-WordingofEmailMessages

Good luck on going live!

Thanks,
Richard

On Jan 10, 2012, at 10:55 AM, Henry Atsu Agbodza wrote:

Dear all,

I'm in the process of going live with a repository created with DSpace 1.8.1 and I need some help with a few issues. This is my first repository, though I have been experimenting with DSpace for some time. Thanks in advance for any help.

1) When people register, the subject of the email I receive is

DSpace: Registration Notification

How can I change this?

2) When people submit items, the subject of the email I receive is

DSpace: Submission Approved and Archived

The body of the email is

Your submission has been accepted and archived in DSpace, and it has been assigned the following identifier:
http://hdl.handle.net/123456789/10
Please use this identifier when citing your submission.
Many thanks!
DSpace

How can I modify these settings?

3) When people register, the subject of the email they receive is

DSpace Account Registration

The body of the email is

To complete registration for a DSpace account, please click the link below:
http://hostname:8080/xmlui/register?token=d25c2a6b24247f7026208b766db526f9
If you need assistance with your account, please email dspace-h...@myu.edu or call us at xxx-555-.
The DSpace Team

--
Webmaster
Deputy System Administrator (University of Ghana Library System)
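Since the templates are plain text, changing a subject is a one-line edit. The sketch below assumes a layout in which the subject sits on a line beginning with `Subject:`, comment lines begin with `#`, and `{0}` marks a value filled in at send time; check this against the actual files in [dspace]/config/emails before relying on it. The sample template text and the replacement subject are invented for illustration.

```python
def set_subject(template_text, new_subject):
    """Return template_text with its first 'Subject:' line replaced."""
    lines = template_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Subject:"):
            lines[i] = "Subject: " + new_subject
            break
    else:
        # No break happened, so the template had no subject line at all.
        raise ValueError("no 'Subject:' line found in template")
    return "\n".join(lines) + "\n"


# Sample text only, not the template shipped with DSpace.
template = (
    "# Registration email (sample)\n"
    "Subject: DSpace Account Registration\n"
    "To complete registration for a DSpace account, please click the link below:\n"
    "  {0}\n"
)
updated = set_subject(template, "My Repository Account Registration")
print(updated)
```

In practice a text editor does the same job; the point is only that the subject and the body wording live in the same small file, and that placeholders such as `{0}` should be left in place.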
Re: [Dspace-tech] ArrayIndexOutOfBoundsException in Embargo DayTableEmbargoSetter
Hi Phil:

The error comes from not having a value after the 'None:' in the embargo.terms.days property. The code requires that there be a number after the ':' - the 'forever' value is the only exception to that rule (which looks fine in your property).

Just a quick thought - why have the 'None' case at all? Since the field assignment is optional, one can simply skip assignment if no embargo is desired.

Thanks,
Richard

On Nov 1, 2011, at 12:41 PM, Shafer, Philip wrote:

I keep getting the following error when an item is submitted and attempted to be installed in the repository:

java.lang.ArrayIndexOutOfBoundsException: 1
at org.dspace.embargo.DayTableEmbargoSetter.init(DayTableEmbargoSetter.java:48)

I'm wondering if someone could see if I have something set wrong. I do need an option for no embargo and an indefinite/forever embargo. Here are the properties I have set:

dspace.cfg

# Embargo Settings
# DC metadata field to hold the user-supplied embargo terms
embargo.field.terms = dc.embargo.terms
# DC metadata field to hold computed lift date of embargo
embargo.field.lift = dc.embargo.lift
# string in terms field to indicate indefinite embargo
embargo.terms.open = forever
# embargo.terms.days defines a table of values for the embargo system
embargo.terms.days = None:,90 days:90,6 months:180,1 year,365,2 years:730,Forever:forever
# implementation of embargo setter plugin - replace with local implementation if applicable
# plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter
plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DayTableEmbargoSetter
# implementation of embargo lifter plugin - replace with local implementation if applicable
plugin.single.org.dspace.embargo.EmbargoLifter = org.dspace.embargo.DefaultEmbargoLifter

input-forms.xml

<field>
  <dc-schema>dc</dc-schema>
  <dc-element>embargo</dc-element>
  <dc-qualifier>terms</dc-qualifier>
  <repeatable>false</repeatable>
  <label>Embargo Item For</label>
  <input-type value-pairs-name="embargo_terms">dropdown</input-type>
  <hint>Select the length for the embargo, or select 'None' for no embargo. Select 'Forever' for an indefinate embargo.</hint>
  <required></required>
</field>

<value-pairs value-pairs-name="embargo_terms" dc-term="embargo.terms">
  <pair>
    <displayed-value>None</displayed-value>
    <stored-value>None</stored-value>
  </pair>
  <pair>
    <displayed-value>90 days</displayed-value>
    <stored-value>90 days</stored-value>
  </pair>
  <pair>
    <displayed-value>6 months</displayed-value>
    <stored-value>6 months</stored-value>
  </pair>
  <pair>
    <displayed-value>1 year</displayed-value>
    <stored-value>1 year</stored-value>
  </pair>
  <pair>
    <displayed-value>2 years</displayed-value>
    <stored-value>2 years</stored-value>
  </pair>
  <pair>
    <displayed-value>Forever</displayed-value>
    <stored-value>Forever</stored-value>
  </pair>
</value-pairs>

-
Philip Shafer
Web Developer
Rowan University Library Services
Ph: 856-256-4418
E-mail: sha...@rowan.edu
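The rule Richard states (every entry needs a value after the ':') can be checked before restarting DSpace. The sketch below is a stand-alone checker written for this thread, not the parsing code in DayTableEmbargoSetter; the function name is hypothetical, and accepting the word 'forever' simply mirrors the embargo.terms.open value in the post.

```python
def check_terms_table(raw):
    """Return a list of problems found in an embargo.terms.days style value.

    Each comma-separated entry is expected to look like 'label:days',
    where days is a whole number or the word 'forever'.
    """
    problems = []
    for entry in raw.split(","):
        label, sep, value = entry.strip().partition(":")
        value = value.strip()
        if not sep:
            problems.append(f"{entry.strip()!r}: no ':' separator")
        elif not value:
            problems.append(f"{label!r}: nothing after ':'")
        elif not (value.isdigit() or value.lower() == "forever"):
            problems.append(f"{label!r}: {value!r} is not a day count")
    return problems


# The value exactly as posted above.
raw = "None:,90 days:90,6 months:180,1 year,365,2 years:730,Forever:forever"
problems = check_terms_table(raw)
for problem in problems:
    print(problem)
```

Run against the posted value, the check flags the empty 'None:' entry that Richard points out. It also flags '1 year,365', where a comma appears in place of the colon, so that entry splits into two pieces with no separator; that looks like a second candidate for the same kind of failure and may be worth correcting at the same time.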
Re: [Dspace-tech] Switching to new embargo
Hi Phil:

You would not have to rebuild at all, since the DayTable code is already included in the base DSpace. It is a simple reconfiguration:

(1) Change the setter property in dspace.cfg:

plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter

Change the value on the right to: org.dspace.embargo.DayTableEmbargoSetter

(2) Add a new property in dspace.cfg, which will define the table of values:

embargo.terms.days = my first terms:90, my second terms: 180

You can put any language you want for 'my first terms'; the embargo code will interpret it as equal to the number of days after the ':' (30, 60, 365, or anything else you want). You can define any number of terms.

(3) If you are using the web submission for ingesting content (as opposed to SWORD, batch, etc.), and you want the submitter to be able to choose from a drop-down list of embargo terms, then you will also need to enable that in input-forms.xml. This roughly means defining a new 'value-pairs' element where the terms are on the left and the number of days on the right. See the documentation on configurable submission.

As to writing your own, you would need to describe what you want to accomplish in a little more detail before I could advise; you may only need a setter, not a lifter, etc.

Hope this helps,
Richard R

On Oct 21, 2011, at 2:19 PM, Shafer, Philip wrote:

We are running DSpace 1.7.2 and using the DefaultEmbargoSetter and DefaultEmbargoLifter; however, looking at the source it seems that the DayTableEmbargoSetter is more what we are looking for. I'm curious if anyone has experience using this and can provide some rough instructions on setting up the DayTableEmbargoSetter? Would I have to rebuild my DSpace instance? Can someone explain how I would go about writing custom embargo setter and lifter classes and deploying them to my instance?

Thanks, Phil

- Philip Shafer
Web Developer, Rowan University Library Services
Ph: 856-256-4418 E-mail: sha...@rowan.edu
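
To make the day-table idea in step (2) concrete, here is a minimal sketch of how a value such as `my first terms:90, my second terms: 180` could be turned into a lift date. This is not the DayTableEmbargoSetter source; the class name `DayTableSketch` and both method names are made up for illustration, and the sketch assumes the embargo starts on a date supplied by the caller.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mimics the idea of a day table, not DSpace's own class. */
class DayTableSketch {

    /** Parses "name:days, name:days" into an ordered map of terms name to day count. */
    static Map<String, Integer> parseTable(String property) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (String entry : property.split(",")) {
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in entry: " + entry);
            }
            String name = entry.substring(0, colon).trim();
            int days = Integer.parseInt(entry.substring(colon + 1).trim());
            table.put(name, days);
        }
        return table;
    }

    /** Returns the start date plus the number of days configured for the given terms. */
    static LocalDate liftDate(Map<String, Integer> table, String terms, LocalDate start) {
        Integer days = table.get(terms.trim());
        if (days == null) {
            throw new IllegalArgumentException("Unknown embargo terms: " + terms);
        }
        return start.plusDays(days);
    }

    public static void main(String[] args) {
        // The property value is the example from the message above.
        Map<String, Integer> table = parseTable("my first terms:90, my second terms: 180");
        System.out.println(liftDate(table, "my first terms", LocalDate.now()));
    }
}
```

The point of the table is that the submitter only ever sees the terms text; the number of days lives in configuration and can be changed without touching the submission form.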
Re: [Dspace-tech] alternative to solr statistics
Hi Jesús:

A lot of statistics work has been done for DSpace over time, but each project focuses on a different set of requirements: does the data need to appear in the UI, does it offer real-time availability (just to name two of the strengths of the SOLR-based system)?

One example of an alternative is https://wiki.duraspace.org/display/DSPACE/StatisticsAddOn, though I don't know if this has been maintained against versions newer than DSpace 1.6.2.

We run an entirely off-line, monthly reporting system using a database designed to accommodate a set of internal administrative requirements, where statistics are delivered as a spreadsheet, but that might not fulfill your requirements.

The tech list archives and the wiki are a good place to start, but you could also post your use case(s) to the list and see if any existing work better meets your needs.

Hope this helps,
Richard R

On Oct 17, 2011, at 6:00 AM, Jesús Martín García wrote:

Hi! I've been wondering if there is some kind of alternative to solr statistics, due to the high RAM load on our system (514 million records), which is not easy to scale and is very, very slow. So... has someone done some work on an alternative?

Thanks in advance,
Regards,
Jesús

--
Jesús Martín García
Tècnic de Projectes
CESCA, Centre de Serveis Científics i Acadèmics de Catalunya
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat
Re: [Dspace-tech] batch import of Bagit-formated collections and/or conversion script for Bagit to SAF
BTW, the replicate code does all this: it serializes and deserializes bags to DSpace objects.

Richard

On Jul 27, 2011, at 5:56 PM, Stuart Lewis wrote:

Hi Hardy,

SWORD is completely agnostic about what packages it transports; however, out of the box, DSpace does not know how to ingest bags via SWORD. You might therefore need to write a bag ingester that knows how to unpack and ingest the contents of the bag. This would make an excellent addition to DSpace :)

Thanks,
Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928

On 28/07/2011, at 9:29 AM, Pottinger, Hardy J. wrote:

Thanks, Mark, that code from MIT looks interesting, I will look into it more. I did notice that the Bagit spec is supported by the SWORD protocol, and when I mentioned this to our archivist, he went and looked and it does appear that the BIL 3.9 can send a bag using SWORD (see output of the BIL -h command, pasted below). So, it looks like Bagger and/or BIL + turning on SWORD for our repository will get us what we want. Huzzah!

BagIt Library (BIL) Version 3.9

Usage: bag operation [operation arguments] [--help]

Parameters:
operation  Valid operations are: baginplace, bob, checkpayloadoxum, create, fillholey, generatepayloadoxum, makecomplete, makeholey, retrieve, splitbagbyfiletype, splitbagbysize, splitbagbysizeandfiletype, sword, update, updatetagmanifests, verifycomplete, verifypayloadmanifests, verifytagmanifests and verifyvalid.

Operation explanations:
baginplace: Creates a bag-in-place. The source must be a directory on a filesystem and may already have a data directory.
bob: Sends a bag using BOB.
checkpayloadoxum: Generates Payload-Oxum and checks against Payload-Oxum in bag-info.txt.
create: Creates a bag from supplied files/directories, completes the bag, and then writes in a specified format.
fillholey: Retrieves any missing pieces of a local bag.
generatepayloadoxum: Generates and returns the Payload-Oxum for the bag.
makecomplete: Completes a bag and then writes in a specified format. Completing a bag fills in any missing parts.
makeholey: Generates a fetch.txt and then writes bag in a specified format.
retrieve: Retrieves a bag exposed by a web server. A local holey bag is not required.
splitbagbyfiletype: Splits a bag by file types.
splitbagbysize: Splits a bag by size.
splitbagbysizeandfiletype: Splits a bag by size and file types.
sword: Sends a bag using SWORD.
update: Updates the manifests and (if it exists) the bag-info.txt for a bag.
updatetagmanifests: Updates the tag manifests for a bag. The bag must be unserialized.
verifycomplete: Verifies the completeness of a bag.
verifypayloadmanifests: Verifies the checksums in all payload manifests.
verifytagmanifests: Verifies the checksums in all tag manifests.
verifyvalid: Verifies the validity of a bag.

[--version] Prints version of BIL and exits.
[--help] Prints usage message for the operation.

Examples:
bag verifyvalid --help  Prints help for the verifyvalid operation.

--
HARDY POTTINGER pottinge...@umsystem.edu
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
No matter how far down the wrong road you've gone, turn back. --Turkish proverb

On 7/26/11 5:31 PM, Mark Diggory mdigg...@atmire.com wrote:

Hardy,

Be aware that MIT / Richard Rodgers also has some Bagit work available, currently nested within the modules directory here:

http://scm.dspace.org/svn/repo/modules/dspace-replicate/trunk/src/main/java/org/dspace/pack/

Mark

On Tue, Jul 26, 2011 at 2:33 PM, Pottinger, Hardy J. pottinge...@umsystem.edu wrote:

Hi, I've done a bit of googling on Bagit, and I see that Dryad (and @mire) have done some work with Bagit as a repository interchange mechanism. I am interested in something a bit more mundane. There exists a very nice tool for constructing a bag, called Bagger:

http://sourceforge.net/projects/loc-xferutils/files/loc-bagger/

which would be ideal for adapting to our needs: we need a tool that a scanner technician can use to feed scanned images into our repository. Bags, in my mind, are not much different than SAF packages. It would be trivial to script a converter between
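
Several of the BIL operations listed above revolve around Payload-Oxum, which the BagIt specification defines as the total number of payload octets and the number of payload files, joined by a dot. As a rough illustration of what generatepayloadoxum reports, here is a minimal sketch; it is not BIL code, the class and method names are invented, and it assumes an unserialized bag whose payload sits under a `data` directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: computes a Payload-Oxum value for a bag on disk. */
class PayloadOxumSketch {

    /** Formats "octetCount.streamCount" from the sizes of the payload files. */
    static String oxumOf(long[] fileSizes) {
        long octets = 0;
        for (long size : fileSizes) {
            octets += size;
        }
        return octets + "." + fileSizes.length;
    }

    /** Walks <bagRoot>/data and computes the Payload-Oxum over all regular files. */
    static String payloadOxum(Path bagRoot) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(bagRoot.resolve("data"))) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long[] sizes = new long[files.size()];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = Files.size(files.get(i));
        }
        return oxumOf(sizes);
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of an unpacked bag as the only argument.
        if (args.length == 1) {
            System.out.println(payloadOxum(Paths.get(args[0])));
        }
    }
}
```

Comparing this value with the Payload-Oxum line in bag-info.txt is a cheap completeness check before the more expensive checksum verification.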
Re: [Dspace-tech] Embargo and OAI interface
Hi Tonny:

The embargo system is designed to protect bitstreams, not metadata. While it certainly would be possible to alter OAI or other code to check for embargo dates, this has not been done to the best of my knowledge. I am curious: given that the content will be inaccessible, why is it desirable to hide the metadata from harvesters?

Thanks,
Richard

On May 6, 2011, at 7:18 AM, Tonny Hjelmberg Laursen wrote:

I sent the mail below a week ago to the tech list but I didn't receive any responses to it. I really need some help on this, so if anyone knows a fix...

Thanks, Tonny

From: tt tt thl@cbs.dk
Date: Thu, 28 Apr 2011 08:12:45 +0200
To: dspace-tech@lists.sourceforge.net
Subject: Embargo and OAI interface

We have a few 1.6.x installations with items that have an embargo date. The embargoed items (metadata) are currently available through the OAI interface. Is it possible to change this, so they can't be harvested until they have passed the lift date?

Thanks, Tonny
Re: [Dspace-tech] ClamAV curation task on Ubuntu
Hi Robin:

Wendy B will follow with details, but yes, IP sockets are built into the design. The main reasons:

(1) Portability: a desire not to restrict operation of the service/daemon to Unix systems.

(2) Shareability: with an IP socket, you can have one daemon shared across multiple DSpace clients on different hosts. This is a big win when you run multiple DSpace instances, since you only have to maintain one Clam instance with updates, etc.

Thanks,
Richard

On Apr 11, 2011, at 10:47 AM, Robin Taylor wrote:

Hi all,

I was just looking to save myself some investigation. I am trying to run the ClamAV curation task on my local Ubuntu machine but it's failing with...

ERROR org.dspace.curate.ClamScan @ Failed to connect to clamd

ClamAV appears to listen on a Unix domain socket /var/run/clamav/clamd.ctl rather than using IP sockets; unfortunately the DSpace config expects a numeric port. I was just wondering if anyone has crossed this hurdle? I guess I could reconfigure ClamAV to use an IP socket? Any help appreciated.

Cheers, Robin.
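
Once clamd has been switched to listen on TCP (the `TCPSocket` and `TCPAddr` options in clamd.conf), it can be handy to confirm that the daemon is reachable before pointing DSpace at it. The following is a minimal sketch of such a check, not part of the DSpace curation task; the class name, the host `localhost` and the port 3310 are assumptions to adapt. It relies on clamd's documented `PING` command, sent here in the newline-delimited `n` form, to which the daemon answers `PONG`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Illustrative only: checks whether a clamd daemon answers on a TCP socket. */
class ClamdPingSketch {

    /** True when the daemon's reply is PONG, ignoring the trailing delimiter. */
    static boolean isPong(String reply) {
        return "PONG".equals(reply.trim());
    }

    /** Connects to host:port, sends PING and reports whether PONG came back. */
    static boolean ping(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write("nPING\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            byte[] buffer = new byte[32];
            int read = socket.getInputStream().read(buffer);
            if (read <= 0) {
                return false;
            }
            return isPong(new String(buffer, 0, read, StandardCharsets.US_ASCII));
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders; use whatever clamd.conf specifies.
        try {
            System.out.println(ping("localhost", 3310, 2000) ? "clamd is up" : "unexpected reply");
        } catch (IOException e) {
            System.out.println("Failed to connect to clamd: " + e.getMessage());
        }
    }
}
```

If this check fails from the DSpace host, the curation task will fail the same way, so it separates network or clamd configuration problems from DSpace configuration problems.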
Re: [Dspace-tech] DSpace Mets profiles
Hi Robin:

No objections, and it's long overdue. But a friendly amendment: we have to keep in mind that the Mets profile is not the same as the X-Packaging (package type) in the SWORD protocol. The latter has been a neglected and therefore somewhat problematic area, and the work you propose will definitely help.

Thanks,
Richard

On Mar 4, 2011, at 8:41 AM, Robin Taylor wrote:

Hi all,

There is currently one Mets profile for DSpace that I know of: https://wiki.duraspace.org/display/DSPACE/DSpaceMETSSIPProfile. Unfortunately it's been widely abused, in the sense that it specifies that the metadata will be in MODS, but DSpace exposes and exports data claiming to adhere to the profile with the metadata as EPDCX or even DIM. I would like to document a couple of new profiles that look exactly the same but refer to these other metadata schemas.

Why? My interest is Sword. When sending a package to the Sword server I need to know exactly what package formats the server supports, including what metadata schema.

Any objections?

Cheers, Robin.
Re: [Dspace-tech] Mass replace of PDFs
Hi George:

If you are using 1.6.0+, I'd look at ItemUpdate instead of ItemImport. You can easily make single metadata field changes, or additions/deletions of individual bitstreams, without wholesale replacement. Consult the doc (chapter 8.5).

Thanks,
Richard

On Dec 20, 2010, at 11:48 AM, Brian Freels-Stendel wrote:

Yes, the batch importer can be used with the --replace argument. It would require creating the same files/structure as the regular import, so you can change the titles at the same time. This will keep the same handles, assuming that information is available via the handle file (just another text file that has one line for the handle, only including prefix/item_number). For DSpace 1.6.x, this info is better presented in chapter 8.3.3. You can also see the exact files you'd need to create by exporting an item.

B--

On 12/20/2010 at 7:40 AM, George Stanley Kozak g...@cornell.edu wrote:

Hi...

I recently added a collection of about 1500 items (all with PDF bitstreams). The collection owner contacted me this past weekend and said that he would like me to do a mass change of the titles and would like to replace all of the PDFs as well (he wants to retain the handles). I figure that I can use the batch metadata export/import to change the titles, but can I use the batch item importer to replace the PDFs? What I'd have to do is delete the existing PDF of an item and then add a new PDF. Can that be done with the item importer, or will I have to manually delete old bitstreams before adding the new ones? I've never really encountered a case like this before.

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library, Cornell University, Ithaca, NY 14853
607-255-8924
Re: [Dspace-tech] Embargoed Item Message
Hi Sean:

I'm not sure what details you might want to include, but since the embargo information is all carried in standard metadata fields, you could (using whatever UI stack you are on: JSPUI, XMLUI) have the UI detect if the item is embargoed (essentially, that just means that there is a liftdate field whose value is later than the time of testing), then do whatever you want in the UI based on that fact: put up some special markup, turn the page red or blinking (with CSS), etc.

Even without any special handling, just making sure the terms and/or liftdate fields appear on the item page (through configuration) might provide enough information for the requestor. Is this the sort of thing you want to achieve?

Thanks,
Richard

On Oct 29, 2010, at 6:01 AM, Sean Carte sean.ca...@gmail.com wrote:

Currently, attempting to access an embargoed bitstream results in the 'The file is restricted' (xmlui.BitstreamReader.auth_message) message. I can change the content of that message in messages.xml, of course, but I wonder if it would be possible to include details of the embargo either at that point, or before that at the item record page. Any ideas on whether this would be possible?

Sean

--
Sean Carte
esAL Library Systems Manager
+27 72 898 8775 +27 31 373 2490 fax: 0866741254
http://esal.dut.ac.za/
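
The embargo test described above (a lift date later than the time of testing) is small enough to sketch. The code below is only an illustration and is not taken from JSPUI or XMLUI: the class and method names are invented, and it assumes the lift date metadata value is an ISO date such as 2012-01-19. As a deliberate design choice, a value that is present but does not parse (for example an open-ended marker) is treated as still embargoed, so the UI errs on the side of showing the notice.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Illustrative only: decides whether an item should be shown as embargoed. */
class EmbargoCheckSketch {

    /**
     * Returns true when the lift date lies strictly after 'today'.
     * Missing or blank values mean no embargo; unparseable values are
     * treated as embargoed (an assumption, chosen to be conservative).
     */
    static boolean isEmbargoed(String liftDateValue, LocalDate today) {
        if (liftDateValue == null || liftDateValue.trim().isEmpty()) {
            return false;
        }
        try {
            return LocalDate.parse(liftDateValue.trim()).isAfter(today);
        } catch (DateTimeParseException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isEmbargoed("2012-01-19", LocalDate.now()));
    }
}
```

A theme or JSP would call something like this with the value of whichever metadata field the lift date is mapped to, and render the extra markup only when it returns true.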
Re: [Dspace-tech] Turn embargo off?
Hi Marvin:

Since the embargo.field.terms and embargo.field.lift properties don't refer to any real metadata fields (SCHEMA.ELEMENT.QUALIFIER), embargo is turned off by default, and you do not need to do anything further. Uncomment all the properties that you commented out; the code will produce errors if they aren't present.

Thanks, and sorry for any confusion. We will add a note in the docs about this case.

Richard Rodgers

On Oct 7, 2010, at 4:21 PM, Marvin Weaver wrote:

I built 1.6.2 with

embargo.field.terms = SCHEMA.ELEMENT.QUALIFIER
embargo.field.lift = SCHEMA.ELEMENT.QUALIFIER

commented out, thinking that would disable embargo. When I try to enter an item I get:

java.lang.IllegalStateException: Missing one or more of the required DSpace configuration properties for EmbargoManager, check your configuration file.

So I commented out the rest of the section on embargo and did an ant update. I still get the error. How can I turn off embargo?

Marvin
Re: [Dspace-tech] PowerPoint Text Extractor
Hi Richard:

I cannot speak to its quality (and indeed we have had quality issues with other formats), but the Apache POI library supports PowerPoint text extraction. I would study the doc at http://poi.apache.org/ for how to use the library, and look in:

http://scm.dspace.org/svn/repo/dspace/trunk/dspace-api/src/main/java/org/dspace/app/mediafilter

for examples of other extractor media filters. Then post any questions to the tech or dev list.

Hope that is helpful,
Richard Rodgers

On Sep 29, 2010, at 10:14 AM, Jizba, Richard wrote:

Hello,

Are there plans to add a PPT text extractor to DSpace? In the meantime, can someone provide information on how to implement one?

Thanks,
Richard Jizba
Creighton University
Re: [Dspace-tech] How to actually get an item embargoed?
On Sep 9, 2010, at 9:41 AM, Mark H. Wood wrote:

On Wed, Sep 08, 2010 at 06:18:18PM -0400, Richard Rodgers wrote:

If you look at the class DefaultEmbargoSetter (in org.dspace.embargo) the method 'parseTerms' creates the lift date out of what EmbargoManager passes it (which is the contents of the metadata field configured for the 'terms'), and the next method in that class 'setEmbargo' does the setting - which simply consists of removing the read policies on the bitstreams. Does that help, or further confuse?

Neither, actually. That much I had worked out. What I was wondering is: how did the terms get set in the first place? The documentation is silent there.

To get embargo working, you just need to make sure that the embargo.field.terms and embargo.field.lift properties are configured,

Done, but made no visible difference (except that 'dspace embargo-lifter' no longer complains).

and they point to metadata fields that exist,

Done, but made no visible difference.

and are in input forms, etc

Thanks, this is what I was missing: stock DSpace doesn't do anything about marking an item *to be embargoed* (there is no code which inserts embargo terms automagically); I have to add something to the input forms, or (I think?) set up a template item.

Correct - embargo terms (and lift, for that matter) are inscribed in ordinary metadata fields. This allows you to use all the standard means of assigning metadata to items to set embargoes: manual entry in web submission [via input forms], use of templates [if terms will always be the same for a collection], or externally generated metadata (if using ItemImport, SWORD, etc) that just appears. There is nothing 'special' about the fields embargo uses except that the setter looks there at item installation time, and the lifter whenever that tool is run. Nothing in the embargo system automatically assigns terms - it only automatically translates them into lift dates based on the logic contained in the setter.

--
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a little more than 2 percent of world population has broadband. -- Ledford and Tyler, _Google Analytics 2.0_
Re: [Dspace-tech] How to actually get an item embargoed?
Hi Tim:

Just a remark below.

Richard

On Sep 9, 2010, at 11:29 AM, Tim Donohue wrote:

I'd actually go one further and say: (1) We should update the manual to make it clearer (like Mark suggests) AND (2) We should work to ship 1.7 with a default embargo already set up (i.e. pre-configured) -- so that all you need to do is update

I'm not sure what you mean - it *is* already enabled (i.e., the setter and lifter are functional); the only thing you need to do is decide which metadata fields the terms and lift will map to. Are you saying we want to legislate those? There is no DC profile or similar standard that I'm aware of for embargo terms. Or do you simply mean we should put XML comments in input-forms.xml? Like: <!-- make sure the terms appear here --> Can you explain what you mean by pre-configure?

input-forms.xml and uncomment the pre-configured embargo field(s). If an institution doesn't like the pre-configured version, they can always modify it to use different fields or have different values, etc.

What do others think? Should a pre-configured version be in 1.7? It seems like this question keeps popping up over and over again in different forms (e.g. how do I enable it?, how does it work?, etc.). Might be best to make this easier for everyone with a pre-configured version -- and let them decide if they want to extend it or not.

- Tim

On 9/9/2010 10:10 AM, Mark H. Wood wrote:

I worked over the Javadoc in the embargo package, to improve my understanding and (I hope) to fill in the overall process and requirements a bit. Committed revision 5342. The new package comments might serve as an appropriate starting point for expanding the manual in this area.
Re: [Dspace-tech] How to actually get an item embargoed?
Hi Mark:

If you look at the class DefaultEmbargoSetter (in org.dspace.embargo), the method 'parseTerms' creates the lift date out of what EmbargoManager passes it (which is the contents of the metadata field configured for the 'terms'), and the next method in that class, 'setEmbargo', does the setting, which simply consists of removing the read policies on the bitstreams. Does that help, or further confuse?

To get embargo working, you just need to make sure that the embargo.field.terms and embargo.field.lift properties are configured, that they point to metadata fields that exist, and that those fields are in input forms, etc.

I'll be glad to elaborate further as needed,
Richard

On Sep 8, 2010, at 3:47 PM, Mark H. Wood wrote:

The documentation lists the configuration values that the org.dspace.embargo package looks at. But there doesn't seem to be any code anywhere in that package, or anywhere else, which actually sets the embargo terms or lift date. Did I miss something? How do I get a submission embargoed using the default setter and lifter?

--
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a little more than 2 percent of world population has broadband. -- Ledford and Tyler, _Google Analytics 2.0_
Re: [Dspace-tech] java.lang.outOfMemory error trying to run index-init
Hi Sue:

Yes, I saw the post, and thanks for these numbers - I just wanted to make sure there wasn't some issue of scale that was 'hidden' under the raw item count (e.g. that your articles were 10 times bigger on average). And now the indexing time is at least roughly proportional. I haven't studied the behavior of those parameters, so I have no specific advice at the moment - but I'll keep your values in mind when I next look at the indexing code...

Thanks,
Richard

On Jun 21, 2010, at 9:44 PM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] wrote:

Hi Richard,

I don't know if you saw my subsequent post today, but I ended up changing two dspace.cfg parameters and it sped up index-init considerably - it only took a day and a half this time. I'm a bit worried about the impact it's had on our full-text searching. Since we had a large repository, I had our search.max-clauses set at 200,000 and I changed it to 4096, which is twice the default. I also changed search.maxfieldlength from -1 (unlimited) to 10,000 for the same reason. What do you think? See our numbers below.

Thanks a bunch,
Sue

From: Richard Rodgers [mailto:rrodg...@mit.edu]
Sent: Monday, June 21, 2010 1:50 PM
To: Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
Cc: dspace-tech@lists.sourceforge.net; William L Hays
Subject: Re: [Dspace-tech] java.lang.outOfMemory error trying to run index-init

Hi Sue:

I don't have any immediate help, but I'm struck by how long the indexing job is taking. I had a comparison done with one of our DSpace 1.6 repositories which is about half the size of yours (71,481 items), and is mostly text-based content (which I think yours is also?). On not particularly fast hardware, a complete re-index took about 5 hours - not 5 days. There may be some subtle limit in the code based on size - so to get started, I did a 'profile' of our repo with respect to full-text content (which I am assuming accounts for most of the indexing time - but I could be wrong). Here is the 'profile' and the queries we used to get it. I'd be interested to see what your repo looks like using the same metrics.

[Sue T.] Our numbers to the right of yours:

                                                          Richard          Sue T.
count of items                                             71,481         140,337
count of bitstreams in text extract bundles (TEXT)         89,993         134,215
sum of all file sizes in text extract bundles       7,695,414,829  12,804,764,306
average size of text extract bitstream                     85,511          95,405

Queries used:

select count(bs.bitstream_id) from bundle b, bundle2bitstream b2b, bitstream bs where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id and b.name = 'TEXT'

select sum(bs.size_bytes) from bundle b, bundle2bitstream b2b, bitstream bs where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id and b.name = 'TEXT'

Thanks,
Richard

On Jun 19, 2010, at 7:50 PM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] wrote:

We have a large repository, currently with 140,376 Items. Due to user complaints about search results, we recently turned off stemming in our DSpace 1.5.1 search by commenting out the following line in DSAnalyzer.java:

result = new PorterStemFilter(result);

Of course then we had to run index-init to rebuild the search indexes, and we've been having problems getting the job to finish. Due to the size of our repository, index-init takes about 5 or 6 days to complete, and now it's failed twice due to the following error:

An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 655360 bytes for GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp. Out of swap space?
#
# Internal Error (allocation.inline.hpp:42), pid=23486, tid=5
# Error: GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
# An error report file with more information is saved as:
# /dspace/hs_err_pid23486.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Abort - core dumped

Can someone please help us with this? This most recent time index-init failed was 4½ days into the index rebuild - after indexing 104,082 out of 140,376 items - and now it looks like if we want an accurate and complete index, we're going to have to start all over again with the rebuild, and there's no guarantee it will finish successfully. Any help would be much appreciated! I'm attaching the core dump and a copy of our DSRUN to this email.

Thanks in advance,
Sue

Sue Walker-Thornton
NASA Langley Research Center
Integrated Library Systems Developer, Application Database
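
The profile figures quoted in this exchange can be cross-checked with a little arithmetic: the average text extract size is simply the sum of sizes divided by the bitstream count, and a rough indexing throughput follows from the item counts and run times mentioned in the thread (about 5 hours for 71,481 items, and about a day and a half for 140,337 items after the configuration change). The sketch below is only a convenience for that check; the class and method names are invented, and the hour figures are approximations taken from the messages.

```java
/** Illustrative only: recomputes the repository profile figures from the thread. */
class IndexProfileSketch {

    /** Average size in bytes, rounded to the nearest whole byte. */
    static long averageBytes(long totalBytes, long count) {
        return Math.round((double) totalBytes / count);
    }

    /** Items indexed per hour for a run of the given length. */
    static double itemsPerHour(long items, double hours) {
        return items / hours;
    }

    public static void main(String[] args) {
        // Sums and counts as reported for the TEXT bundles of each repository.
        System.out.println("Richard avg bytes: " + averageBytes(7_695_414_829L, 89_993L));
        System.out.println("Sue avg bytes:     " + averageBytes(12_804_764_306L, 134_215L));

        // Run times are the approximate figures quoted in the messages.
        System.out.println("Richard items/hour: " + itemsPerHour(71_481L, 5.0));
        System.out.println("Sue items/hour:     " + itemsPerHour(140_337L, 36.0));
    }
}
```

Both averages agree with the reported values, which supports the reading that the two repositories hold text extracts of similar size and that the earlier multi-day run was not explained by larger documents.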
Re: [Dspace-tech] java.lang.outOfMemory error trying to run index-init
Hi Sue: I don't have any immediate help, but I'm struck by how long the indexing job is taking. I had a comparison done with one of our DSpace 1.6 repositories which is about half the size of yours (71,481 items), and is mostly text-based content (which I think yours is also?) On not particularly fast hardware, a complete re-index took about 5 hours - not 5 days. There may be some subtle limit in the code based on size - so to get started, I did a 'profile' of our repo with respect to full-text content (which I am assuming accounts for most of the indexing time - but I could be wrong). Here is the 'profile' and the queries we used to get it. I'd be interested to see what your repo looks like using the same metrics. count of items 71,481 count of bitstreams in text extract bundles (TEXT): 89,993 sum of all file sizes in text extract bundles:7,695,414,829 average size of text extract bitstream: 85,511 Queries used: select count(bs.bitstream_id) from bundle b, bundle2bitstream b2b, bitstream bs where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id and b.namehttp://b.name/ = 'TEXT' select sum(bs.size_bytes) from bundle b, bundle2bitstream b2b, bitstream bs where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id and b.namehttp://b.name/ = 'TEXT' Thanks, Richard On Jun 19, 2010, at 7:50 PM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] wrote: We have a large repository, currently with 140,376 Items. Due to user complaints about search results, we recently turned off stemming in our DSpace 1.5.1 search by commenting out the following line in DSAnalyzer.java: result = new PorterStemFilter(result); Of course then we had to run index-init to rebuild the search indexes and we’ve been having problems getting the job to finish. 
Due to the size of our repository, index-init takes about 5 or 6 days to complete and now it’s failed twice due to the following error:

An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 655360 bytes for GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp. Out of swap space?
#
# Internal Error (allocation.inline.hpp:42), pid=23486, tid=5
# Error: GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
# An error report file with more information is saved as:
# /dspace/hs_err_pid23486.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Abort - core dumped

Can someone please help us with this? This most recent time index-init failed was 4½ days into the index rebuild – after indexing 104,082 out of 140,376 items and now it looks like if we want an accurate and complete index, we’re going to have to start all over again with the rebuild and there’s no guarantee it will finish successfully. Any help would be much appreciated! I’m attaching the core dump and a copy of our DSRUN to this email.

Thanks in advance, Sue

Sue Walker-Thornton NASA Langley Research Center Integrated Library Systems Developer, Application Database Administrator ConITS Contract ~ NCI Information Systems, Inc. 130 Research Drive Hampton, VA 23666 Office: (757) 224-4074 ~ Mobile: (757) 506-9903 ~ Fax: (757) 224-4001 email: susan.m.thorn...@nasa.gov

Attachment: hs_err_pid23486.log

DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
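To put the figures from this thread side by side, here is a small Python sketch of the arithmetic. It uses only numbers quoted in the two messages (the item and bitstream counts, the roughly 5-hour rebuild, and the 104,082 items indexed in 4½ days); the variable names are made up for illustration, and the throughput comparison is a rough estimate, not a diagnosis of the failure.

```python
# Figures quoted in the reply for the 71,481-item comparison repository.
items = 71_481
text_bitstreams = 89_993
text_bytes = 7_695_414_829

# Average size of one extracted-text bitstream (the reply quotes 85,511).
avg_bytes_per_bitstream = text_bytes / text_bitstreams

# Rough indexing throughput of the comparison repository (about 5 hours).
fast_items_per_hour = items / 5

# Rough throughput of the failing rebuild: 104,082 items in 4.5 days.
slow_items_per_hour = 104_082 / (4.5 * 24)

# How many times slower the failing rebuild was, per item.
slowdown = fast_items_per_hour / slow_items_per_hour

print(int(avg_bytes_per_bitstream), round(fast_items_per_hour),
      round(slow_items_per_hour), round(slowdown, 1))
```

On these numbers the failing rebuild handles items more than ten times more slowly than the comparison repository, which is why running the same two queries on the larger repository is a reasonable first step.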
Re: [Dspace-tech] More re: Embargo error on Approval
Hi Jose:

Re #1. Date would be safe, but then you would not be able to enter the 'forever' (open ended) value, which is not a date. I would normally just use 'onebox' (smaller than a text-area).

Re #2. I'm not positive I understand the question, but the embargo does not hide the metadata or item page - it merely removes 'read' permissions on the bitstreams.

Re #3. As noted in another email, this is a 1.6 bug. You can get details about fixing it at: http://jira.dspace.org/jira/browse/DS-506

Thanks, Richard R

On May 20, 2010, at 3:55 PM, Blanco, Jose wrote:

Richard, Thank you very much for this documentation. I’ve been experimenting with it today, and I just have a few questions.

1. It seems like making the terms field a date type in the input-forms.xml file would be the safest thing to do, instead of a textarea. Any thoughts on this?

2. I deposited an item and set the terms field to some value in the future. I am able to see the item and when I try to access the bitstream, I am asked to authenticate. Is this how it’s supposed to work?

3. I tried running /dspace/bin/dspace embargo-lifter but it can’t seem to find it. I looked around for the file “embargo-lifter” but I can’t find it.
This is the error I get:

[dsp...@pocarisweat bin]$ ./dspace embargo-lifter
Command not found: embargo-lifter
Usage: dspace [command-name] {parameters}
 - checker: Run the checksum checker
 - checker-emailer: Send emails related to the checksum checker
 - cleanup: Remove deleted bitstreams from the assetstore
 - community-filiator: Tool to manage community and sub-community relationships
 - create-administrator: Create a DSpace administrator account
 - dsprop: View a DSpace property from dspace.cfg
 - export: Export items or collections
 - filter-media: Perform the media filtering to extract full text from docuemnts and to create thumbnails
 - generate-sitemaps: Generate search engine and html sitemaps
 - harvest: Manage the OAI-PMH harvesting of external collections
 - import: Import items into DSpace
 - index: General index command (requires extra paramters)
 - index-init: Initialise the search and browse indexes
 - index-update: Update the search and browse indexes
 - itemcounter: Update the item strength counts in the user interface
 - itemupdate: Item update tool for altering metadata and bitstream content in items
 - make-handle-config: Run the handle server simple setup command
 - metadata-export: Export metadata for batch editing
 - metadata-import: Import metadata after batch editing
 - packager: Execute a packager
 - registry-loader: Load entries into a registry
 - stat-general: Compile the general statistics
 - stat-initial: Compile the initial statistics
 - stat-monthly: Compile the monthly statistics
 - stat-report-general: Create the general statistics report
 - stat-report-initial: Create the general statistics report
 - stat-report-monthly: Create the monthly statistics report
 - stats-log-converter: Convert dspace.log files ready for import into solr statistics
 - stats-log-importer: Import previously converted log files into solr statistics
 - stats-util: Statistics Client for Maintenance of Solr Statistics Indexes
 - structure-builder: Build DSpace commnity and collection structure
 - test-database: Test the DSpace database connection is OK
 - test-email: Test the DSpace email server settings OK
 - sub-daily: Send daily subscription notices
 - update-handle-prefix: Update handle records and metadata when moving from one handle to another

From: Richard Rodgers [mailto:rrodg...@mit.edu] Sent: Thursday, May 06, 2010 5:53 AM To: Jizba, Richard Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] More re: Embargo error on Approval

Hi Richard: Try this document for a fuller explanation. Let me know of any questions not addressed in it. We will try to include it in the next release. Thanks, Richard R

On May 5, 2010, at 6:02 PM, Jizba, Richard wrote:

I finally realized that the Embargo Setter is reading dc.embargo.terms for the date rather than dc.embargo.liftdate. I’ve checked our dspace.cfg file and there does not seem to be a mix-up. Nor is there a mix-up on the submission form. Perhaps I don’t understand what these fields are for? What is dc.embargo.terms supposed to do? Richard

From: Jizba, Richard Sent: Wednesday, May 05, 2010 4:05 PM To: dspace-tech@lists.sourceforge.net Subject: Embargo error on Approval

Hello, When we try to approve an item that has been submitted with an embargo date (in dc.embargo.liftdate) we receive the following error:

java.lang.NullPointerException at org.dspace.embargo.EmbargoManager.getEmbargoDate(EmbargoManager.java:166)

Can anyone suggest how to resolve this? We are running 1.6.

Richard Jizba Creighton University Omaha, NE
Re: [Dspace-tech] More re: Embargo error on Approval
Hi Richard:

Try this document for a fuller explanation. Let me know of any questions not addressed in it. We will try to include it in the next release.

Thanks, Richard R

On May 5, 2010, at 6:02 PM, Jizba, Richard wrote:

I finally realized that the Embargo Setter is reading dc.embargo.terms for the date rather than dc.embargo.liftdate. I’ve checked our dspace.cfg file and there does not seem to be a mix-up. Nor is there a mix-up on the submission form. Perhaps I don’t understand what these fields are for? What is dc.embargo.terms supposed to do? Richard

From: Jizba, Richard Sent: Wednesday, May 05, 2010 4:05 PM To: dspace-tech@lists.sourceforge.net Subject: Embargo error on Approval

Hello, When we try to approve an item that has been submitted with an embargo date (in dc.embargo.liftdate) we receive the following error:

java.lang.NullPointerException at org.dspace.embargo.EmbargoManager.getEmbargoDate(EmbargoManager.java:166)

Can anyone suggest how to resolve this? We are running 1.6.

Richard Jizba Creighton University Omaha, NE

[Attached document]

Embargo Support in DSpace 1.6

I. What is an embargo?

An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes.

II. Embargo model and life-cycle

Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples:

2020-09-12 - an absolute date (i.e. the date embargo will be lifted)
6 months - a time relative to when the item is accessioned
forever - an indefinite, or open-ended embargo
local only until 2015 - both a time and an exception (public has no access until 2015, local users OK immediately)
Nature Publishing Group standard - look-up to a policy somewhere (typically 6 months)

These terms are 'interpreted' by the embargo system to yield a specific date on which the embargo can be removed or 'lifted', and a specific set of access policies. Obviously, some terms are easier to interpret than others (the absolute date really requires none at all), and the 'default' embargo logic understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to add in your own 'interpreters' to cope with any terms expressions you wish to have. This date that is the result of the interpretation is stored with the item and the embargo system detects when that date has passed, and removes the embargo (lifts it), so the item bitstreams become available. Here is a more detailed life-cycle for an embargoed item:

A. Terms assignment

The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace. The terms are not immediately acted upon, and may be revised, corrected, removed, etc, up until the next stage of the life-cycle. Thus a submitter could enter one value, and a collection editor replace it, and only the last value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values, but in the default implementation only one is recognized.

B. Terms interpretation/imposition

In DSpace terminology, when an Item has exited the last of any workflow steps (or if none have been defined for it), it is
Re: [Dspace-tech] Embargo setter plugin question
Attachments: DayTableEmbargoSetter.class, DayTableEmbargoSetter.java

Hi Jason:

I haven't tested it, but here's a setter that might do what you want. I include both the source and class files (just put the latter in your classpath to try it out). To set it up do the following:

(1) Add a property in dspace.cfg:

embargo.terms.days = 6 months:182,1 year:365, 2 years:730

(you can use any language you want on the left-hand side: '6 months' can be 'ProQuest min' or whatever, it is the number of days after the ':' that matters)

(2) In input-forms.xml, add a new 'value-pairs' element like:

<value-pairs value-pairs-name="UMI_embargo" dc-term="embargo">
  <pair>
    <displayed-value>6 months</displayed-value>
    <stored-value>6 months</stored-value>
  </pair>
  <pair>
    <displayed-value>1 year</displayed-value>
    <stored-value>1 year</stored-value>
  </pair>
  ...

Now, for whatever field you configured as embargo.field.terms, change it to use a 'dropdown' with the value-pairs defined above. Let me know if this works for you, and if so, we can include it in the next release, as an option for all.

Thanks, Richard R

On May 3, 2010, at 11:27 AM, Jason Fowler wrote:

Has anyone yet developed a custom Embargo setter for 1.6 that will mimic UMI's embargo terms (6 months, 1 year, 2 years)? If so, would you be willing to share the code?

Thanks, Jason Fowler, CA, MSLS Archives and Special Collections Librarian The Southern Baptist Theological Seminary Vice President, ALABI jfow...@sbts.edu
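The attached DayTableEmbargoSetter is Java and is not reproduced in this archive, so the following is only a rough Python sketch of the idea described in the message: look up the terms label in the embargo.terms.days table and add that many days to the install date. The function names parse_day_table and lift_date are hypothetical and do not come from DSpace.

```python
from datetime import date, timedelta

def parse_day_table(prop):
    """Parse a value like '6 months:182,1 year:365, 2 years:730' into a dict.

    Hypothetical helper: mirrors the embargo.terms.days property format
    shown in the message, not the attached Java class itself.
    """
    table = {}
    for entry in prop.split(","):
        # Split on the last ':' so the label may contain any other text.
        label, _, days = entry.strip().rpartition(":")
        table[label.strip()] = int(days)
    return table

def lift_date(terms, installed, table):
    """Return the install date plus the number of days mapped to the terms."""
    return installed + timedelta(days=table[terms])

table = parse_day_table("6 months:182,1 year:365, 2 years:730")
print(lift_date("6 months", date(2010, 5, 3), table))  # 2010-11-01
```

Because only the number after the ':' is used, the label on the left can be any text, as long as the stored-value in the dropdown matches it exactly.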
Re: [Dspace-tech] Embargo setter plugin question
Hi Jason:

Bit of an email glitch in my last reply: looks like the text became an attachment. But the gist is: I sent you a setter class (source and .class file) that I haven't tested, but might do what you are looking for. Let me know if you have any problems, or if the set-up description is not clear. It's fairly general, so if it works for you, we can include it in the next distribution.

Thanks, Richard R

On May 3, 2010, at 11:27 AM, Jason Fowler wrote:

Has anyone yet developed a custom Embargo setter for 1.6 that will mimic UMI's embargo terms (6 months, 1 year, 2 years)? If so, would you be willing to share the code?

Thanks, Jason Fowler, CA, MSLS Archives and Special Collections Librarian The Southern Baptist Theological Seminary Vice President, ALABI jfow...@sbts.edu
Re: [Dspace-tech] embargo in 16.
Hi Jose:

We are still working on improving the doc. I attach a draft that might answer many of your questions. But briefly, yes, you need to create any new metadata fields you want to use for embargo in the metadata registry, and place them in input-forms.xml. If you don't want embargo info recorded for certain collections, then only add them in input-forms.xml to the collections you want. E.g., you could leave these fields out of the 'default' form, and create an 'embargo' form that includes them. Then, configure the map at the beginning of input-forms.xml to point each collection you want to have use embargo to that 'embargo' form.

Hope this helps, Richard R

[Attached document]

Embargo Support in DSpace 1.6

I. What is an embargo?

An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes.

II. Embargo model and life-cycle

Functionally, the embargo system allows you to attach 'terms' to an item before it is placed into the repository, which express how the embargo should be applied. What do we mean by 'terms' here? They are really any expression that the system is capable of turning into (1) the time the embargo expires, and (2) a concrete set of access restrictions. Some examples:

2020-09-12 - an absolute date (i.e. the date embargo will be lifted)
6 months - a time relative to when the item is accessioned
forever - an indefinite, or open-ended embargo
local only until 2015 - both a time and an exception (public has no access until 2015, local users OK immediately)
Nature Publishing Group standard - look-up to a policy somewhere (typically 6 months)

These terms are 'interpreted' by the embargo system to yield a specific date on which the embargo can be removed or 'lifted', and a specific set of access policies. Obviously, some terms are easier to interpret than others (the absolute date really requires none at all), and the 'default' embargo logic understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to add in your own 'interpreters' to cope with any terms expressions you wish to have. This date that is the result of the interpretation is stored with the item and the embargo system detects when that date has passed, and removes the embargo (lifts it), so the item bitstreams become available. Here is a more detailed life-cycle for an embargoed item:

A. Terms assignment

The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace. The terms are not immediately acted upon, and may be revised, corrected, removed, etc, up until the next stage of the life-cycle. Thus a submitter could enter one value, and a collection editor replace it, and only the last value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values, but in the default implementation only one is recognized.

B. Terms interpretation/imposition

In DSpace terminology, when an Item has exited the last of any workflow steps (or if none have been defined for it), it is said to be 'installed' into the repository. At this precise time, the 'interpretation' of the terms occurs, and a computed 'lift date' is assigned, which like the terms is recorded in a configurable metadata field. It is important to understand that this interpretation happens only once (just like the installation), and cannot be revisited later. Thus, although an administrator can assign a new value to the metadata field holding the terms after the item has been installed, this will have no effect on the
Re: [Dspace-tech] Embargoes in 1.6
Hi Jason:

One thought: have you added those fields to your input-forms.xml? In other respects, the embargo fields behave just like any other metadata, and can be added to the default set - or any collection-specific set - of metadata fields used in web submission. The tech doc has instructions on managing input-forms.

Hope this helps, Richard R

On Mar 31, 2010, at 10:02 AM, Jason Fowler wrote:

Hi all, I am testing DSpace 1.6 before we migrate to it. I believe I have the dspace.cfg set up correctly, and I have added dc.embargo.terms and dc.embargo.liftdate to my metadata registry. However, nothing shows up in the submission workflow in any of my xmlui themes, or in jspui. Are there additional changes that I need to make in order to enable this within the workflow? Can someone explain what those are?

Thanks, Jason Fowler, CA, MSLS Archives and Special Collections Librarian The Southern Baptist Theological Seminary Vice President, ALABI jfow...@sbts.edu
Re: [Dspace-tech] Embargoes in 1.6
Jason:

Yes, that's right - input-forms must be updated. But I regard it on balance an advantage that the embargo system 'inherits' all the configurability of standard DSpace metadata - not just in submission, but indexing, display and more (so you could, e.g. do fielded search on embargo dates, or suppress display in the 'simple' item page, etc). However, this flexibility does add additional configuration steps initially, as you observe.

Thanks, Richard

On Mar 31, 2010, at 10:24 AM, Jason Fowler wrote:

Richard, That's helpful. So just to verify, it is necessary to add those items to the input-forms.xml in order to get them to show up in the submission process? Embargoes don't just work (i.e. show up in the submission process) out of the box? I am fine with making the changes, but I just want to make sure they are actually necessary.

Thanks, Jason
Re: [Dspace-tech] Embargoes in 1.6
Hi Hilton:

Not really - the assumption of the embargo system is that once content has been made generally available, it no longer makes sense to embargo it (since embargo traditionally means withholding exposure until some future time, then exposing it). So it really isn't the same as access restriction, which can be accomplished with standard DSpace administrative tools [and admin changes can be made anytime]. Does that make sense?

Thanks, Richard

On Mar 31, 2010, at 1:20 PM, Hilton Gibson wrote:

Hi All

With respect to older items already submitted, can one impose an embargo on them without a migration and re-submission?

Cheers hg.

On 31 March 2010 16:21, Richard Rodgers rrodg...@mit.edu wrote:

Hi Jason: One thought: have you added those fields to your input-forms.xml? In other respects, the embargo fields behave just like any other metadata, and can be added to the default set - or any collection-specific set - of metadata fields used in web submission. The tech doc has instructions on managing input-forms. Hope this helps, Richard R

On Mar 31, 2010, at 10:02 AM, Jason Fowler wrote:

Hi all, I am testing DSpace 1.6 before we migrate to it. I believe I have the dspace.cfg set up correctly, and I have added dc.embargo.terms and dc.embargo.liftdate to my metadata registry. However, nothing shows up in the submission workflow in any of my xmlui themes, or in jspui. Are there additional changes that I need to make in order to enable this within the workflow? Can someone explain what those are?

Thanks, Jason Fowler, CA, MSLS Archives and Special Collections Librarian The Southern Baptist Theological Seminary Vice President, ALABI jfow...@sbts.edu

-- Systems Administrator Library and Information Services Stellenbosch University http://www.sun.ac.za/ http://library.sun.ac.za/ http://scholar.sun.ac.za/ http://ubuntu.sun.ac.za/
Re: [Dspace-tech] Embargoes in 1.6
Hi Jason: I can see that this might be confusing, so let me try to explain a little more clearly. At the most basic level, the field containing the 'terms' is where a submitter specifies how the embargo should work for that item. At the time of installation into the archive (i.e. when it exits workflow) those terms are 'interpreted' into a specific date in the future, which is then stored in the 'liftdate' field. The lifter then only checks this latter date, and ignores the original 'terms'. What can the terms be? Conceivably anything, (e.g. 60 days, my standard policy, Elsevier terms, etc), but of course the software cannot interpret arbitrary words. That's why the embargo system allows you to write your own code that interprets whatever terms you want to use, and will run that code at the appropriate time. However, 'out of the box', DSpace supplies code for the simplest case: that in which the terms are actual dates themselves. In this case, all it does is copy the date you put in 'terms' into 'liftdate'. Over time, as we discover other common 'terms', we can add to the DSpace 'library' of term-interpreting code, and distribute them with each new release. Finally, (to really confuse things), you can actually configure the 'terms' dc field to be the same as the 'liftdate' field! That is, make 'dc.embargo' or whatever you want serve as both terms and lift. In this case, it doesn't even copy a field, it just uses the date 'in place'. This is really a sensible configuration, if you don't want to keep a permanent record of what the terms were. Does that illuminate things a bit? Richard On Mar 31, 2010, at 1:59 PM, Jason Fowler wrote: That makes sense. The only thing I don't understand is how exactly the liftdate is computed. Is that something that happens automatically, or do I need to configure that? 
Jason Fowler, CA, MSLS

From: Claudia Juergen [claudia.juer...@ub.tu-dortmund.de] Sent: Wednesday, March 31, 2010 12:14 PM To: Jason Fowler Cc: claudia.juer...@ub.tu-dortmund.de Subject: RE: [Dspace-tech] Embargoes in 1.6

Hello Jason, only the embargo terms afaik. The liftdate is computed based upon the terms. Claudia

Thanks, Claudia! Just one more question. Do I need to add both dc.embargo.terms AND dc.embargo.liftdate to the input-forms.xml? Thanks in advance, Jason Fowler, CA, MSLS
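The out-of-the-box behaviour Richard describes earlier in this message (the terms are themselves a date, and that date is simply copied into the lift-date field when the item is installed) can be sketched roughly in Python as follows. This is an illustration only, not DSpace's Java code: the function name interpret_terms is hypothetical, and using date.max to stand in for an open-ended 'forever' embargo is an assumption of this sketch.

```python
from datetime import date

# Assumed marker for an open-ended embargo; 'forever' is the example
# value used elsewhere in these threads.
OPEN_TERMS = "forever"

def interpret_terms(terms):
    """Turn a terms string into a lift date, handling only the basic cases.

    Returns None when no terms were supplied (no embargo is imposed),
    date.max for an open-ended embargo (a choice made for this sketch),
    and otherwise parses the terms as an ISO date such as 2020-09-12.
    Anything else raises ValueError, which is where a custom interpreter
    would have to take over.
    """
    if terms is None or not terms.strip():
        return None
    cleaned = terms.strip()
    if cleaned == OPEN_TERMS:
        return date.max
    return date.fromisoformat(cleaned)

print(interpret_terms("2020-09-12"))
```

A value like '6 months' fails here with a ValueError, which matches the point made above: anything beyond plain dates needs site-specific interpreting code.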
Re: [Dspace-tech] Embargoes in 1.6
Hi Hilton: I'm not sure I precisely follow option b (especially step 9 - not sure what 'read access permission problems' are), but something along these lines ought to work. Specifically, if by some means you remove read policies on the bitstreams, and make sure there is a valid date in the field configured for the liftdate, then regularly running the lifter will lift the embargo on those items at the time you set. You don't really even need any values in the 'terms' field at all, since the lifter doesn't check or use it. The only rub I can see is that each item would have to be manually assigned a lift date and have it's read policies removed - the embargo stuff doesn't have batch tools for this. Does this address your question? Richard On Mar 31, 2010, at 2:59 PM, Hilton Gibson wrote: On 31 March 2010 20:35, Richard Rodgers rrodg...@mit.edumailto:rrodg...@mit.edu wrote: Hi Hilton: Not really - the assumption of the embargo system is that once content has been made generally available, it no longer makes sense to embargo it (since embargo traditionally means withholding exposure until some future time, then exposing it). So it really isn't the same as access restriction, which can be accomplished with standard DSpace administrative tools [and admin changes can me made anytime]. Does that make sense? Yes. But and there is always a but. What we did before 1.6.0 was to remove the binary object. Now we want to restore them but with an embargo. How do we do that ? See: http://ir.sun.ac.za/wiki/index.php/Asset_Embargo. Check procedure - option b. Will this work ? Cheers hg. Thanks, Richard On Mar 31, 2010, at 1:20 PM, Hilton Gibson wrote: Hi All With respect to older items already submitted, can one impose an embargo on them without a migration and re-submission ? Cheers hg. On 31 March 2010 16:21, Richard Rodgers rrodg...@mit.edumailto:rrodg...@mit.edu wrote: Hi Jason: One thought: have you added those fields to your input-forms.xml? 
In other respects, the embargo fields behave just like any other metadata, and can be added to the default set - or any collection-specifc set - of metadata fields used in web submission. The tech doc has instructions on managing input-forms. Hope this helps, Richard R On Mar 31, 2010, at 10:02 AM, Jason Fowler wrote: Hi all, I am testing DSpace 1.6 before we migrate to it. I believe I have the dspace.cfg set up correctly, and I have added dc.embargo.terms and dc.embargo.liftdate to my metadata registry. However, nothing shows up in the submission workflow in any of my xmlui themes, or in jspui. Are there additional changes that I need to make in order to enable this within the workflow? Can someone explain what those are? Thanks, Jason Fowler, CA, MSLS Archives and Special Collections Librarian The Southern Baptist Theological Seminary Vice President, ALABI jfow...@sbts.edumailto:jfow...@sbts.edu -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.netmailto:DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. 
-- Systems Administrator, Library and Information Services, Stellenbosch University http://www.sun.ac.za/ http://library.sun.ac.za/ http://scholar.sun.ac.za/ http://ubuntu.sun.ac.za/
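The recipe Richard describes in this thread (no read policies on the bitstreams, a valid date in the lift-date field, the lifter run on a schedule) comes down to one date comparison per item. Below is a minimal sketch of that selection step in plain Java. It is not the DSpace lifter code: the `Item` class, its fields, and `dueForLift` are hypothetical names used only for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these types are DSpace classes.
final class LiftDateSketch {

    // Hypothetical stand-in for an embargoed item.
    static final class Item {
        final String handle;
        final LocalDate liftDate; // null when no lift date has been assigned

        Item(String handle, LocalDate liftDate) {
            this.handle = handle;
            this.liftDate = liftDate;
        }
    }

    // Returns the handles whose lift date is today or earlier.
    // Only the lift date is consulted; the 'terms' value plays no part.
    static List<String> dueForLift(List<Item> items, LocalDate today) {
        List<String> due = new ArrayList<String>();
        for (Item item : items) {
            if (item.liftDate != null && !item.liftDate.isAfter(today)) {
                due.add(item.handle);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("123456789/1", LocalDate.of(2010, 3, 1)),
                new Item("123456789/2", LocalDate.of(2011, 1, 1)),
                new Item("123456789/3", null));
        System.out.println(dueForLift(items, LocalDate.of(2010, 3, 31)));
    }
}
```

Items without a lift date are never selected, which matches the point that each older item needs a date assigned by hand before a scheduled run can act on it.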
Re: [Dspace-tech] Question about setting embargoes
Hi George: A couple of observations: first, the dc.embargo.terms value only gets 'applied' when an item is installed into the repository - it will have no effect on items already in the repo. So to test, create a new Thesis, and submit it via the web submission UI (or via batch, etc): be sure that the 'terms' field has a reasonable date in the future. Then when the item is installed (i.e. exits workflow, if any is defined), you should see that the embargo has been applied. Second, there was an omission in the new DSpace launcher that left out the Embargo lifter. But see http://jira.dspace.org/jira/browse/DS-506 for a way to fix that. The next bug fix release will include this change. Thanks, Richard R
On Mar 26, 2010, at 3:02 PM, George Stanley Kozak wrote: Hi… I have a question about using the new Embargo feature. Previous to DSpace 1.6, I used an altered version of the code developed by Terry Owen (U of Maryland). Now I am trying to use the new Embargo feature. I created two fields: dc.embargo.terms and dc.embargo.liftdate and added them to an existing Thesis in my test system. I was not able to have the embargo applied. Is there any documentation that shows some concrete examples of setting these parameters and getting the embargo working? Also, when I try to run /dspace/bin/dspace embargo-lifter -c, I get the error: “Command not found: embargo-lifter”. I did uncomment things in dspace.cfg for the embargo. Any ideas would be appreciated ;-) George Kozak Digital Library Specialist Division of Library Information Technologies (DLIT) 501 Olin Library Cornell University Ithaca, NY 14853 607-255-8924
Re: [Dspace-tech] Append bitstream to existing item
Hi Gary: You didn't specify which version of DSpace you are using, but for the just-released 1.6 version the answer is "way, mate", using the ItemUpdate tool (see the doc). Hope this helps, Richard R.
On Mar 14, 2010, at 7:00 PM, Gary Browne wrote: Hi all, I asked this question in 2007 with no replies. But I'd still like an answer, even if it's "No way, mate". I'd like to append a bitstream to an existing item, without removing any existing bitstreams. The add and replace options of the ItemImport class don't seem to cover this useful possibility. I'd like to be able to do this programmatically if possible. Thanks Gary GARY BROWNE | Development Programmer Library IT Services | University Library THE UNIVERSITY OF SYDNEY Level 1, Fisher Library F03 | The University of Sydney | NSW | 2006 T +61 2 9351 5946 | F +61 2 9036 E gary.bro...@sydney.edu.au | W http://sydney.edu.au
Re: [Dspace-tech] creative commons icon
Hi Jose: Yes, the display you cite is non-optimal (compared to earlier 1.4 behavior). There is an improvement forthcoming in DSpace 1.6 based on better mime-typing of the license bitstreams, and after that, we hope to completely redo CC (using a web service rather than iframes, restoring the icon, and much more). Stay tuned, Richard
On Jan 27, 2010, at 1:39 PM, Blanco, Jose wrote: I saw an item that has a Creative Commons license at the MIT DSpace instance, but it did not look quite right. There is no icon, and the link sends you to the source of a page. http://dspace.mit.edu/handle/1721.1/39138
-----Original Message----- From: Blanco, Jose [mailto:blan...@umich.edu] Sent: Wednesday, January 27, 2010 1:15 PM To: Dorothea Salo; Dspace Tech Subject: Re: [Dspace-tech] creative commons icon
Dorothea, I've implemented the patch and I'm not seeing the cc icon. I looked at ItemTag.java to see how the icon gets displayed, but I don't see how it works there. I will check the jsp files now. Any ideas? Also, can you give me an example of a contents file that imports a CC item? Thank you again, Jose
-----Original Message----- From: Dorothea Salo [mailto:dorothea.s...@gmail.com] Sent: Wednesday, January 27, 2010 11:29 AM To: Dspace Tech Subject: Re: [Dspace-tech] creative commons icon
Also, I plan to load several of these cc items using ItemImporter. Is there anything I should know about loading cc items using ItemImporter? I'm not quite sure how I'm going to do it yet. I'm thinking I would just include the appropriate files in the folder and list them in the contents file. Is this the way to do it, or is there a better way?
That's the way to do it -- just make sure the files are in the CC-LICENSE bundle.
Dorothea
-- Dorothea Salo ds...@library.wisc.edu Digital Repository Librarian AIM: mindsatuw University of Wisconsin Rm 218, Memorial Library (608) 262-5493
Re: [Dspace-tech] Embargo in 1.6
Hi Stuart: I'll take a crack at some of your questions: see remarks inline below. Thanks, Richard R
On Tue, 2009-12-08 at 17:40 +1300, stuart yeates wrote: I have some questions about the Embargo plugin in 1.6. I'm basing this on http://wiki.dspace.org/index.php/Embargo_1.6 and trawling through the subversion repository (dspace-1.6.0-rc1 tag). We'd like to have a drop-down box in our self-deposit which allows users to select an embargo period (probably 3, 6, 12, 18 or 24 months). This then gets put in the metadata field pointed to by embargo.field.terms (probably 'VUW.embargo'), and the date of uplift calculated and stored in that pointed to by embargo.field.lift (probably 'available').
As far as submission goes, you could do this fairly easily by using the configurations available in 'input-forms.xml', defining the drop-down list and its values. See the doc for more information (all XML, no Java).
The DefaultEmbargoSetter automatically sets the default permissions so that while the item metadata for an embargoed item is globally readable, the bitstreams are inaccessible to everyone but admins.
[This could be changed to a less strict lockdown by overriding the setEmbargo method; we haven't thrashed this out yet]
DefaultEmbargoSetter also calculates the embargo.field.lift date, from the embargo.field.terms and the current time/date. Once a day the EmbargoManager runs and uplifts items whose embargo has expired. Uplifting involves setting the permissions to whatever the default permissions are for the collection it's in, making the item's bitstreams public. My questions are:
[1] Does the above sound sane?
Yes, that's the general flow of embargo processing, and what you want to do falls generally within it.
[2] Is there any way to generate notifications of lifting? The easiest thing I can see would be to do a search for the embargo.field.terms field, sorting on the availability, and supply that as an RSS feed.
You could do something like that, but you could also directly act when the embargo is lifted. On the wiki page you cite, look at the Downloads section of the Prototype implementation for 'Embargo-1.6-new.zip'. This contains some example code from Harvard that extends the EmbargoLifter to send an email to the submitter whenever an embargoed item is lifted. This could give you an idea of how to do something similar (you didn't characterize how the notification target(s) would be identified, so I can't be more specific).
[3] We're considering how to direct users to other sources for the item when it's currently embargoed. This would probably involve displaying a block of text which might invite them to log in and give them alternative access routes to the item. Has anyone done this?
This might be a bit more involved, especially if (as seems likely by your description) the text and access route would be item-specific. Having said this, if you could represent it in a metadata field or fields that would be entered at submission, your Lifter could programmatically remove them when the embargo ends.
[4] In the wiki at http://wiki.dspace.org/index.php/Embargo_1.6, if I am reading things correctly, the second to last option in the config file snippet is missing the relevant default option and the last option has it truncated. Am I reading it correctly?
The wiki page does look a little truncated: go with what you find in the dspace.cfg that ships with the 1.6 distribution and current tech doc.
Due to my technical skills, I'd prefer options that involve XSLT to those involving Java :)
Except where noted in [2] and [3] above about customizing the Lifter, the only Java coding you would need to do is in the EmbargoSetter (to interpret your fixed intervals into DSpace dates), but this is fairly simple (very rough code, just an example): [in 'parseTerms' method] ...
    if ("3 months".equals(terms)) {
        // 90L forces long arithmetic; the same product in int would overflow
        long monMillis = 90L * 24 * 60 * 60 * 1000;
        return new DCDate(new Date(System.currentTimeMillis() + monMillis));
    } else if ("6 months".equals(terms)) {
        ...
cheers stuart
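As a variation on the rough `parseTerms` idea above, the fixed intervals can be turned into a lift date with calendar arithmetic instead of a 90-day approximation. The sketch below is plain Java and does not touch the DSpace API: the class name, the `liftDateFor` method, and the "N months" pattern are assumptions made for illustration, and a real setter would still need to wrap the result in whatever date type DSpace expects.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical helper, not part of DSpace.
final class EmbargoTermsSketch {

    // Accepts values such as "3 months" or "1 month" from a drop-down list.
    private static final Pattern TERMS = Pattern.compile("(\\d+) months?");

    // Adds the requested number of calendar months to the deposit date.
    static LocalDate liftDateFor(String terms, LocalDate depositDate) {
        Matcher m = TERMS.matcher(terms.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised embargo terms: " + terms);
        }
        // plusMonths clamps to the last valid day of the target month
        return depositDate.plusMonths(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(liftDateFor("3 months", LocalDate.of(2009, 12, 8)));
        System.out.println(liftDateFor("6 months", LocalDate.of(2009, 8, 31)));
    }
}
```

Working in whole months sidesteps both the uneven length of months and the overflow that `90 * 24 * 60 * 60 * 1000` hits when evaluated in `int` arithmetic.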
Re: [Dspace-tech] HELP with Implementing SRB or an alternative in DSpace 1.5.1
Hi Sue: See remarks inline below, but the general answer is that the SRB extension was not designed to partition storage along collection lines, so I don't think it would help you out without a fair bit of additional work. Also, SRB has been replaced with a new platform called iRODS (http://www.irods.org) which should support the SRB DSpace extensions, but I'm not aware of any sites to point you to who have implemented it. Thanks, Richard
On Wed, 2009-11-18 at 17:27 -0600, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] wrote: Hi, We have some unique collection access/authorization issues and we’re trying to figure out if there’s a way to implement the basic scenario depicted in the diagram below. What we’re trying to avoid is having to implement the DSpace application on two servers, and having to deal with loading documents and metadata onto two separate servers. ***Note: We cannot store these documents on the same server.
1. Is implementing SRB in DSpace simply a matter of correctly configuring the dspace.cfg file? Is there any separate software you need to install on your server?
Yes, you need to install the SRB server itself (or have access to one configured elsewhere).
2. Has anyone actually implemented SRB in DSpace? If so, I would like to see a “real” example of how you configured all the “srb.*.assetstore#” parameters in dspace.cfg. The documentation does not say much about how these parameters should be configured; for example, I have no idea what to use for “srb.mcatzone”, “srb.mdasdomainname”, or “srb.defaultstorageresource”.
Most of these properties make more sense if you have an SRB server configured.
3. Is there any plan to configure a future release of DSpace so that you can control which assetstore an Item/Document gets written to based on Collection id? Ultimately we’d like to do this so we wouldn’t have to keep manipulating assetstore.incoming based on which Collection was being loaded.
Not that I know of, but you might want to open a JIRA issue to capture the request.
4. Can anyone suggest any alternatives? We’ve also looked at “registration”, but that also duplicates metadata in two separate instances of DSpace from what I understand. Thanks in advance, Sue
Sue Walker-Thornton ConITS Contract NASA Langley Research Center Integrated Library Systems Application Database Administrator 130 Research Drive Hampton, VA 23666 Office: (757) 224-4074 Fax: (757) 224-4001 Pager: (757) 988-2547 Email: susan.m.thorn...@nasa.gov
Re: [Dspace-tech] Dwell with Dspace
Hi Steve: DWell is a configuration of the Longwell faceted metadata browser developed by the SIMILE project at MIT (http://simile.mit.edu). The basic way the two systems interoperate is described at: http://simile.mit.edu/wiki/Dwell
Essentially, you harvest the DSpace metadata via OAI, transform it into the RDF format DWell wants, and then load it into a Longwell instance (its own web server). Let me know if you have any further questions. DWell is not under active development at the moment (since the SIMILE project ended), but the source is available and it ought to work with recent versions of DSpace fairly well. Thanks, Richard Rodgers
On Wed, 2009-09-30 at 11:59 -0500, Williams, Steven D wrote: Does anyone have any information on Rich Metadata for DSpace with DWell? I have located the following page http://www.dspace.org/new-user-training/Rich-Metadata-for-DSpace-with-DWell.html. I also located a few comments in the dspace-tech list archive, but nothing informative. Any advice or experience using DWell would be appreciated. Thanks Steve Williams Technology Integration Services University of Texas Libraries University of Texas at Austin PCL 1.128 G
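To make the "transform it into RDF" step in the reply above a little more concrete, here is a small sketch that serialises one harvested Dublin Core value as an N-Triples statement. It is only an illustration of the general idea: the thread does not say which RDF vocabulary or serialisation DWell expects, so the choice of N-Triples, the use of the Dublin Core element namespace as predicate, and the class and method names are all assumptions.

```java
// Illustration only: a hypothetical converter, not part of DSpace or SIMILE.
final class DcToNTriplesSketch {

    // Dublin Core element set namespace, used here as the predicate prefix.
    private static final String DC = "http://purl.org/dc/elements/1.1/";

    // Builds one N-Triples line: <subject> <dc:element> "literal" .
    static String triple(String subjectUri, String dcElement, String value) {
        return "<" + subjectUri + "> <" + DC + dcElement + "> \"" + escape(value) + "\" .";
    }

    // Escapes the characters that may not appear raw inside an N-Triples literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    public static void main(String[] args) {
        // The handle URI below is a made-up example.
        System.out.println(triple("http://hdl.handle.net/123456789/1", "title", "A \"quoted\" title"));
    }
}
```

In practice the values would come from the `oai_dc` records returned by an OAI-PMH `ListRecords` request against the repository, with one statement emitted per metadata value.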
Re: [Dspace-tech] DSpace Opensearch support
Hi Mika: It will be committed shortly - I will post preliminary documentation as an attachment to the JIRA issue (DS-324), while the 1.6 doc is being prepared. Thanks, Richard
mikan.d.dspace listmail wrote: *Reposting this* What is the state of OpenSearch for DSpace 1.6? Is it already committed to the trunk version and, if so, how can I access the search interface? Thanks, Mika
2009/6/5 Richard Rodgers rrodg...@mit.edu: Mika, Alexandre: There is a widely adopted set of conventions for expressing search results in standard formats called OpenSearch - http://www.opensearch.org Mark Wood and I wrote an implementation for DSpace that includes RSS and Atom, and is available on both the JSP and XML UIs. We hope to have it included in the 1.6 release if there is interest from the community in doing so. Thanks, Richard
mikan.d.dspace listmail wrote: Alexandre, After my post I got a reply from Urban Andersson at the University of Gothenburg, where they have developed such a utility. With their help we have implemented a working search with XML results in DSpace. I think a feature like this should definitely be included in the default DSpace installation. Below is the reply I got. I'm sure Urban will share the code with the community if needed. Cheers, Mika
-- Hello Mika, I have made a modification (copy) of the SimpleSearchServlet, and corresponding JSP, to have it return XML (not actual RSS, but that should be easy to adapt) instead of HTML. This is a small fix that should be easy to implement, although I am not sure if it is sufficient for you. An example of this: http://gupea.ub.gu.se:8080/dspace/simple-search-snabbsok?query=helsinki&rpp=50&submit=Go We use this for a quick search function on our web pages: http://www.ub.gu.se/ Also we generate various DSpace RSS feeds directly from postgres SQL queries on our web pages, but that is done outside of DSpace. I have not done anything with the RSS class in DSpace.
There might be some cache issues etc. that you would want to look at when implementing RSS (my solution simply returns straight XML instead of the web page). Please let me know if you want to have a look at the above. / Urban
mikan.d.dspace listmail wrote: Hi, Has anyone tried/done the following on DSpace: modify the search code so that it would return any given search as an RSS feed? This shouldn't be too hard to implement, since DSpace already contains the functionality to provide RSS feeds from a set of items. What I'm hoping to achieve is to be able to create customized feeds for author/institutional webpages (or live bookmarks) with ease. This would actually allow the creation of virtual collections of any kind very easily. If this hasn't been done before, could someone throw in a few hints on where/how to start experimenting? On a general level, I'm thinking of adding an RSS parameter to the search query, which would pass the search results to the class/function that creates RSS feeds and returns them to the user. Maybe a parameter for the number of returned items could be used as well. I'd like to know where I can find: 1) the class that runs/processes the search and results 2) the class that creates RSS feeds. Thanks for any tips, Mika
Re: [Dspace-tech] Return search query as RSS feed
Mika, Alexandre: There is a widely adopted set of conventions for expressing search results in standard formats called OpenSearch - http://www.opensearch.org Mark Wood and I wrote an implementation for DSpace that includes RSS and Atom, and is available on both the JSP and XML UIs. We hope to have it included in the 1.6 release if there is interest from the community in doing so. Thanks, Richard
mikan.d.dspace listmail wrote: Alexandre, After my post I got a reply from Urban Andersson at the University of Gothenburg, where they have developed such a utility. With their help we have implemented a working search with XML results in DSpace. I think a feature like this should definitely be included in the default DSpace installation. Below is the reply I got. I'm sure Urban will share the code with the community if needed. Cheers, Mika
-- Hello Mika, I have made a modification (copy) of the SimpleSearchServlet, and corresponding JSP, to have it return XML (not actual RSS, but that should be easy to adapt) instead of HTML. This is a small fix that should be easy to implement, although I am not sure if it is sufficient for you. An example of this: http://gupea.ub.gu.se:8080/dspace/simple-search-snabbsok?query=helsinki&rpp=50&submit=Go We use this for a quick search function on our web pages: http://www.ub.gu.se/ Also we generate various DSpace RSS feeds directly from postgres SQL queries on our web pages, but that is done outside of DSpace. I have not done anything with the RSS class in DSpace. There might be some cache issues etc. that you would want to look at when implementing RSS (my solution simply returns straight XML instead of the web page). Please let me know if you want to have a look at the above. / Urban
mikan.d.dspace listmail wrote: Hi, Has anyone tried/done the following on DSpace: modify the search code so that it would return any given search as an RSS feed?
This shouldn't be too hard to implement, since DSpace already contains the functionality to provide RSS feeds from a set of items. What I'm hoping to achieve is to be able to create customized feeds for author/institutional webpages (or live bookmarks) with ease. This would actually allow the creation of virtual collections of any kind very easily. If this hasn't been done before, could someone throw in a few hints on where/how to start experimenting? On a general level, I'm thinking of adding an RSS parameter to the search query, which would pass the search results to the class/function that creates RSS feeds and returns them to the user. Maybe a parameter for the number of returned items could be used as well. I'd like to know where I can find: 1) the class that runs/processes the search and results 2) the class that creates RSS feeds. Thanks for any tips, Mika
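The idea discussed in this thread, handing a list of search hits to something that writes a feed, can be sketched without any DSpace classes. The example below builds a minimal RSS 2.0 document with the JDK's StAX writer. It is neither the OpenSearch implementation mentioned by Richard nor Urban's servlet: the `Hit` class, the `toRss` method, and the example URLs are hypothetical, and a real feed would normally carry more item fields.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustration only: not the DSpace feed or search code.
final class SearchFeedSketch {

    // Hypothetical search hit: a title and a link to the item page.
    static final class Hit {
        final String title;
        final String link;

        Hit(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    // Serialises the hits as an RSS 2.0 channel (title, link, description, items).
    static String toRss(String query, String searchUrl, List<Hit> hits) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("rss");
            xml.writeAttribute("version", "2.0");
            xml.writeStartElement("channel");
            element(xml, "title", "Search results for: " + query);
            element(xml, "link", searchUrl);
            element(xml, "description", hits.size() + " item(s) matching the query");
            for (Hit hit : hits) {
                xml.writeStartElement("item");
                element(xml, "title", hit.title);
                element(xml, "link", hit.link);
                xml.writeEndElement(); // item
            }
            xml.writeEndElement(); // channel
            xml.writeEndElement(); // rss
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialise feed", e);
        }
        return out.toString();
    }

    // Writes <name>text</name>; the writer escapes markup characters in text.
    private static void element(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit("A & B", "http://example.org/handle/1"));
        System.out.println(toRss("helsinki", "http://example.org/search", hits));
    }
}
```

Using a stream writer rather than string concatenation means titles containing `&` or `<` are escaped automatically, which matters for feeds built from free-text metadata.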
Re: [Dspace-tech] Why the DSpace checksum checker?
Hi Andrew: Here's a slightly different perspective that might help illuminate the checker and its rationale. While I concur with Mark that there are engineering issues with the implementation, I think it's a mistake to view it as a *file* integrity system (for which - as Mark rightly observes - there are better tools available). The checker was meant to be a *content* integrity system used in support of preservation. I'll explain the difference: When content is ingested into DSpace, an association is forged between metadata, licenses, etc and various content files. The only reliable identifiers for the latter are their checksums, since their names and many other attributes are file-system relative or not unique, etc. A legitimate question to put to a system that manages such resources over archival lengths of time is this: how do I know that the ingested files (as described by some metadata) are the same as what you are providing now (Tom De Mulder's question)? The checker is a tool that is supposed to raise our confidence in the answer the system provides, by periodically comparing the checksum recorded at ingest with current asset-store values. Of course, if the file system becomes corrupt, there is a much greater likelihood that there will be an I/O failure, or other symptom, than an incorrect checksum reported: that isn't the case the checker was primarily designed for. Rather, the checker supposes that over longer periods of time, content files will be moved from one storage device to another, from local disk to SAN to the cloud, from spinning disk to holographic cell, etc. At any of these junctures (or even during routine maintenance), due to mistake or malice, a different file can be substituted for the intended one. And the checker should eventually detect this condition.
Of course the checker is not an especially high security content integrity service (HP Labs has done work in this area, and has a system that operates with DSpace) - a wily, resourceful opponent could hack into the database and alter the original checksum along with the content. And of course if your repository has no concerns about long-term integrity, you should give the checker a pass. Richard R
On Fri, 2009-04-17 at 01:10 -0700, Mark Diggory wrote: Andrew, As a committer, I have to be careful that my opinion may be construed as the viewpoint of the DSpace developers. So I will clarify that this is only my opinion, not the group's. I've never been impressed with the reasoning behind this addition to DSpace; it mistakes bitstream security and file corruption for something that should be tracked by the DSpace application. We encountered problems with the checksum checker getting bogged down due to some issue in the code/database. I was never able to get it restarted and continued to waste time on it until our IT Systems Admin showed me the light... A real file integrity system should be implemented outside of the application by an experienced system administrator vested in maintaining the security and integrity of the system, not in the application by a web application developer. I do value and respect the team that developed this add-on to DSpace, but disagree with the approach and the complexity of the code. Instead I would recommend running something more professional like the following on one's assetstore. http://www.sentry-go.com http://www.cs.tut.fi/~rammer/aide.html http://www.tripwire.com http://www.solidcore.com/ Cheers, Mark
On Fri, Apr 17, 2009 at 12:10 AM, Andrew Marlow marlow.and...@googlemail.com wrote: Hello DSpacers, I came across the checksum checker recently and I don't understand why it is useful.
Here is what I found on the WIKI:- --- DSpace now comes with a Checksum Checker script ([dspace]/bin/checker) which can be scheduled to verify the checksum of every item within DSpace. Since DSpace calculates and records the checksum of every file submitted to it, this script is able to determine whether or not a file has been changed (either manually or by some sort of corruption or virus). --- So why would an item be corrupt? Or altered manually? I know that any filesystem object can get problems when the filesystem goes wobbly, that's why we have backups. But surely the normal operating system monitoring utilities will tell us when a filesystem needs repair? Can someone explain please? -- Regards, Andrew M. http://www.andrewpetermarlow.co.uk
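The comparison at the heart of this thread, a checksum recorded at ingest versus one recomputed from the current bytes, can be sketched in a few lines of standard Java. This is not the DSpace checker itself: the class and method names are hypothetical, and MD5 is used here on the assumption that it is the recorded algorithm; a real check would read the algorithm name and expected value from the repository's records.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: a hypothetical helper, not the DSpace checksum checker.
final class ChecksumCheckSketch {

    // Streams the content through MD5 and returns the digest as lowercase hex.
    static String md5Hex(InputStream content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = content.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True when the current content still matches the checksum recorded at ingest.
    static boolean matchesRecorded(InputStream content, String recordedHex) {
        return md5Hex(content).equalsIgnoreCase(recordedHex);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(new ByteArrayInputStream(bytes)));
    }
}
```

Because the recorded value travels with the metadata rather than with the file, the same check works after content has been moved between storage systems, which is the scenario Richard describes.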
Re: [Dspace-tech] Java Heap dumps during Filter-Media
At MIT we came up with a similar approach, which takes some of the grunt work out of managing the skips. We extended MediaFilter to detect PDFBox (or other) exceptions, then automatically record their handles to a skip list, which is used for any subsequent runs. We'd be glad to give you the code or just put it into the next 1.5.X release. Thanks, Richard R
Quoting Tim Donohue tdono...@illinois.edu: Jeffrey, I've seen this same issue all too many times to count. From what I've noticed it seems that the PDFBox software (which DSpace uses) occasionally has difficulties with larger PDFs (usually 7MB or larger) which include OCRed, scanned images. I've never encountered this problem with PDFs created directly from digital files (like Word, etc.)... From what I've seen, occasionally recreating the PDF will resolve the problem...but, more often than not even that doesn't help. The problem seems to be more of an issue with how PDFBox loads the content into memory. Locally, I've only come up with two possible solutions:
(1) Increase the memory available to the 'filter-media' script (by bumping up the -Xmx value in the '[dspace]/bin/dsrun' script). This works for some PDFs, but others will continue to have problems (as PDFBox seems to use up enormous amounts of memory for some PDFs).
(2) Force those problematic PDFs to be skipped over by the 'filter-media' script (by using the -s flag): To make this easier on myself, I've started maintaining a filter-skiplist file which lists all the handles of the problematic PDFs (so far we've encountered 35 of them), with a separate handle on each line. Then, I pass this filter-skiplist file to the cronjob which runs 'filter-media' like so: 0 2 * * * filter-media -s `less filter-skiplist | tr '\n' ','` The above script translates all the newlines (\n) to commas (,) in the 'filter-skiplist' file and passes the result to the 'filter-media' -s (skip) flag.
So, in the end, filter-media receives a comma-separated list of handles of PDFs which it should no longer process. (Obviously this means any PDFs belonging to items in your 'filter-skiplist' can not be full text searched in DSpace) I'm hoping that in the longer term PDFBox will resolve its memory issues as it comes out of the incubation stage under Apache. If anyone else has potential solutions, I'd love to hear them, as I'm in a similar situation as Jeffrey. - Tim Jeffrey Trimble wrote: I've run into a funky situation. After using the distributed PDFBOXand the associated jars (bouncy castle) the filter media works really, really well, until-- We have one pdf that has caused the filter-media to produce a memory dump/ java heap dump. The errors are reports first the IBM flavor of JVM. We removed the offending PDF from the database, the filter-media went on it's way merrily. Has anyone seen anything like this? I have a copy of the heap dump and trace. I can reproduce it one demand by placing this PDF back into the IR. If you have seen this, and was able to resolve it, please let me know. The only thing I can think of doing is to rescan the PDF file from the original and seeing if there is something that resovles itself with the new scan. Thanks in advance, Jeffrey Trimble System LIbrarian William F. Maag Library Youngstown State University 330.941.2483 (Office) jtrim...@cc.ysu.edu mailto:jtrim...@cc.ysu.edu http://www.maag.ysu.edu http://digital.maag.ysu.edu -- This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! 
http://p.sf.net/sfu/www-ibm-com ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech -- Tim Donohue Research Programmer, IDEALS http://www.ideals.uiuc.edu/ University of Illinois tdono...@illinois.edu | (217) 333-4648 -- This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! http://p.sf.net/sfu/www-ibm-com ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech -- This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! http://p.sf.net/sfu/www-ibm-com ___
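The skip-list idea from Tim's message (one handle per line, joined into the comma-separated value that 'filter-media -s' receives) can be sketched as below. This is only an illustration in Python, not the MIT MediaFilter extension or Tim's cron job; the function name and the handles are made up. One difference from the `tr '\n' ','` pipeline: that pipeline also turns the file's final newline into a trailing comma, while this sketch drops blank lines so no trailing comma is produced.

```python
def skip_argument(lines):
    """Join the non-blank handles of a skip-list file into the
    comma-separated value expected after the -s flag."""
    handles = [line.strip() for line in lines if line.strip()]
    return ",".join(handles)


# Hypothetical skip-list contents, one handle per line, with a stray blank line.
skiplist_lines = ["123456789/10\n", "\n", "123456789/42\n"]
argument = skip_argument(skiplist_lines)
```

In practice the lines would come from `open("filter-skiplist")`, and the result would be appended to the `filter-media -s` command line.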
Re: [Dspace-tech] Help! Advise on setting up History subsystem on V1.5.1
Hi Gabriele: Because of defects in the initial implementation of the history system in DSpace 1.0.*-1.4.*, it was removed for 'repair', and is not available in 1.5.1. Some of this work has been done here at MIT, and we hope to reintroduce it to the codebase, but nothing is 'drop-in' ready at the moment. If you are interested, we would be glad to share further details with you. Sorry, Richard Rodgers

Franzini, Gabriele [Nervianoms] wrote:

Hello, We are exploring DSpace functionalities, and being in a regulated environment we absolutely need to look first at the History (Audit Trail) features of the system. We have installed version 1.5.1. As specific setup for the History part, we have just configured the history.dir in dspace.cfg. For now, it seems that my edits are not recorded at all. There are no files at all in the History dir. Can somebody point us to configuration steps for the History system in 1.5.1? Many thanks in advance,

Gabriele Franzini ICT Applications Manager Nerviano Medical Sciences SRL PO Box 11 - Viale Pasteur 10 20014 Nerviano Italy tel +39 0331581477 fax +39 0331581456
Re: [Dspace-tech] Reusing bitstream sequence number
Hi Mark: That's fine - any part of DSpace is fair game for debate. I just wanted to inform the discussion that the current design is based on a careful, reasonable analysis, and that there may be hidden costs in alternatives. I do worry about opening door #1 [content rejection], since taking assets as found seems pretty close to the bedrock use-case for digital repositories - at least preservation-minded ones. Or to put it more provocatively: DSpace could keep its hands clean (and its URLs pretty), but only by pushing the problem back on content providers, who would be left with what you characterize as the truly awful dirty work of ensuring unique filenames. Food for thought, Richard

On Sat, 2008-08-16 at 17:39 -0700, Mark Diggory wrote:

Richard, I respectfully disagree with you.

On Aug 16, 2008, at 6:54 AM, Richard Rodgers wrote:

Hi Mark: Let me explain the problem more fully, which is a very simple 'inconvenient truth' about assets: some complex digital objects we want to submit as one Item have filename duplications. E.g. in directory 'q4' we have 'report.doc', but the same filename in directory 'fy08' with different content. In the face of this, we can: (1) reject the content (duplicate filenames detected! - please correct or resubmit as multiple items), which is unacceptable.

Is it really that unacceptable?! I disagree: what use are two files with the same identical name in a DSpace Item? IMHO, it creates ambiguity in an area (file names) where users expect conformity with conventions. Really, which file would I choose to download if they had the same identical name? On top of this, what would I do with the second file when the OS/browser asked me if I wanted to replace the first one I just downloaded? I suppose I'd have to rename it to arrive back at a state of being able to tell the two apart. No, instead we should be adopting RESTful practices here, allowing DSpace to adhere to more conventional expectations.

http://en.wikipedia.org/wiki/Representational_State_Transfer#RESTful_example:_the_World_Wide_Web

Here, if DSpace were to take on RESTful practices in its URI conventions, we would be able to do things like versioning and predictable resource naming. For instance, in your example:

    PUT /bitstream/handle/1234.5/67890/q4/report.doc HTTP/1.1
    PUT /bitstream/handle/1234.5/67890/fy08/report.doc HTTP/1.1

would clearly result in two different bitstreams, whereas if I did

    PUT /bitstream/handle/1234.5/67890/report.doc HTTP/1.1
    PUT /bitstream/handle/1234.5/67890/report.doc HTTP/1.1

the second would be overwriting the first. That is also a legitimate behavior, allowing me to replace/version the resource (which, if I chose to expose access to the versions, might look like the following):

    GET /bitstream/handle/1234.5/67890/report.doc?revision=1 HTTP/1.1
    GET /bitstream/handle/1234.5/67890/report.doc?revision=0 HTTP/1.1

Likewise, we find this relative directory structure convention maintained in many other Internet resource related areas... in fact this is how the SIP METS and OCW IMSCP packaging works, based on basic zip files and manifests. But, yet again, the DSpace solution breaks the convention in this case. Take a METS/SIP package representing the following:

    package.zip$mets.xml
    package.zip$q4/report.doc
    package.zip$fy08/report.doc

In current DSpace parlance, this might in turn result in:

    http://host/bitstream/handle/1234.5/67890/1/mets.xml
    http://host/bitstream/handle/1234.5/67890/2/q4/report.doc
    http://host/bitstream/handle/1234.5/67890/3/fy08/report.doc

And now, where the original relative references in the mets.xml were proper in relation to the files in the zip, they are NOT when looking at the resultant URLs in DSpace. Now, that's what I call an inconvenient PITA. And it comes up here with John's issue, it came up in my DDI/VDC work, it came up again in Carl Jones' work with the RVC/Stellar support, and it was happening again with our attempting to predict the location of GIS files in DSpace Items for the Dome GIS Lab interoperability work. Not good.

Finally, on the Dissemination naming side, this breaks yet again. If I were instead to have the following item in DSpace:

    http://host/bitstream/handle/1234.5/67890/1/mets.xml
    http://host/bitstream/handle/1234.5/67890/2/report.doc
    http://host/bitstream/handle/1234.5/67890/3/report.doc

I can't now use the file names to represent the files in the METS DIP. How can I have two different Zip entries with the same file name?

    package.zip$mets.xml
    package.zip$report.doc
    package.zip$report.doc

It just doesn't expand without one of the files getting overwritten. No, this is a serious problem in the original design that is causing trouble for users/developers who expect conventional behavior and can't get it out of DSpace.

(2) accept the content, but transform or rewrite
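Mark's packaging point (paths that are distinct inside the zip collide once the directory part is dropped) can be checked mechanically. The sketch below uses Python's standard zipfile module on an in-memory package; the helper names are illustrative, and it says nothing about how DSpace's METS packagers actually behave.

```python
import io
import posixpath
import zipfile


def build_package(files):
    """Build an in-memory zip from a {path: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for path, data in files.items():
            package.writestr(path, data)
    return buffer.getvalue()


def flat_name_collisions(package_bytes):
    """Group entry paths by bare filename and return the groups with
    more than one member, i.e. names that would clash if the
    directory structure were discarded."""
    by_name = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for path in package.namelist():
            by_name.setdefault(posixpath.basename(path), []).append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


package = build_package({
    "mets.xml": b"<mets/>",
    "q4/report.doc": b"fourth quarter",
    "fy08/report.doc": b"fiscal year 2008",
})
collisions = flat_name_collisions(package)
```

With the three entries from the example, only 'report.doc' is reported, listing both of its original paths.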
Re: [Dspace-tech] Reusing bitstream sequence number
On Mon, 2008-08-18 at 19:23 +0100, Graham Triggs wrote:

Richard Rodgers wrote: I do worry about opening door #1 [content rejection], since taking assets as found seems pretty close to the bedrock use-case for digital repositories - at least preservation-minded ones.

Well, that is an interesting argument! Now, if we look at assets 'as found' then they will [probably] be located in a user's file system. That file system will already be enforcing a unique constraint on the names of files within a directory.

True - ingestion into a repository is a destructive decontextualization/recontextualization. My point about 'as found' means be as minimally destructive as possible, viz. not changing 'graham.txt' to 'richard.txt' (or 'graham[2].txt'). In fact (the web submission UI notwithstanding), nothing in the data model prevents us from capturing all we want as contextual metadata (hasParent 'q4' etc).

Now, in your example you had a user with two files that had the same name but located in different directories. Presumably there is implicit knowledge in the particular organisation of the file structure. And we are not taking it 'as found' because DSpace is forcing the user to throw away that organisation (and therefore any knowledge/information it implies) when attaching all those files to a single item. In terms of the sequence number, we assign a 'genuine unique id' to every bitstream that is ingested, and there is no reason why that id can't be used in place of the sequence number in the URL in the case where disambiguation is necessary. There is nothing wrong with presenting a disambiguation page if a URL is provided without that unique id, and where the filename can't uniquely resolve. There just isn't any need to use a sequence number in this way, and include it as part of the URL. What there is a need for is a way to define the order in which the bitstreams are presented for an item - which should be the job of a sequence number, but it isn't used for that. (Note that the above is true for the majority using the 80/20 rule. There may be some exceptional cases that don't fit into the above statements, but then they may not be serviced sufficiently by the existing use of sequence numbers either.) G
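Graham's proposal (resolve by filename when it is unique within the item, fall back to the bitstream's unique id otherwise, and offer a disambiguation page when neither settles it) might look roughly like this. It is a Python sketch with hypothetical names and ids, not DSpace code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bitstream:
    bitstream_id: int  # repository-wide unique id (hypothetical values below)
    name: str          # original filename, not necessarily unique


def resolve(bitstreams, name, bitstream_id=None):
    """Return ("found", bitstream), ("ambiguous", candidates) or
    ("not_found", None) for a request against one item's bitstreams."""
    matches = [b for b in bitstreams if b.name == name]
    if bitstream_id is not None:
        matches = [b for b in matches if b.bitstream_id == bitstream_id]
    if len(matches) == 1:
        return ("found", matches[0])
    if not matches:
        return ("not_found", None)
    # Several bitstreams share the name: let the caller render a
    # disambiguation page instead of silently picking the first one.
    return ("ambiguous", matches)


item = [
    Bitstream(501, "mets.xml"),
    Bitstream(502, "report.doc"),
    Bitstream(503, "report.doc"),
]
```

The design choice is that an ambiguous name never resolves silently: `resolve(item, "report.doc")` yields both candidates, while `resolve(item, "report.doc", 503)` yields exactly one.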
Re: [Dspace-tech] Reusing bitstream sequence number
Hi Mark: Let me explain the problem more fully, which is a very simple 'inconvenient truth' about assets: some complex digital objects we want to submit as one Item have filename duplications. E.g. in directory 'q4' we have 'report.doc', but the same filename in directory 'fy08' with different content. In the face of this, we can:

(1) reject the content (duplicate filenames detected! - please correct or resubmit as multiple items), which is unacceptable.

(2) accept the content, but transform or rewrite into unique filenames (q4-report.doc? report[2].doc?, etc?), which is almost as bad, since we now have both obscured the original name, and altered what we are supposed to be preserving. or

(3) [what DSpace currently does] store the filename as *metadata*, which, like file size, can be valuable, but which may not be unique, and use a different identification system that *guarantees* uniqueness within the item (sequence id).

I think because it's a number, the sequence ID is easily confused with a version, which it is not. And in fact, there is nothing sacred about sequence numbers as a technique either: we also considered MD5 checksums, timestamps (maybe now UUIDs, etc.); sequence numbers won because the URLs were shorter and easier to use. The choice of ID schemes does have consequences, as some of John P.'s use-cases illustrate: a 'slot number' (which can be reassigned) is different from a 'sequence number' (which can't), and we can debate the comparative merits of each (or others): my point was that filename is an apparent non-starter (for reasons above). As to the 'heuristic' URLs in 1.5 Manakin, I regard them as closer to a bug than a solution. Just as we would never use an online bank that looked up our account files by taking the first match for our last names, so I think we should not accept indeterminate semantics in bitstream retrieval (I wanted 'fy08', but got 'q4') - that's what unique IDs are for. My 2 cents, Richard

Mark Diggory wrote:

On Aug 15, 2008, at 12:15 PM, John Preston wrote: On Fri, Aug 15, 2008 at 1:40 PM, Richard Rodgers [EMAIL PROTECTED] wrote: On Fri, 2008-08-15 at 10:12 -0700, Mark Diggory wrote: On Aug 15, 2008, at 9:36 AM, John Preston wrote:

Hi. Can anyone say how I can re-use a bitstream sequence number. The use case is the following

On Aug 15, 2008, at 10:01 AM, Mark H. Wood wrote:

Allowed or not, this sounds risky. If you are overloading the sequence number with a new meaning, this practice is likely to bite you again and again, since the developing stock code won't recognize your second meaning and will take no pains to preserve it

Mark is correct about overloading the semantics here. Note, we adjusted the behavior behind the DSpace 1.5 XMLUI (but not the JSPUI) to allow for unsequenced name resolution of the bitstreams. For instance: ... It certainly would have been much easier to key Bitstreams on the name rather than a sequence id in the original architecture. I've seen requests such as yours numerous times during my history of working on DSpace, and being able to reference resources by simple, assignable, predictable names rather than internally generated sequence ids makes life on the outside of DSpace easier and 3rd party tooling more powerful. This is something I hope to take into the 2.0 development initiative.

Easier perhaps, but unfortunately the Bitstream filename need not be unique, so is a problematic candidate for a durable reference.

Richard, that is the crux of my criticism. It would be easier and more useful all around if the name were part of the identifier/re-visioning strategy for the item in DSpace 2.0, using the name as the identifier for the bitstream within the scope of that Item and its item-wide revision id. The current XMLUI support is a transition somewhere between the original DSpace behavior and this Item re-visioning end-goal of 2.0. Likewise, John's case is yet another example of why we need the ability to assign such identifiers rather than have them assigned internally. And because John seeks to supply an updated version of the file with the requirement that he not have to remove all the bitstreams and recreate them in order to reconstruct all the local references to that specific bitstream within his item, it's a reasonable use case. I encountered this when creating the DDI metadata (relative URI) describing the data files I ported from the Virtual Data Center to DSpace. http://dspace.mit.edu/handle/1721.1/39118 Where I might have: http://dspace.mit.edu/bitstream/handle/1721.1/39126/1/study.xml How would I define my DDI's relative references to the other bitstreams prior to having ingested the entire package representing the Item into DSpace, when my external application doesn't have access to this internally generated sequence id until after the fact? (thats
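Richard's option (3), and the 'sequence number' versus 'slot number' distinction, can be pictured with a toy model: the filename is kept only as metadata, and each bitstream gets a per-item number that is never handed out again. This is a Python sketch with made-up class and method names, not the DSpace data model; it also shows why John's replacement file ends up with a new number.

```python
class Item:
    """Toy item whose bitstreams are keyed by a never-reused sequence id."""

    def __init__(self):
        self._next_sequence = 1
        self.bitstreams = {}  # sequence id -> (filename metadata, content)

    def add_bitstream(self, name, content):
        sequence = self._next_sequence
        self._next_sequence += 1  # ids only ever grow, even after deletions
        self.bitstreams[sequence] = (name, content)
        return sequence

    def remove_bitstream(self, sequence):
        del self.bitstreams[sequence]


item = Item()
q4 = item.add_bitstream("report.doc", b"fourth quarter")      # same filename...
fy08 = item.add_bitstream("report.doc", b"fiscal year 2008")  # ...different id

# Replacing the q4 file by delete-and-add yields a new sequence id.
item.remove_bitstream(q4)
q4_updated = item.add_bitstream("report.doc", b"fourth quarter, revised")
```

A 'slot number' scheme would instead give the revised file the freed number; whether that is desirable is exactly what the thread debates.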
Re: [Dspace-tech] Reusing bitstream sequence number
On Fri, 2008-08-15 at 10:12 -0700, Mark Diggory wrote:

On Aug 15, 2008, at 9:36 AM, John Preston wrote:

Hi. Can anyone say how I can re-use a bitstream sequence number. The use case is the following. I have an item with a number of bitstreams which are my data files. I also have a text file bitstream which contains the URLs of the data file bitstreams. Now, if I update one of these data files by deleting the old bitstream and adding the new data file bitstream, the name remains the same but the sequence number for the updated bitstream is different from the original data file bitstream. I want to be able to add the updated data file bitstream with the same sequence number as the original one. Is this allowed, or do I have to hack it. John

On Aug 15, 2008, at 10:01 AM, Mark H. Wood wrote:

Allowed or not, this sounds risky. If you are overloading the sequence number with a new meaning, this practice is likely to bite you again and again, since the developing stock code won't recognize your second meaning and will take no pains to preserve it. What is it that you need to accomplish?

Mark is correct about overloading the semantics here. Note, we adjusted the behavior behind the DSpace 1.5 XMLUI (but not the JSPUI) to allow for unsequenced name resolution of the bitstreams. For instance:

    http://dspace.mit.edu/bitstream/handle/1721.1/39126/womenpolicymakers_census_dta.tab
    http://dspace.mit.edu/bitstream/handle/1721.1/39126/womenpolicymakers_census_dta.tab?sequence=3
    http://dspace.mit.edu/bitstream/handle/1721.1/39126/3/womenpolicymakers_census_dta.tab

are now all valid references to the bitstream at this location. In the case where the sequence number is absent, the first bitstream encountered in the Item with that name is returned. It certainly would have been much easier to key Bitstreams on the name rather than a sequence id in the original architecture. I've seen requests such as yours numerous times during my history of working on DSpace, and being able to reference resources by simple, assignable, predictable names rather than internally generated sequence ids makes life on the outside of DSpace easier and 3rd party tooling more powerful. This is something I hope to take into the 2.0 development initiative.

Easier perhaps, but unfortunately the Bitstream filename need not be unique, so is a problematic candidate for a durable reference. Richard

Cheers, Mark
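The three URL shapes Mark lists differ only in where the sequence number appears, if at all. A small parser makes that explicit. This is a sketch in Python using urllib.parse; it assumes the bitstream name contains no '/' and the function name is made up, so it illustrates the URL forms rather than the XMLUI's actual resolution code.

```python
from urllib.parse import parse_qs, urlsplit


def parse_bitstream_url(url):
    """Split a bitstream URL into (handle, sequence or None, name).

    Accepts .../bitstream/handle/<prefix>/<suffix>/<name>,
    the same with ?sequence=N, and .../<prefix>/<suffix>/<N>/<name>.
    """
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if segments[:2] != ["bitstream", "handle"] or len(segments) not in (5, 6):
        raise ValueError(f"not a recognised bitstream URL: {url}")
    handle = "/".join(segments[2:4])
    name = segments[-1]
    sequence = None
    if len(segments) == 6:
        sequence = int(segments[4])  # sequence given as a path segment
    else:
        values = parse_qs(parts.query).get("sequence")
        if values:
            sequence = int(values[0])  # sequence given as a query parameter
    return handle, sequence, name


base = "http://dspace.mit.edu/bitstream/handle/1721.1/39126/"
by_name = parse_bitstream_url(base + "womenpolicymakers_census_dta.tab")
by_query = parse_bitstream_url(base + "womenpolicymakers_census_dta.tab?sequence=3")
by_path = parse_bitstream_url(base + "3/womenpolicymakers_census_dta.tab")
```

Only the first form leaves the sequence as `None`, which is the case where, per Mark's description, the first bitstream with that name is returned.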
Re: [Dspace-tech] Reusing bitstream sequence number
Hi John: Good question, although I think it may be premature to answer it. As Mark mentioned, data model enrichment is one of the centerpieces of the 2.0 work, and the versioning code that was done predates this: so the two will have to be reconciled. Stay tuned. (I really mean, jump in and lend your expertise when this gets thrashed out.) Thanks, Richard

On Fri, 2008-08-15 at 14:15 -0500, John Preston wrote:

How will the versioning scheme, that I recall being talked about some time ago, work? Did it not need to keep a stable reference to a bitstream along with versions? John

On Fri, Aug 15, 2008 at 1:40 PM, Richard Rodgers [EMAIL PROTECTED] wrote: On Fri, 2008-08-15 at 10:12 -0700, Mark Diggory wrote: On Aug 15, 2008, at 9:36 AM, John Preston wrote:

Hi. Can anyone say how I can re-use a bitstream sequence number. The use case is the following. I have an item with a number of bitstreams which are my data files. I also have a text file bitstream which contains the URLs of the data file bitstreams. Now, if I update one of these data files by deleting the old bitstream and adding the new data file bitstream, the name remains the same but the sequence number for the updated bitstream is different from the original data file bitstream. I want to be able to add the updated data file bitstream with the same sequence number as the original one. Is this allowed, or do I have to hack it. John

On Aug 15, 2008, at 10:01 AM, Mark H. Wood wrote:

Allowed or not, this sounds risky. If you are overloading the sequence number with a new meaning, this practice is likely to bite you again and again, since the developing stock code won't recognize your second meaning and will take no pains to preserve it. What is it that you need to accomplish?

Mark is correct about overloading the semantics here. Note, we adjusted the behavior behind the DSpace 1.5 XMLUI (but not the JSPUI) to allow for unsequenced name resolution of the bitstreams. For instance:

    http://dspace.mit.edu/bitstream/handle/1721.1/39126/womenpolicymakers_census_dta.tab
    http://dspace.mit.edu/bitstream/handle/1721.1/39126/womenpolicymakers_census_dta.tab?sequence=3
    http://dspace.mit.edu/bitstream/handle/1721.1/39126/3/womenpolicymakers_census_dta.tab

are now all valid references to the bitstream at this location. In the case where the sequence number is absent, the first bitstream encountered in the Item with that name is returned. It certainly would have been much easier to key Bitstreams on the name rather than a sequence id in the original architecture. I've seen requests such as yours numerous times during my history of working on DSpace, and being able to reference resources by simple, assignable, predictable names rather than internally generated sequence ids makes life on the outside of DSpace easier and 3rd party tooling more powerful. This is something I hope to take into the 2.0 development initiative.

Easier perhaps, but unfortunately the Bitstream filename need not be unique, so is a problematic candidate for a durable reference. Richard

Cheers, Mark
Re: [Dspace-tech] ItemImport
Hi Jose: Looks like the doc is a little behind the code - you might have noticed the thread where we are trying to rationalize the documentation process. For now, the ItemImporter code is your best bet. But yes, the Bitstream description can be added as you suggest, but note that the '\t' really refers to a tab separation in the import file, not the literal token '\t'. Hope this helps, Richard

On Fri, 2008-08-08 at 10:54 -0400, Blanco, Jose wrote:

I remember seeing that in 1.5 when using the item importer you can pass in a file description, and perhaps even permission info, but I can't find the documentation. From looking at the code, it seems that to put in a file description, the following must be added to the line listing the file: \tdescription: Your description. Is this right? Is there documentation on this? Thanks! Jose
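Richard's caveat is easy to trip over when the contents file is generated by a script: the separator must be a real tab character (0x09), not a backslash followed by 't'. The sketch below, in Python, writes and re-reads one such line. The helper names and the filename are made up, and only the 'description' field discussed in the thread is shown.

```python
def contents_line(filename, description=None):
    """Format one line of an item's contents file; the description, if
    any, follows the filename after a real tab character."""
    if description is None:
        return filename
    return f"{filename}\tdescription:{description}"


def parse_contents_line(line):
    """Split a contents line into (filename, {option: value})."""
    fields = line.rstrip("\n").split("\t")
    options = {}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        options[key] = value.strip()
    return fields[0], options


line = contents_line("thesis.pdf", "Author's final manuscript")
parsed = parse_contents_line(line)
```

In Python source the escape `\t` inside a normal string literal already produces the tab character, so the written file contains the separator the importer expects rather than the two-character token.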
Re: [Dspace-tech] Fwd: New event system in 1.5
John: I guess before making any firm recommendation, I'd need to know what your requirements are. Do you, e.g., require transactional closure over the content change that generates the first event and the additions you want your consumer to make based on it? Can you describe what you are trying to accomplish? As noted, both options 2 and 3 cause your code to fork from the distribution, creating maintenance issues in the future. If you don't need closure, why not just have your consumer add to a list of updates to be made? This is one of the intended uses of the event system when it's hooked up to a message queue, e.g. Thanks, Richard

Quoting John Preston [EMAIL PROTECTED]:

Sorry to bother you again, but quick question. When does the end() method get called? Is it after the events ArrayList has completed iterations of all events? If so, then I could move all my processing of the bundles, bitstreams and items to this method. The only other thing I would have to guard against would be more events of the same type, subject and object with the same transaction ID.

This may be a wise thing to try. Yes, it is done after consume has completed on all the events; the dispatcher is done with the list of events at that point. Most of the consumers actually do complete their processing at this point too. But be careful of what I stated above: you need to make sure this list of Events is coming from the correct Context.

On Wed, Jun 18, 2008 at 9:33 PM, John Preston [EMAIL PROTECTED] wrote:

Ok, I think I understand what is happening. When I create or remove bundles, or bitstreams, or any of these objects that generate events, these add new events, which cause a concurrent modification conflict with the events ArrayList that is being used to dispatch the events in the first place. So having a new context will not help me. I need to bypass adding new events for any modifications. I think I have three choices, and your thoughts would be welcome on them. 1) Keep my code as media filters run periodically, and wait for 2.0; 2) extend the code for Bundles, Items, Bitstreams, etc., anything that issues events when the update() method is called, and add another update(boolean issueEvent) method that performs the database update with or without issuing any events, depending on the parameter passed; 3) implement my own event system. I'm leaning towards 2 or 3, but I don't know how many DSpace objects issue events. I'm playing with a GWT front end application to some RPC services that actually interact with the DSpace objects via the dspace-api codebase. It has gone pretty quickly up till this point, so I'm looking for the quickest long-lasting method.

I'd avoid (3). I think (2) will be coming in DSpace 2.0 next year sometime (we are still at the drawing board on this, so it's very early). If you take this route and it isn't compatible with what gets done in our development, it may be difficult to integrate. (1) is your safest bet, but I understand why you want to get away from it. If you would, please forward a summary of this to the dspace-tech list as a reply so that we have a complete thread there. Cheers, Mark

John

On Wed, Jun 18, 2008 at 7:26 PM, John Preston [EMAIL PROTECTED] wrote:

We do not recommend using an Event Consumer to directly manipulate the Item in question by adding Bitstreams or altering its metadata, not only because the available Context's transactional window is closed by the time you've reached processing the Event in the consumer, but also because it's difficult to predict the behavior, and there is a risk of getting the Event System into an infinite event loop, since your alterations will in turn also alter the Item and generate new events in the process.

Yes, I saw that, that's why I started to use context.getDBConnection().commit() to avoid infinite event loops.

Unfortunately, this is the sort of task most folks were hoping the Event System would solve in DSpace 1.5. I feel we are attempting to address its shortcomings in the DSpace 2.0 roadmap to come up with a solution.

That would be great.

If you do decide to attempt to use the Event System in such a manner as to alter the Object in question and update it again, I would recommend using a new DSpace Context, and make sure you can trap the events in some sort of conditional check to decide whether they should be propagated through your Consumer again.

I tried this but I still get the error. I did the following:

    Context ctx = new Context();
    ctx.setCurrentUser(context.getCurrentUser());
    ctx.setCurrentLocale(context.getCurrentLocale());

I'll look again at the code in the morning to see if I can understand where the problem lies. Thanks. John

Sincerely, Mark

On Jun 18, 2008, at 4:31 PM, John Preston wrote:

Can someone say whether it is advisable or not to use the new event system under 1.5 to add metadata or bitstreams to an item. I am moving
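The pattern John and Mark converge on (record work in consume(), do it in end() once the dispatcher has finished walking its event list, and ignore repeats of the same event) can be modelled without any DSpace classes. The following Python sketch uses plain dictionaries as events; the class, the field names and the dispatch function are all hypothetical stand-ins and do not reproduce the DSpace 1.5 Consumer interface.

```python
class DeferringConsumer:
    """Collects events while the dispatcher iterates; acts only in end()."""

    def __init__(self):
        self._pending = []
        self._seen = set()
        self.applied = []  # stands in for the real item/bundle updates

    def consume(self, event):
        # Guard against repeats of the same (type, subject) pair.
        key = (event["type"], event["subject_id"])
        if key in self._seen:
            return
        self._seen.add(key)
        self._pending.append(event)  # no repository changes here

    def end(self):
        # The dispatcher has finished with its list, so changes made now
        # cannot disturb the iteration that delivered the events.
        for event in self._pending:
            self.applied.append(f"thumbnail for item {event['subject_id']}")
        self._pending.clear()
        self._seen.clear()


def dispatch(events, consumer):
    """Deliver every event, then signal the end of the batch."""
    for event in events:
        consumer.consume(event)
    consumer.end()


consumer = DeferringConsumer()
dispatch(
    [
        {"type": "MODIFY", "subject_id": 7},
        {"type": "MODIFY", "subject_id": 7},  # duplicate, ignored
        {"type": "MODIFY", "subject_id": 9},
    ],
    consumer,
)
```

Whether the updates performed in end() themselves raise new events, and in which Context, is the open question in the thread; the sketch only shows the deferral and the duplicate guard.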
Re: [Dspace-tech] Fwd: New event system in 1.5
Hi John: See below Richard R On Thu, 2008-06-19 at 11:13 -0500, John Preston wrote: I guess before making any firm recommendation, I'd need to know what your requirements are. Do you, e.g., require transactional closure over the content change that generates the first event and the additions you want your consumer to make based on it? No. Can you describe what you are trying to accomplish? As noted, both options 2 3 cause your code to fork from the distribution, creating maintenance issues in the future. Yeah I know. I have a number of different requirements for consumers, ranging from simple notification of item, bitstream or metadata changes, to item, bitstream or metadata additions. I'll try and summarize a few of the cases below. 1) When and item is added, I wish to include bitstreams in the item in particular bundles. These could be thumbnail images, or bitstreams derived from processing of already available bitstreams. For example, if I add spatial files to an item I wish to be able to be able to add various spatial indexes to specific bundles, as well as thumbnails. So can I infer that in this use-case, the latency imposed by batching these operations in media filter is intolerable/undesirable? (You could run it quite frequently if needed). I'm generally cautious about entangling the generation of derivatives and the addition of the primary artifact. You want the latter always to succeed, since the former can always be regenerated. There also can be issues of response-time to worry about, depending on the derivative(s). These are a few of the motivations behind the Media-Filter architecture, rather than having synchronous creation of derivatives, which seems to be your objective here. 2) When item, metadata or bitstreams are changed (modified or removed), I wish to update various metadata fields to ensure that they are consistent with the data held in the item. Consistency constraints form a very interesting class of use-cases. 
Are these changes coming through the admin UI, or elsewhere, or anywhere? Do you want a consistency check to be performed on any change, or only certain ones, etc? 3) When item, metadata or bitstream is added, I wish to update the configuration of other co-operating applications to include the changes made to the DSpace base data. For example, if I add a spatial file to an item, then I want to add this item to a WMS server so that it is also available to spatial clients. This is in the sweet-spot of the event mechanism - you could have a very simple consumer that does this notification following the creation events that stem from use-case 1. So I guess the short answer would be to write a filter for 1 & 2, and a consumer for 3. This would keep you entirely in the codebase. If you need a tighter coupling than the media filter can provide, we have talked about providing an in-process asynchronous message queue consumer (we already have a prototype enterprise JMS queue, if you would want to look at that, but it might be over-kill). Let me know if you are interested in this alternative. As to your current code, I can see how this will work, and it's fine in the interim, but I don't think in future releases we will want to expose bare DB connections (getDBConnection()), since this breaks some encapsulations we are striving for in our DAO and other work.
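Editor's sketch: the "very simple consumer" suggested for use-case 3 might look roughly like the following. This is a self-contained toy, not the DSpace event API: RepoEvent, LayerRegistry and SpatialNotifyConsumer are hypothetical names, and the file-extension test for "spatial" bitstreams is an assumption made only for illustration. The point it tries to show is the split between cheaply queueing interesting events and doing the external notification afterwards, so that the primary ingest never waits on the WMS server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a repository event; NOT the DSpace Event class.
final class RepoEvent {
    enum Type { CREATE, MODIFY, DELETE }

    final Type type;
    final String itemHandle;
    final String bitstreamName;

    RepoEvent(Type type, String itemHandle, String bitstreamName) {
        this.type = type;
        this.itemHandle = itemHandle;
        this.bitstreamName = bitstreamName;
    }
}

// Hypothetical client for the co-operating application (e.g. a WMS server).
interface LayerRegistry {
    void register(String itemHandle, String bitstreamName);
}

// Toy consumer: queue creation events for spatial files, notify later.
final class SpatialNotifyConsumer {
    private final LayerRegistry registry;
    private final List<RepoEvent> pending = new ArrayList<>();

    SpatialNotifyConsumer(LayerRegistry registry) {
        this.registry = registry;
    }

    // Called once per event: only filter and queue, no slow work here.
    void consume(RepoEvent event) {
        if (event.type == RepoEvent.Type.CREATE && isSpatial(event.bitstreamName)) {
            pending.add(event);
        }
    }

    // Called after the originating change is done: push the queued notifications.
    // Returns how many notifications were sent.
    int end() {
        for (RepoEvent event : pending) {
            registry.register(event.itemHandle, event.bitstreamName);
        }
        int sent = pending.size();
        pending.clear();
        return sent;
    }

    // Assumption for illustration: spatial files are recognised by extension.
    static boolean isSpatial(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".shp") || lower.endsWith(".tif");
    }
}
```

Because the consumer only reads events and talks to an external service, it never needs to write back into the repository, which is what keeps it clear of the nested-context problem discussed in the rest of this thread.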
As I mentioned earlier, I have now got my consumer working without the earlier errors using a new context in the following way:

    // Temporary context that mirrors the user and locale of the original
    Context ctx = new Context();
    ctx.setCurrentUser(originalContext.getCurrentUser());
    ctx.setCurrentLocale(originalContext.getCurrentLocale());
    // If I have to create a new bundle
    tbundles = new Bundle[]{item.createBundle(THUMBNAIL_BUNDLE)};
    // Commit changes to datastore
    tbundles[0].update();
    item.update();
    // Add bundle and item objects to the original context
    originalContext.cache(tbundles[0], tbundles[0].getID());
    originalContext.cache(item, item.getID());
    // Commit changes to datastore, bypassing any event triggered during use
    ctx.getDBConnection().commit();
    // Clear temporary context so it will be garbage collected
    ctx.clearCache();

By using the temporary context and calling ctx.getDBConnection().commit() rather than ctx.commit(), I save the changes to the database while bypassing any events triggered during its use. I found that I had to add any of the DSpace objects that I used with the temporary context to the original context so they would be found by my code later on. I also cleared the cache of the temporary context so it will be properly garbage collected. I'll see how long this lasts. One question: is there some early info at this stage of how the event system for 2.0 is shaping up? Not that I know of. John If you don't need closure, why not just have your
Re: [Dspace-tech] Lots of sites pursuing streaming hacks -- how to coordinate?
Thanks Mark - also if you eventually also want a code integration site, we can set something up at dspace-sandbox on GoogleCode fairly easily... Richard R Quoting Mark H. Wood [EMAIL PROTECTED]: I've made a page on the wiki to collect ideas about A/V material: http://wiki.dspace.org/index.php/The_Challenge_of_Audio_and_Video Please add to it! -- Mark H. Wood, Lead System Programmer [EMAIL PROTECTED] Typically when a software vendor says that a product is intuitive he means the exact opposite. ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] About DSpace item registration (from SRB)
Hi Feng-chien: With regard to your first question, there is no support in 1.4.2 for automatic replication or backup to a secondary store. This is certainly a desirable feature, and it is 'on the radar' for future storage development. With regard to the second question, the 'ItemImport' batch ingestion tool does support 'registration' using a command line switch (-r). Your best option would be to review the source code before running it, but also feel free to post any specific questions to this list. Hope this helps, Richard R On Wed, 2008-03-26 at 16:13 +0800, Feng-chien Chung wrote: Dear DSpace community: I am Feng-chien, computing center of Academia Sinica, Taiwan. We are now running our DSpace in version 1.4.2. Recently, we've just set up a SRB connection to supplement DSpace asset storage, keeping the original assets on local hard disk and making the new incoming assets go to SRB. First I am wondering, is it possible to use SRB purely for backup purposes? I mean, while archiving a new incoming item in the local file system, save a replica in SRB at the same time? And second, we are going to try the registration function to ingest items which are already stored in SRB. But after checking out the DSpace Wiki DspaceSrbIntegration page (http://wiki.dspace.org/index.php/DspaceSrbIntegration) and UCSD's work on DSpace-SRB Integration project (https://libnet.ucsd.edu/nara/), I didn't find much information about the process, configuration, etc. So has anyone had experience with registering items from SRB into DSpace, or where can I get more information about this? Thanks for help~ Feng-chien Chung Metadata Architecture and application team, Computing Center of Academia Sinica, Taiwan e-mail: [EMAIL PROTECTED]
Re: [Dspace-tech] out of memory error - statistics and community pages
On Tue, 2008-03-18 at 13:14 +, Simon Brown wrote: On 13 Mar 2008, at 21:34, Richard Rodgers wrote: On Thu, 2008-03-13 at 16:23 +, Simon Brown wrote: I'm still curious about the necessity of the cache, as our removing it had no noticeable impact on performance and in fact increased the responsiveness of the site when we did before-and-after tests with Siege. Most of these questions/observations stem from your assumption that the cache is there for performance, which it is not (except incidentally). The primary purpose is (roughly) transactional integrity - ensuring that there is only a single copy of an Item, etc, in play before the Context commits. In this light, it makes sense that there might be a slight performance penalty. So there are instances where, in the course of a single HTTP request, multiple instances of the same Item may (in the absence of the cache) be instantiated, modified, and committed to the database? Could you give me an example of when this could happen? Clearly, as we're currently running without the cache, we're in danger of having this happen, and I'd like to be able to evaluate this risk. Not that I know of, but I'm not sure that is the only question to ask in evaluating risk. The cache has to do with guaranteeing the safety of the DSpace programming API, which at its most basic is: (1) get a context (2) do some work (3) commit all work as an atomic transaction (4) free the context Note that the work need not be confined to a single HTTP request (that only came up because we were discussing spidering, where that happens to be true) - a context can have an arbitrarily long life, and involve an unlimited number of database reads, updates, etc. Thus it would be easy to write application code like: Read object A (as part of iterating through a collection) Modify it, update database a lot of other operations Read Object A (in another iteration) Modify it, update database ... 
other operations commit Without the cache, the first set of modifications would be lost. Now we certainly could guard against this by vetting all the application logic looking for problems, but the cache provides cheap (but not free, as your Siege profiling shows) insurance against it. From a risk mitigation standpoint, I'd say as long as you have a very stable and well-understood system, risk should be fairly low - note, however, that I haven't done an exhaustive analysis. But DSpace is moving into a more modular world in which non-core (= independently developed) code will constitute an increasingly large part of its functionality. In such a world API safeguards like the cache look increasingly good, in the sense of justifying their performance price. You raise some very interesting questions, and I don't want to convey the impression that the DSpace architecture is 'fully baked' in this area: one suggestion with merit I've heard proposed (by Rob Tansley) is to segregate Contexts into 'read-only' and 'writable'. The former could then utilize a shared cache much like the one that you first imagined the context cache was. I think you will see continued work on this as we move to 2.0. From this I'm also assuming that Browse.indexAll() won't do anything to the database until the context commits after the call is done, which for big repositories would be another way for this method to use up a fair amount of heap. Correct. The browse system underwent a number of changes moving from 1.4 through 1.5, and I'm not conversant with it now, but in 1.4, I see no cache flushing. If you are running into difficulty, insert commit()s and item.decache()s to get a lower-profile heap. Thanks, Richard R -- Simon Brown [EMAIL PROTECTED] - Cambridge University Computing Service +44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
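Editor's sketch: the read/modify/read/modify/commit sequence described in this message can be played out in a toy model. Everything here is hypothetical (Row, ToyContext and the in-memory "store" are not DSpace classes), and the model assumes writes are buffered until commit, which is a simplification rather than a statement about how DSpace persists rows. It only illustrates why handing back the same in-memory object on the second read protects the first set of modifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-field record standing in for an Item.
final class Row {
    String title;
    String note;

    Row(String title, String note) {
        this.title = title;
        this.note = note;
    }

    Row copy() {
        return new Row(title, note);
    }
}

// Toy context: whole-row writes are buffered and applied at commit.
final class ToyContext {
    private final Map<Integer, Row> store;                       // last committed state
    private final Map<Integer, Row> cache = new HashMap<>();     // per-context object cache
    private final Map<Integer, Row> pending = new HashMap<>();   // buffered writes
    private final boolean useCache;

    ToyContext(Map<Integer, Row> store, boolean useCache) {
        this.store = store;
        this.useCache = useCache;
    }

    // With the cache, repeated reads return the same object instance.
    Row find(int id) {
        Row cached = useCache ? cache.get(id) : null;
        if (cached != null) {
            return cached;
        }
        Row fresh = store.get(id).copy();   // built from committed state only
        if (useCache) {
            cache.put(id, fresh);
        }
        return fresh;
    }

    void update(int id, Row row) {
        pending.put(id, row);               // later writes replace earlier ones
    }

    void commit() {
        for (Map.Entry<Integer, Row> e : pending.entrySet()) {
            store.put(e.getKey(), e.getValue().copy());
        }
        pending.clear();
        cache.clear();
    }

    // Plays the sequence from the message and returns the committed row.
    static Row run(boolean useCache) {
        Map<Integer, Row> store = new HashMap<>();
        store.put(1, new Row("old title", "old note"));
        ToyContext ctx = new ToyContext(store, useCache);

        Row first = ctx.find(1);            // read object A
        first.title = "new title";          // modify it
        ctx.update(1, first);

        Row second = ctx.find(1);           // read object A again, later on
        second.note = "new note";           // modify it
        ctx.update(1, second);

        ctx.commit();
        return store.get(1);
    }
}
```

In this model, run(true) commits both changes because the second read returns the already-modified object, while run(false) commits a row that still carries the old title: the second, independent copy overwrites the first write.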
Re: [Dspace-tech] out of memory error - statistics and community pages
Hi Simon: While I don't doubt for a moment that there are undiscovered memory leaks in DSpace, I'm not sure I follow the case you describe. By 'object cache' I'm guessing you mean the cache that is held by the Context object. This cache is private to the Context instance, and Contexts as a rule don't live very long (typically a single HTTP request), so I don't see how spidering activity could accumulate objects in it. There are other cases - like ItemImport or MediaFilter runs - that use a single context instance (therefore cache) and might iterate over the whole repository, and *could* suffer from what you describe, but as of 1.4 at least, these apps were all recoded to flush their caches. Hope this helps, Richard R On Thu, 2008-03-13 at 13:03 +, Simon Brown wrote: Another thing you should consider is removing the object cache. In 1.4.2 the object cache will cache any Item requested but will not automatically flush any of them. If you have a large installation and it's being spidered by search engines, eventually the heap will fill with cached objects and, again, your server will go over. It's by no means the only leak - we don't have the cache and our site still fails within a day and a half - but it's a real one. -- Simon Brown [EMAIL PROTECTED] - Cambridge University Computing Service +44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] out of memory error - statistics and community pages
Hi Simon: See remarks below... Thanks, Richard R On Thu, 2008-03-13 at 16:23 +, Simon Brown wrote: On 13 Mar 2008, at 15:51, Richard Rodgers wrote: Hi Simon: While I don't doubt for a moment that there are undiscovered memory leaks in DSpace, I'm not sure I follow the case you describe. By 'object cache' I'm guessing you mean the cache that is held by the Context object. This cache is private to the Context instance, and Contexts as a rule don't live very long (typically a single HTTP request), so I don't see how spidering activity could accumulate objects in it. Fair enough, I guess I didn't correctly understand how the Context object is used. I had assumed that it would be shared across multiple requests, largely because of the existence of the cache. So what happens is, whenever an individual http request accesses an Item, that Item is loaded into the HashMap in the Context, then discarded when the request is completed? Is it the case that an individual Item object is often requested from the database multiple times in the course of a single HTTP request? I'm still curious about the necessity of the cache, as our removing it had no noticeable impact on performance and in fact increased the responsiveness of the site when we did before-and-after tests with Siege. Most of these questions/observations stem from your assumption that the cache is there for performance, which it is not (except incidentally). The primary purpose is (roughly) transactional integrity - ensuring that there is only a single copy of an Item, etc, in play before the Context commits. In this light, it makes sense that there might be a slight performance penalty. There are other cases - like ItemImport or MediaFilter runs - that use a single context instance (therefore cache) and might iterate over the whole repository, and *could* suffer from what you describe, but as of 1.4 at least, these apps were all recoded to flush their caches. 
Browse.indexAll() and DSIndexer.indexAllItems(), on the other hand, don't seem to flush cache. I appreciate that it's not an often-used case, but it would mean that broken indexes on large databases will probably fail to rebuild due to the cache filling up the heap. Well observed - these are fixed in current DSpace trunk I believe, and a 1.4.X patch '#1659841 Add option to clear context object cache' is available. -- Simon Brown [EMAIL PROTECTED] - Cambridge University Computing Service +44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
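Editor's sketch: the heap concern with a whole-repository pass through a single context can be illustrated with a toy loop. BatchIndexer and its map are hypothetical; clearing the map stands in for whatever flush the real code would use, and the example only tracks how large the cache gets, not real memory. It shows that flushing every N items bounds the cache at N entries regardless of repository size.

```java
import java.util.HashMap;
import java.util.Map;

final class BatchIndexer {
    // Walks 'totalItems' items through one long-lived cache and returns the
    // largest number of entries the cache ever held.
    // flushEvery <= 0 means "never flush".
    static int indexAll(int totalItems, int flushEvery) {
        Map<Integer, String> cache = new HashMap<>();
        int peak = 0;
        for (int id = 1; id <= totalItems; id++) {
            cache.put(id, "item-" + id);        // loading an item caches it
            peak = Math.max(peak, cache.size());
            // ... indexing work on the item would happen here ...
            if (flushEvery > 0 && id % flushEvery == 0) {
                cache.clear();                  // stand-in for a periodic flush
            }
        }
        return peak;
    }
}
```

The trade-off is that a flushed object has to be re-read if it is needed again, so the batch size is a tuning knob rather than a fixed rule.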
Re: [Dspace-tech] Creative Commons Enigma
Hi Maike: I have not tested this, but if you consult the documentation at: http://wiki.creativecommons.org/Web_Integration_Guide it suggests that if you add a query parameter 'jurisdiction' (the doc calls them 'URL variables') to the Creative Commons URL requested in the submit/creative_commons.jsp, you can force the Canadian licenses to appear. If this works, document it for the benefit of others. Thanks, Richard Quoting Maike Dulk [EMAIL PROTECTED]: Hi Richard, gosh, that is indeed entirely new, not only to me but also to the librarians who have used DSpace for a longer time now as well. This is indeed confusing. So thank you very much for clarifying this! The only thing that I would like to know is how I can change the target of that iFrame, which is now the 3.0 generic license, to the CC license that we want to use, i.e. the Canadian 2.5 version .. and I cannot find any setting / file where that is set. Do you know that? cheers maike On 9-Jan-08, at 6:23 AM, Richard Rodgers wrote: Hi Maike: A few explanations to help unravel the enigma: First, you should understand that there are two different licenses involved here, not a choice of one. The first - the deposit license - is (roughly) a licence that the depositor grants to the repository. It is not optional, and does not display to users in the item display. For many institutions, it is a legal requirement. It is a standard 'click-thru' license in the sense that the grantor can only accept or reject it (as you note). The license.default you mention contains this license text. The second - CC license - is (roughly) a license that an author grants to consumers of her work. It is entirely optional (as you note with the cc-enable property), and *does* prominently display to users in the item page. Here, the submitter *can* choose from a set of licenses that reflect the desire to control/share content. 
What is confusing is that they both appear in the submission work-flow (first CC, then deposit), and the CC is a mini-workflow in itself consisting of steps in an Iframe. Many have lamented the 'brittle' Iframe, but it was the only programmatic access we had when it was first offered. Now there are 'web service' interfaces to CC that should probably be adopted (volunteers welcome!). Hope this helps, Richard Quoting Maike Dulk [EMAIL PROTECTED]: Hi I noticed some really strange behaviour in our DSpace installation. It is the Creative Commons licensing. When the setting in dspace.cfg is set to true Creative Commons settings ## # are Creative Commons licenses used in submission? webui.submit.enable-cc = true the Creative Commons section of the submission process has TWO separate steps. The first one is the one that has the iFrame that displays the Creative Commons webpage in it, and there are two choices that the submitter can set before accepting the license. After that, there is another step that asks another time whether the submitter wants to accept the Creative Commons license. This page is much simpler and has no choices, only two buttons - either I Grant the License or I Do Not Grant the License . The strange thing is that the first stage points to a version of the CC license that is NOT the one that we want to use, and that I have entered in the default.license config file. We want to use the Canadian 2.5 license, but it displays - AND attaches - the generic 3.0 CC license. That second stage does point to the Canadian 2.5 version license - but it is apparently discarded. The strangeness becomes more apparent when I do this: Creative Commons settings ## # are Creative Commons licenses used in submission? webui.submit.enable-cc = false In that case, there is no longer the first step (the iFrame one) - but the second step (the one with the url to the Canadian version and the I Grant the License or I Do Not Grant the License buttons) STILL comes up.
But submissions do not have the CC license attached .. which does make some sense, since it is disabled. So I'd really like to know: - what makes this second step appear even though the CC is disabled in dspace.cfg? - How can I turn it off? - where can I make the first CC license point to the right CC license version (Canadian 2.5) ? It is not in the default.license .. since that is set to the Canadian one. - if we could get the second step to work with the right CC version that would be even better, since it is much simpler than that brittle iFrame mechanism My gosh, this is *complex*. thanks in advance, maike -- Maike Dulk - Programmer / Analyst McPherson Library, University of Victoria (t) 250-886-5709 / (e) [EMAIL PROTECTED] -- Harthon gerithach aeair vilui / I hope you will have kind seas
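Editor's sketch: Richard's untested suggestion at the top of this message, adding a 'jurisdiction' query parameter to the Creative Commons URL requested from submit/creative_commons.jsp, amounts to a small piece of URL construction. The helper below is hypothetical (CcUrl is not part of DSpace), the example.org base URL is a placeholder for whatever URL the JSP actually requests, and the value "ca" for Canada is an assumption to be checked against the Web Integration Guide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CcUrl {
    // Appends name=value to a URL, using '?' or '&' depending on whether the
    // URL already carries a query string.
    static String withParam(String baseUrl, String name, String value) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        try {
            return baseUrl + separator
                    + URLEncoder.encode(name, "UTF-8") + "="
                    + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, CcUrl.withParam("http://example.org/license/?partner=dspace", "jurisdiction", "ca") yields the same URL with "&jurisdiction=ca" appended; whether the Creative Commons site then offers the Canadian licenses is exactly the part that remains to be tested.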
Re: [Dspace-tech] Creative Commons Enigma
Hi Maike: A few explanations to help unravel the enigma: First, you should understand that there are two different licenses involved here, not a choice of one. The first - the deposit license - is (roughly) a licence that the depositor grants to the repository. It is not optional, and does not display to users in the item display. For many institutions, it is a legal requirement. It is a standard 'click-thru' license in the sense that the grantor can only accept or reject it (as you note). The license.default you mention contains this license text. The second - CC license - is (roughly) a license that an author grants to consumers of her work. It is entirely optional (as you note with the cc-enable property), and *does* prominently display to users in the item page. Here, the submitter *can* choose from a set of licenses that reflect the desire to control/share content. What is confusing is that they both appear in the submission work-flow (first CC, then deposit), and the CC is a mini-workflow in itself consisting of steps in an Iframe. Many have lamented the 'brittle' Iframe, but it was the only programmatic access we had when it was first offered. Now there are 'web service' interfaces to CC that should probably be adopted (volunteers welcome!). Hope this helps, Richard Quoting Maike Dulk [EMAIL PROTECTED]: Hi I noticed some really strange behaviour in our DSpace installation. It is the Creative Commons licensing. When the setting in dspace.cfg is set to true Creative Commons settings ## # are Creative Commons licenses used in submission? webui.submit.enable-cc = true the Creative Commons section of the submission process has TWO separate steps. The first one is the one that has the iFrame that displays the Creative Commons webpage in it, and there are two choices that the submitter can set before accepting the license. After that, there is another step that asks another time whether the submitter wants to accept the Creative Commons license.
This page is much simpler and has no choices, only two buttons - either I Grant the License or I Do Not Grant the License . The strange thing is that the first stage points to a version of the CC license that is NOT the one that we want to use, and that I have entered in the default.license config file. We want to use the Canadian 2.5 license, but it displays - AND attaches - the generic 3.0 CC license. That second stage does point to the Canadian 2.5 version license - but it is apparently discarded. The strangeness becomes more apparent when I do this: Creative Commons settings ## # are Creative Commons licenses used in submission? webui.submit.enable-cc = false In that case, there is no longer the first step (the iFrame one) - but the second step (the one with the url to the Canadian version and the I Grant the License or I Do Not Grant the License buttons) STILL comes up. But submissions do not have the CC license attached .. which does make some sense, since it is disabled. So I'd really like to know: - what makes this second step appear even though the CC is disabled in dspace.cfg? - How can I turn it off? - where can I make the first CC license point to the right CC license version (Canadian 2.5) ? It is not in the default.license .. since that is set to the Canadian one. - if we could get the second step to work with the right CC version that would be even better, since it is much simpler than that brittle iFrame mechanism My gosh, this is *complex*. thanks in advance, maike -- Maike Dulk - Programmer / Analyst McPherson Library, University of Victoria (t) 250-886-5709 / (e) [EMAIL PROTECTED] -- Harthon gerithach aeair vilui / I hope you will have kind seas - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. 
Re: [Dspace-tech] Status of Pluggable Storage Interface
Hi Mark: Yes, sort of - but all I really wanted to see was 1 or 2 eggs I hadn't laid myself, to make sure the abstraction was sufficiently flexible and general. I think there is good awareness and appreciation of the need for clean modular boundaries around services like storage, so you can be confident that this will be taken up: I should have emphasized this at the outset. Did it work OK for your application? Thanks, Richard On Fri, 2007-12-07 at 10:00 -0500, Mark H. Wood wrote: Oh, good. Chicken/egg problem: I've tinkered with an application of this, but sort of lost momentum while waiting to see if and when the work my code depends on will be taken up.
Re: [Dspace-tech] Status of Pluggable Storage Interface
I replied to Ravi, but for the benefit of the list: I am working on this prototype/interface, and hope it will be released in DSpace very soon (1.6?). It's a prototype only in the sense that I want to see additional storage systems attempt to write to the interface to ensure its adequacy. I believe that several such attempts are underway (Sun's Honeycomb, e.g.). Anyone can contribute with questions, criticism, implementation code, etc. Thanks, Richard Rodgers Quoting Ravi S Sathish [EMAIL PROTECTED]: Hi all, I sent an email some days ago introducing myself and my team @ Nirvana; http://sourceforge.net/mailarchive/message.php?msg_name=11579.66918.qm%40web45104.mail.sp1.yahoo.com We are working on a commercial version of SRB and are interested in the Pluggable Storage Interface. http://wiki.dspace.org/index.php/PluggableStorage The wiki page says that pluggable storage is still a prototype. Could somebody please let me know who's working on it and how we could contribute to its efforts? Thanks Ravi Sathish
Re: [Dspace-tech] Facetted / faster browsing
On Thu, 2007-11-29 at 08:14 -0500, John S. Erickson wrote: Richard Rodgers wrote: (1) There is a lot of metadata in DSpace (and a lot more to come) that is not related to user discovery (technical metadata, e.g) - this could live in a triple store - but would not benefit from it. In fact, a lot or record-based metadata is accessed much more efficiently in a RDBMS. 1. From an architectural standpoint, doesn't a triple store (in theory) make it fundamentally easier to deal with a diversity of metadata types, *especially* technical metadata --- which can vary not only between formats but even between instances of a given format, depending upon the applications that have modified the bitstreams? Yes absolutely, but what I was trying to question was the 'grand unification' assumption I think gets made implicitly or explicitly in these discussions: i.e. that there has to be a single way DSpace represents and manages all its metadata. Since RDF is so general/powerful, it always looks like the prohibitive favorite if framed in these terms. I picture a continuum - which ranges from completely 'dark' metadata living only in an AIP in the asset store (recoverable, of course) to highly visible discovery metadata - with copies in Lucene, a triple-store, Google caches, etc. and cases in between involving collection management. Where Longwell/RDF shines is the case where such heterogeneous metadata needs to be combined for a particular discovery purpose. Now as Christophe pointed out, the trick is to manage this spectrum without excess system complexity, and too many moving parts. 2. Regarding efficiency, are you referring to query (somehow harder to get what you want) or performance (triple-store implementations haven't benefited from 30+ years of refinement)? More query than absolute scalability - I agree that triple-store implementations scale to at least relational database levels (often because they are backed by relational DBs!). 
And performance depends on the typical contexts of use - which gets back to the point I was making above about different functional uses of metadata. Just trying to provoke discussion ;) You succeeded ;) Richard John
Re: [Dspace-tech] Facetted / faster browsing [was Development goals]
Hi Christophe: See remarks below on Dwell... Thanks, Richard On Fri, 2007-11-23 at 05:29 +0100, Christophe Dupriez wrote: Hi MacKenzie, Mark and Jim! Thanks for insisting on the idea of a client based interface! DWELL: I will explore Dwell further. I tried it with http://simile.mit.edu/longwell/demo/libraries/ but it is rather slow from here. That is a very old demo - Longwell's speed has improved. See http://dspace-test.mit.edu/dspace-longwell for a test server here at MIT using more recent code. Is the inventory of values for a given facet evaluated locally, in DSpace or in an intermediary server application? Dwell is a server application with an RDF triple-store backend (like DSpace's database, but in RDF) - the metadata is a copy of what is in DSpace - optimized for presentation in the Dwell UI. I understood Dwell is based on OAI-PMH but there is no Search request in OAI-PMH. Actually, Dwell is independent of how the metadata is obtained, so it does not rely on OAI-PMH. We have provided an OAI-PMH exporter as one way to feed Dwell. In 1.5, we are adding another way based on the event mechanism, and there is already a large library of SIMILE tools for turning a lot of metadata formats into the RDF Dwell expects. An extension has been defined for this: http://www.dlese.org/dds/services/oai2-0/odl_service_documentation.jsp but I suppose it is not part of DSpace (am I wrong?). OAI-PMH+Search(ODL) has similar capabilities to RSS and would ensure better metadata transmission. RSS: Mark and Jim's advice opened my eyes to a simple fact: RSS standard(s) may be used to represent a DSpace search result set (if I add an RSS flow generation to DSpace search). The nice thing with RSS is the potential promise of subscription for searches where new records are regularly retrieved and highlighted.
RSS clients are not completely aware of their potential for databases searches (and not only news feed) and could be improved to manage easily simple ad hoc searches and not only subscriptions to searches. Some of them have the three frames interface I wish for my users to browse DSpace results (like an e-mail management software). I made some experiments with RSSBandit (open-source: http://www.rssbandit.org/ ) and I think it is a possible way to go. Anybody digged in that direction? Christophe MacKenzie Smith a écrit : Hi Mark, I've been saying for some time that, nice as the DSpace user interface is in many respects, it is not and should not be the only way to plumb a DSpace archive. If it is (currently) difficult to get a particular search style put into DSpace, may I suggest trying a different approach. One could harvest metadata via the PMH responder, organize them any way one wishes, and search them in any desired way. I can't resist pointing out that this is exactly what DWell does -- the faceted browsing and search UI that is layered over DSpace via an OAI-PMH plugin for RDFized metadata. See http://simile.mit.edu/wiki/Dwell or Richard Rodger's presentation on same at http://www.aepic.it/conf/viewpaper.php?id=212print=1cf=11 I think this is an excellent approach to building better DSpace UIs, and just leaves us with the problem of the underlying data rigidity, which I hope we can address by relying more on RDF or other rich metadata that is stored in the assetstore alongside the content files. The current DSpace metadata tables are great for managing content, but suboptimal for discovering what's in the repository (assuming we can get better discovery metadata from outside the system, somehow). MacKenzie - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. 
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
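The approach described in this thread (harvest metadata over OAI-PMH, reorganize it outside DSpace, then browse it by facet) can be illustrated with a small sketch. This is not how Dwell is implemented (Dwell uses an RDF triple store); it only shows the general idea under stated assumptions. The sample response, its record values, and the function name `facet_counts` are made up for illustration; only the two XML namespaces are standard.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard namespaces for OAI-PMH responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written ListRecords response (trimmed). A real harvest
# would fetch this from a repository's OAI-PMH endpoint instead.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ListRecords>
    <record><metadata><oai_dc:dc>
      <dc:title>Paper A</dc:title>
      <dc:subject>Physics</dc:subject>
      <dc:subject>Optics</dc:subject>
    </oai_dc:dc></metadata></record>
    <record><metadata><oai_dc:dc>
      <dc:title>Paper B</dc:title>
      <dc:subject>Physics</dc:subject>
    </oai_dc:dc></metadata></record>
    <record><metadata><oai_dc:dc>
      <dc:title>Paper C</dc:title>
      <dc:subject>Chemistry</dc:subject>
    </oai_dc:dc></metadata></record>
  </ListRecords>
</OAI-PMH>
"""


def facet_counts(xml_text, field):
    """Count how many harvested records carry each value of a DC field."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iterfind(".//oai:record", NS):
        # Use a set so a value repeated inside one record counts once.
        values = {
            el.text.strip()
            for el in record.iterfind(f".//dc:{field}", NS)
            if el.text
        }
        counts.update(values)
    return counts


if __name__ == "__main__":
    for value, n in facet_counts(SAMPLE, "subject").most_common():
        print(value, n)
```

The point of the sketch is that the facet inventory is computed on a copy of the metadata, outside DSpace, so the browsing UI never has to query the repository's own tables.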
Re: [Dspace-tech] Reply-To Header missing - list misconfigured?
Folks:

I'm currently administering the tech and dev lists, and would gladly reconfigure if the preponderance of opinion is in favor. I'm by no means a mail admin, and was following the recommendations of the GNU Mailman docs, which I reproduce here:

reply_goes_to_list (general): Where are replies to list messages directed? Poster is strongly recommended for most mailing lists.

This option controls what Mailman does to the Reply-To: header in messages flowing through this mailing list. When set to Poster, no Reply-To: header is added by Mailman, although if one is present in the original message, it is not stripped. Setting this value to either This list or Explicit address causes Mailman to insert a specific Reply-To: header in all messages, overriding the header in the original message if necessary (Explicit address inserts the value of reply_to_address).

There are many reasons not to introduce or override the Reply-To: header. One is that some posters depend on their own Reply-To: settings to convey their valid return address. Another is that modifying Reply-To: makes it much more difficult to send private replies. See `Reply-To' Munging Considered Harmful for a general discussion of this issue. See Reply-To Munging Considered Useful for a dissenting opinion.

Some mailing lists have restricted posting privileges, with a parallel list devoted to discussions. Examples are `patches' or `checkin' lists, where software changes are posted by a revision control system, but discussion about the changes occurs on a developers mailing list. To support these types of mailing lists, select Explicit address and set the Reply-To: address below to point to the parallel list.

Available settings: Poster, This list, Explicit address.

If anyone has further input, please share with the list.

Thanks,
Richard R

On Mon, 2007-07-30 at 11:38 -0500, Dorothea Salo wrote:

The result of a missing reply-to header is that you have to use the reply-all function of your mail client to answer back to the list, which seems unnatural. In most cases, answers seem to be sent in private mail as implied by the missing header. In my perception, this makes the list *more* noisy than required and renders the list archive less useful. Let me explain.

I agree. Useful responses are not publicly archived, which means they are not searchable, which means we get the same questions over and over -- not because there is no answer, but because there is no *public* answer. I would very much appreciate the reconfiguration of dspace-tech to reply-to-list instead of reply-to-sender.

Dorothea

- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
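To make the three settings quoted from the Mailman docs concrete, here is a minimal sketch of their effect on the Reply-To: header, as the docs describe it. This is not Mailman's code: the function name `apply_reply_policy`, the policy constants, and the example.org addresses are all hypothetical, and it models only the header behavior stated in the quoted text.

```python
from email.message import Message

# Hypothetical labels for the three documented settings.
POSTER, THIS_LIST, EXPLICIT = "poster", "this_list", "explicit"

LIST_ADDRESS = "DSpace-tech@lists.sourceforge.net"


def apply_reply_policy(msg, policy, list_address, explicit_address=None):
    """Mimic the documented effect of reply_goes_to_list on Reply-To:."""
    if policy == POSTER:
        # No header is added, and an existing one is not stripped.
        return msg
    if policy == THIS_LIST:
        target = list_address
    elif policy == EXPLICIT:
        target = explicit_address
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    if not target:
        raise ValueError("no Reply-To address available for this policy")
    # Override whatever the poster set; deleting a missing header is a no-op.
    del msg["Reply-To"]
    msg["Reply-To"] = target
    return msg


def make_message(reply_to=None):
    """Build a tiny message with made-up addresses for experimenting."""
    msg = Message()
    msg["From"] = "author@example.org"
    msg["To"] = LIST_ADDRESS
    if reply_to:
        msg["Reply-To"] = reply_to
    msg.set_payload("Hello list")
    return msg


if __name__ == "__main__":
    for policy in (POSTER, THIS_LIST):
        out = apply_reply_policy(make_message("alt@example.org"), policy, LIST_ADDRESS)
        print(policy, "->", out["Reply-To"])
```

The trade-off debated in the thread is visible here: under the list policy a poster's own Reply-To: is lost, while under the poster policy a plain "reply" never reaches the list.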
Re: [Dspace-tech] Having A Sub-Community Collection appear in 2 Communities?
Hi Marcelo:

The sub-community functionality was designed as a single-parent model. I'd need to study the code/DB schema to see what potential problems there may be, but one question springs to mind immediately: What behavior do you want when (one of) the parents is deleted? Normally DSpace deletes everything below a community (sub-communities, collections, etc). I assume you would want any sub-community that has more than one parent to *not* be deleted. Is this true with your modified code?

Thanks,
Richard R

On Fri, 2007-07-20 at 22:45 +0100, [EMAIL PROTECTED] wrote:

Hi! I have the same problem. I would like to see the same sub-community inside several Communities. This means that one sub-community should have more than one parent. The CommunityFiliator class states that:

first test - proposed child must currently be an orphan

So, I've commented out this check and I can set one sub-community with several parents. I'm pretty sure that this modification is harmless, but if these lines were in the code, they were there for a reason! Is this change dangerous? Could it bring any side effects?

Marcelo

Quoting George Kozak [EMAIL PROTECTED]:

Hi... We are having a problem with people wanting to see the same Sub-Community (and its collections) appear in 2 different Communities. For instance, currently I have a Sub-Community called Albert Mann Library. It appears in our Community Cornell University Library, but the people at Mann Library would also like their Library appearing in the hierarchy of College of Agriculture and Life Sciences. Can this be done?

***
George Kozak
Coordinator Web Development and Management
Digital Media Group
501 Olin Library
Cornell University
607-255-8924
***
[EMAIL PROTECTED]
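The deletion question raised in this thread can be made concrete with a toy model. The sketch below is not DSpace code and does not reflect its schema: the class `CommunityTree` and its methods are hypothetical, and the error text only paraphrases the comment quoted above. It shows one possible answer to "what happens when one of the parents is deleted": a child is removed only once its last parent is gone.

```python
class CommunityTree:
    """Toy in-memory model of communities with optional multiple parents."""

    def __init__(self):
        self.parents = {}   # community name -> set of parent names
        self.children = {}  # community name -> set of child names

    def add(self, name):
        self.parents.setdefault(name, set())
        self.children.setdefault(name, set())

    def filiate(self, parent, child, allow_multiple=False):
        """Attach child under parent; single-parent unless told otherwise."""
        if self.parents[child] and not allow_multiple:
            raise ValueError("proposed child must currently be an orphan")
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def delete(self, name):
        """Delete a community, cascading only to children left parentless."""
        for child in list(self.children[name]):
            self.parents[child].discard(name)
            if not self.parents[child]:
                self.delete(child)
        for parent in self.parents[name]:
            self.children[parent].discard(name)
        del self.children[name]
        del self.parents[name]


if __name__ == "__main__":
    tree = CommunityTree()
    for name in ("Cornell University Library",
                 "College of Agriculture and Life Sciences",
                 "Albert Mann Library"):
        tree.add(name)
    tree.filiate("Cornell University Library", "Albert Mann Library")
    tree.filiate("College of Agriculture and Life Sciences",
                 "Albert Mann Library", allow_multiple=True)
    tree.delete("Cornell University Library")
    print("Albert Mann Library" in tree.parents)
```

With the single-parent check in place the cascade is unambiguous; once the check is relaxed, every code path that deletes a community has to make a decision like the one in `delete` above, which is why simply commenting out the check deserves a closer look.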
Re: [Dspace-tech] Limit of pages
Hi Tiago:

That limit was put in because the progress bar at the top starts to distort the page beyond 6 steps (since it has a section for each step). You can relax it by renumbering the steps in SubmitServlet (they are constants in the source file), but be prepared to grapple with the progress bar issues...

I'm not sure about the configurable item submission (due in the next release), but I'm guessing that it will offer greater flexibility in this regard.

Hope this helps,
Richard R

On Tue, 2007-07-03 at 16:15 -0300, Tiago Ferreira wrote:

Hello, I had some problems when I tried to insert more than 6 pages in a submission form. Does anyone know about the limit of 6 pages on a form? If so, how can I get around it? Thanks in advance!

Tiago Ferreira

- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] srb/s3/etc and lucene
Well, if by whammy you mean a read access, yes. But my point was that the Lucene indexing is done (absent corruption) only once - the exploded text asset file is not needed for a Lucene lookup - it consults its own constructed index file. So the performance - i.e. routine use of the index for look-ups - is completely independent of the asset store. If there is a read performance problem with a given store back-end, that's surely a concern, but Lucene doesn't add any specially onerous overhead to it. Having said all this, it is true that 'index-alls' are run fairly cavalierly, and it is worth noting this dependency.

Richard

Quoting Mark Diggory [EMAIL PROTECTED]:

On 5/4/07, Cory Snavely [EMAIL PROTECTED] wrote:

Well, I'm just wondering, in specific terms, if we use an object-based storage system as an assetstore rather than a filesystem, where the files that Lucene indexes actually sit.

It's tricky; this is what FilterMedia is for: it actually extracts the text and places it as a bitstream in the assetstore. Lucene full-text indexing is done against the assetstore bitstreams in all cases (well, except for the metadata table in the database). So ultimately you're pushing the text bitstreams into the assetstore (S3) in FilterMedia and pulling them back out on Lucene indexing, a double-whammy.

Cheers,
Mark

It's my understanding that in a filesystem-based assetstore, for example, text is extracted from PDFs and stored in a separate file *within the assetstore directory* that Lucene crawls. I just don't know how that sort of thing is handled when using object-based storage.

On Thu, 2007-05-03 at 13:28 -0400, Richard Rodgers wrote:

Hi Cory:

Not sure about the limits of Lucene, but I think the larger point is that the back-ends are expected only to hold the real content or assets. Everything else (full-text indices and the like) are *artifacts* (can be recreated from the assets) that we don't need to manage in the same way. If for performance reasons we want to put them where the assets are we can, but there is really no connection between the two that the system imposes. Does this get at your question, or did I miss the point?

Thanks,
Richard R

On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:

(Apologies if this has been discussed to resolution; after a few attempts to search the archives, I concluded they are really broken. 500 errors, bad links, etc.)

For those using, interested in, or knowledgeable about using API-based storage (SRB, S3) as a backend for DSpace: how does doing so affect full-text indexing? Can anyone describe how, in such a setup, full text is stored and indexed? My uneducated impression is that Lucene would want to work only against a filesystem.

Thanks,
Cory Snavely
University of Michigan Library IT Core Services

~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
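The point made at the top of this message (the store is read when the index is built, but not when the index is queried) can be shown with a toy inverted index. This is a sketch under stated assumptions, not Lucene and not DSpace: `CountingStore`, `build_index`, and `search` are hypothetical names, and the store is an in-memory stand-in for any asset store back-end holding extracted text.

```python
from collections import defaultdict


class CountingStore:
    """Stand-in for an asset store that counts how often it is read."""

    def __init__(self, texts):
        self._texts = dict(texts)  # bitstream id -> extracted text
        self.reads = 0

    def ids(self):
        return list(self._texts)

    def read(self, bitstream_id):
        self.reads += 1
        return self._texts[bitstream_id]


def build_index(store):
    """Read every extracted-text bitstream once and build term -> ids."""
    index = defaultdict(set)
    for bitstream_id in store.ids():
        for term in store.read(bitstream_id).lower().split():
            index[term].add(bitstream_id)
    return index


def search(index, term):
    """Look a term up in the index only; the store is never touched."""
    return sorted(index.get(term.lower(), ()))


if __name__ == "__main__":
    store = CountingStore({
        "b1": "faceted browsing of repository metadata",
        "b2": "storage back-ends for repository assets",
    })
    index = build_index(store)
    print(search(index, "repository"), "store reads:", store.reads)
    print(search(index, "assets"), "store reads:", store.reads)
```

The read counter stays flat across searches, which is the sense in which look-up performance is independent of the store; a full re-index, by contrast, pays one read per bitstream again.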
Re: [Dspace-tech] srb/s3/etc and lucene
Hi Cory:

Not sure about the limits of Lucene, but I think the larger point is that the back-ends are expected only to hold the real content or assets. Everything else (full-text indices and the like) are *artifacts* (can be recreated from the assets) that we don't need to manage in the same way. If for performance reasons we want to put them where the assets are we can, but there is really no connection between the two that the system imposes. Does this get at your question, or did I miss the point?

Thanks,
Richard R

On Thu, 2007-05-03 at 12:13 -0400, Cory Snavely wrote:

(Apologies if this has been discussed to resolution; after a few attempts to search the archives, I concluded they are really broken. 500 errors, bad links, etc.)

For those using, interested in, or knowledgeable about using API-based storage (SRB, S3) as a backend for DSpace: how does doing so affect full-text indexing? Can anyone describe how, in such a setup, full text is stored and indexed? My uneducated impression is that Lucene would want to work only against a filesystem.

Thanks,
Cory Snavely
University of Michigan Library IT Core Services
Re: [Dspace-tech] Assetstore physical storage (Amazon's Simple Storage Service: S3)
Hi Richard:

A quick reaction to your questions - I'll look into it more - is this: in principle it would certainly be doable, but the issue will likely be tolerance for performance tradeoffs. In my prototype I preserved the stream-oriented aspect of the API, which means I don't store a local copy of the asset file before shipping it off to S3. Fortunately S3 returns an MD5 of the contents it receives. Certain types of compression and/or encryption may want to have a view of the whole file to do their work: if so, then the bitstore would have to use a temporary location, receive the whole file, and then resend it, which would obviously double the transfer time. But some compression/crypto schemes don't work that way, so maybe we could be OK.

Thanks,
Richard

On Thu, 2007-04-19 at 21:23 +1200, Richard MAHONEY wrote:

Dear Richard,

On Thu, 2007-04-19 at 04:23, Richard Rodgers wrote:

Richard: I'm putting up a prototype implementation of (inter alia) an S3 backend on the DSpace wiki. (see 'PluggableStorage' page). Would love volunteers to vet it (not ready for production).

Thanks,
Richard R.

Without wanting to sound overly effusive, I'd just like to say how deeply grateful I am that you are working on the Amazon S3 bitstore. This is all very exciting and I hope to experiment with S3BitStore once I am finished migrating Indica et Buddhica to Joyent/TextDrive, hopefully by the end of the month.**

... Something I'd like to ask before then though. Presently all the material I hold on S3 consists of encrypted compressed tar balls (Solaris 10: gtar, bzip2, encrypt). These can be created using UNIX pipes, similar to producing encrypted tape backups. How hard would it be, then, to use S3BitStore to send encrypted, possibly compressed, data to an assetstore on S3? I already send and retrieve all material using SSL. It seems to me that the addition of data encryption and compression would certainly go some way to reassuring an institution wishing to archive sensitive material, cost effectively. Would all of this be non-trivial? Any thoughts?

Kind regards,
Richard M.

** I think I recall reading a while ago on this list about firms, notably TextDrive, being unwilling to host Java apps. It seemed that if one wished to run DSpace one needed a dedicated machine. This is no longer the case. See Joyent/TextDrive's Accelerators: http://radiant.joyent.com/accelerator/

On Thu, 2007-04-12 at 09:49 +1200, Richard MAHONEY wrote:

Dear Robert et al.,

On Thu, 2007-04-12 at 07:15, Robert Tansley wrote:

We considered this way back when (2001); we decided on using the filesystem because some files might be very very large, there might be lots of them and in general it's easier to split filesystem-based asset stores across multiple drives/machines than a big relational database. That said, the intention was that storage would be made pluggable -- so you could have RDBMS, SRB/iRODs, open-source GoogleFileSystem, LOCKSS-ish etc. storage. That pluggability ended up being one of the many non-critical-for-version-1 features we had to drop to get DSpace 1.0 finished :-) There are some projects (e.g. the MIT ones) looking at how to really accomplish this.

Over the past few weeks I've been using Amazon's Simple Storage Service (S3): http://www.amazon.com/gp/browse.html?node=16427261

At this point I've merely been using it to back up web servers and development directories. This has involved the simple upload of compressed tarballs (using the Java app. jSh3ll) but also the synchronising of file systems (using the Ruby app. s3sync). In all, I've been pleasantly surprised by the results. It would seem that the S3 storage system promises to be more resilient than anything I could build at a reasonable cost.

Although I've only been using S3 for remote backup, it seems that it can also be used as a live file system for storing and retrieving data for web apps. I am wondering, then, if anyone may be able to suggest how it might be possible to configure (cajole) DSpace-1.4 into using S3 as an assetstore. The Amazon blurb says that S3: `Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.'

Best regards,
Richard MAHONEY

--
Richard MAHONEY | internet: http://indica-et-buddhica.org/
Littledene      | telephone/telefax (man.): +64 3 312 1699
Bay Road        | cellular: +64 27 482 9986
OXFORD, NZ      | email: [EMAIL PROTECTED]
~~~
Indica et Buddhica: Materials for Indology and Buddhology
Repositorium: http://indica-et-buddhica.org/repositorium/
Philologica: http://indica-et-buddhica.org/philologica/
Subscriptions: http://subscriptions.indica-et-buddhica.org
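The remark at the top of this message, that some compression schemes can work on a stream without seeing the whole file, can be sketched as follows. This is not the S3BitStore prototype and calls no S3 API: the function `stream_compressed` is hypothetical, and `sink` is any writable object standing in for the upload. It compresses chunk by chunk with zlib and computes an MD5 over exactly the bytes written, which is the value one would compare against the digest the store reports for what it received.

```python
import hashlib
import io
import zlib


def stream_compressed(source, sink, chunk_size=64 * 1024):
    """Compress source into sink chunk by chunk, with no temporary file.

    Returns the hex MD5 of the compressed bytes actually written.
    """
    md5 = hashlib.md5()
    compressor = zlib.compressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out = compressor.compress(chunk)  # may be empty while buffering
        md5.update(out)
        sink.write(out)
    tail = compressor.flush()  # emit whatever the compressor still holds
    md5.update(tail)
    sink.write(tail)
    return md5.hexdigest()


if __name__ == "__main__":
    original = b"dspace bitstream " * 10000
    sink = io.BytesIO()
    digest = stream_compressed(io.BytesIO(original), sink, chunk_size=4096)
    print("sent", len(sink.getvalue()), "bytes, md5", digest)
```

A scheme that needs the whole file before emitting output would not fit this loop and would force the temporary-copy-and-resend path described above. Encryption is left out of the sketch; a stream cipher would slot in at the same place as the compressor.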
Re: [Dspace-tech] Assetstore physical storage (Amazon's Simple Storage Service: S3)
Richard:

I'm putting up a prototype implementation of (inter alia) an S3 backend on the DSpace wiki. (see 'PluggableStorage' page). Would love volunteers to vet it (not ready for production).

Thanks,
Richard R.

On Thu, 2007-04-12 at 09:49 +1200, Richard MAHONEY wrote:

Dear Robert et al.,

On Thu, 2007-04-12 at 07:15, Robert Tansley wrote:

We considered this way back when (2001); we decided on using the filesystem because some files might be very very large, there might be lots of them and in general it's easier to split filesystem-based asset stores across multiple drives/machines than a big relational database. That said, the intention was that storage would be made pluggable -- so you could have RDBMS, SRB/iRODs, open-source GoogleFileSystem, LOCKSS-ish etc. storage. That pluggability ended up being one of the many non-critical-for-version-1 features we had to drop to get DSpace 1.0 finished :-) There are some projects (e.g. the MIT ones) looking at how to really accomplish this.

Over the past few weeks I've been using Amazon's Simple Storage Service (S3): http://www.amazon.com/gp/browse.html?node=16427261

At this point I've merely been using it to back up web servers and development directories. This has involved the simple upload of compressed tarballs (using the Java app. jSh3ll) but also the synchronising of file systems (using the Ruby app. s3sync). In all, I've been pleasantly surprised by the results. It would seem that the S3 storage system promises to be more resilient than anything I could build at a reasonable cost.

Although I've only been using S3 for remote backup, it seems that it can also be used as a live file system for storing and retrieving data for web apps. I am wondering, then, if anyone may be able to suggest how it might be possible to configure (cajole) DSpace-1.4 into using S3 as an assetstore. The Amazon blurb says that S3: `Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.'

Best regards,
Richard MAHONEY
Re: [Dspace-tech] Reparenting subcommunities
Hi Mark:

You could do, but there is already a tool to accomplish this: check the doc for CommunityFiliator.

Richard

On Tue, 2007-04-17 at 10:16 -0400, Mark H. Wood wrote:

Our initial community structure has been rethought, and now I need to move some subcommunities to new locations in the structure. Is it enough to just hack the community2community table to give the child a new parent_comm_id?