Re: [Dspace-tech] alternative to solr statistics
Hi Peter,

First of all, thanks for the answers. We've already worked with Google Analytics, and it seems to work pretty well, but you can't control how the statistics are computed, so it's not an option for us. On the other hand, we've already studied the ElasticSearch option, but it is built on Lucene, so the search methods and RAM consumption are similar. We're now studying Sphinx, which is much more efficient and faster than Solr. Finally, I'll give the 1.6.2 add-on a try, which I had thought wasn't updated for newer versions of DSpace.

Regards,
Jesús

On 10/18/2011 12:15 AM, Peter Dietz wrote:

Hi Jesús,

We've run into Solr statistics performance problems as well. You've posted that you have a very large Solr index, and unfortunately Solr performance degrades as the index grows. We don't allow non-administrators to view statistics for a collection/community/item on production because it slows the system down too much. However, when we need to provide a report, we copy the Solr index to another computer, such as a workstation, and view the statistics locally. A local computer with a lot of memory will run Solr fine; a busy server, however, does not run Solr that well alongside everything else.

If you want to be able to present reports on your production system, I'm thinking the only thing you can throw at the problem is resources. Perhaps add a dedicated server just to host Solr, similar to how you might have a dedicated server just to host MySQL or PostgreSQL.

My co-worker and I were wondering about the idea of switching out the dspace-stats implementation for a different engine, such as removing Solr and using something beefier such as ElasticSearch; however, we haven't implemented anything.

As has been mentioned by some others, you might be able to figure out how to get Google Analytics to track all of the hits to your items, communities, collections, and bitstreams. In that case, you could then query the Google Analytics API for this information.
Finally, something to anonymize the Solr statistics information would be a good thing. We currently have an IP address for every visitor to every resource for every single request. Assuming we had a good grip on robots, I think we could aggregate this to just record the number of hits to a given resource per hour. After aggregating and pruning, you might end up with a much smaller Solr database: instead of tens of millions of records, perhaps just hundreds of thousands. I think one should consult the COUNTER project before altering your statistics, though.

Peter Dietz

2011/10/17 Richard Rodgers rrodg...@mit.edu

Hi Jesús:

A lot of statistics work has been done for DSpace over time, but each project focuses on different sets of requirements: does the data need to appear in the UI, does it offer real-time availability (just to name two of the strengths of the Solr-based system)?

One example of an alternative is https://wiki.duraspace.org/display/DSPACE/StatisticsAddOn, though I don't know if it has been maintained against versions newer than DSpace 1.6.2.

We run an entirely off-line, monthly reporting system using a database designed to accommodate a set of internal administrative requirements, where statistics are delivered as a spreadsheet, but that might not fulfill your requirements.

The tech list archives and the wiki are a good place to start, but you could also post your use case(s) to the list and see if any existing work better meets your needs.

Hope this helps,
Richard R

On Oct 17, 2011, at 6:00 AM, Jesús Martín García wrote:

Hi!

I've been wondering if there is some kind of alternative to Solr statistics, due to its high RAM load on our system (514 million records), which is not easy to scale and is very, very slow. So, has anyone done some work on an alternative?

Thanks in advance,
Regards,
Jesús
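[Editorial illustration] Peter's aggregation idea above -- drop robot traffic and IP addresses, and keep only per-resource hourly hit counts -- can be sketched roughly as follows. This is a minimal Python sketch, not DSpace code; the event field names (`id`, `time`, `isBot`) are assumptions loosely modeled on the Solr statistics schema.

```python
from collections import Counter
from datetime import datetime

def aggregate_hourly(events):
    """Collapse raw usage events into (resource_id, hour) hit counts,
    dropping robot traffic and discarding per-visitor IP addresses."""
    counts = Counter()
    for e in events:
        if e.get("isBot"):  # assumed flag marking spider/robot hits
            continue
        # truncate the timestamp to the hour
        hour = datetime.fromisoformat(e["time"]).strftime("%Y-%m-%dT%H:00")
        counts[(e["id"], hour)] += 1
    return counts

events = [
    {"id": 123, "time": "2011-10-18T09:15:00", "ip": "1.2.3.4", "isBot": False},
    {"id": 123, "time": "2011-10-18T09:45:00", "ip": "5.6.7.8", "isBot": False},
    {"id": 123, "time": "2011-10-18T10:05:00", "ip": "5.6.7.8", "isBot": False},
    {"id": 456, "time": "2011-10-18T09:30:00", "ip": "9.9.9.9", "isBot": True},
]
print(aggregate_hourly(events))
```

Three raw records for item 123 collapse into two hourly buckets, and the bot hit on item 456 disappears, which is why the pruned index can shrink by orders of magnitude.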
--
Jesús Martín García
Tècnic de Projectes
CESCA · Centre de Serveis Científics i Acadèmics de Catalunya
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat

--
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2d-oct
Re: [Dspace-tech] Regenerate monthly reports
Brian,

Thanks for the vote of confidence. I'll give this a try tonight when our servers aren't as busy. Another question: nothing gets modified in the database when this happens, so I shouldn't need to restart Tomcat, right?

Thanks,
Alan

On Mon, Oct 17, 2011 at 5:34 PM, Brian Freels-Stendel bfre...@unm.edu wrote:

Hello,

I've done this a few times, and it's never been a problem for me. I make a backup of all of the .dat files in the log directory and the entire reports directory before deleting the .dat and .html files, just in case.

B--

On 10/17/2011 at 1:14 AM, in message CAKKdN4Wu-eKf6ff29ruVOCC1EUsQgxevVA6GCPgjPi=bqut...@mail.gmail.com, Alan Orth alan.o...@gmail.com wrote:

Hi,

Never heard back on this, so I'm re-sending: we had some bad metadata and didn't realize for a few weeks that our stats scripts were choking. Now we have a gap in our monthly stats (08/2011, 10/2011... but no 09/2011!). Is clearing the stats and rebuilding from scratch feasible? All the historical log files are there...

Thanks!
Alan

On Tue, Oct 11, 2011 at 5:21 PM, Alan Orth alan.o...@gmail.com wrote:

Hey,

We noticed recently that our monthly stats hadn't run for the month of September. As it turns out, a batch import had brought in some items with malformed `dc.date.accessioned` date fields, which was causing the stats scripts to die. We finally tracked down all the items with these bad dates[1], and now the scripts are running successfully, but the month of September has gone missing (we have 08/2011 and 10/2011)!

My attempts to fix this are here: http://pastebin.com/9EDX8Vhx

I'm curious: would starting over from `dspace stat-initial` and `dspace stat-report-initial` remedy this? All the log files are there, and as far as I know the stat scripts process .log -> .dat -> .html (nothing in the database or anything). Is there any danger in doing this (other than being expensive for the CPU/disk)?
Thanks,

[1] DSpace-tech thread: http://www.mail-archive.com/dspace-tech@lists.sourceforge.net/msg15295.html

--
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
"I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." -Bjarne Stroustrup, inventor of C++

--
Alan Orth
alan.o...@gmail.com
http://alaninkenya.org
http://mjanja.co.ke
"In heaven all the interesting people are missing." -Friedrich Nietzsche

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
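[Editorial illustration] Brian's "backup first, just in case" advice from this thread can be sketched as a small script. This is a hedged Python sketch, not a DSpace utility; the directory locations are placeholders, and on a real installation the .dat files live under the DSpace log directory and the reports under the reports directory.

```python
import shutil
from pathlib import Path

def backup_stats(log_dir, report_dir, backup_dir):
    """Copy the .dat analysis files and the entire reports directory aside
    before deleting them, so a failed regeneration can be rolled back."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    for dat in Path(log_dir).glob("*.dat"):
        shutil.copy2(dat, backup / dat.name)          # preserve timestamps
    shutil.copytree(report_dir, backup / "reports")   # whole reports tree
    # After this it should be safe to remove the .dat and .html files and
    # rerun: dspace stat-initial && dspace stat-report-initial
```

Only after verifying the backup would one delete the originals and rerun the stat scripts, which (per this thread) rebuild everything from the .log files without touching the database.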
Re: [Dspace-tech] alternative to solr statistics
Hello,

I watched the discussion about Solr statistics and want to make a small contribution. We have the task of producing DSpace statistics reports in the COUNTER standard. We decided to use AWStats (http://awstats.sourceforge.net) and to develop a Perl add-on to AWStats for COUNTER compliance. Has anyone already gone this route? Are there any pitfalls?

Regards,
Fedor

Message: 4
Date: Mon, 17 Oct 2011 18:15:56 -0400
From: Peter Dietz pdiet...@gmail.com
Subject: Re: [Dspace-tech] alternative to solr statistics
To: Richard Rodgers rrodg...@mit.edu
Cc: dspace-tech@lists.sourceforge.net

> Hi Jesús, We've run into Solr statistics performance problems as well. [...]
[Dspace-tech] Thoughts about statistics (was: alternative to solr statistics)
This points out a problem that I think we (and many other contemporary projects) have all over the place: our application is expected to grow steadily and without limit, yet we assume over and over again that the problem is small and bounded. There is no way around it: if your repository is large and busy, sooner or later you will be disappointed by the performance of ad-hoc queries no matter how many resources you throw at them.

One answer to this is to depend less on ad-hoc queries. Do you have some usual questions to be answered over and over? Do you really need up-to-the-second answers? Would it be good enough to run periodic reports and accumulate them? Some other machine with SPSS or R or whatever can grind cases all night, if need be, and leave your monthly abstract waiting in your inbox the next day. (I want to find the time to extend DSpace to facilitate this.) If the periodic abstractions are saved in raw form before rendering, they become cheap inputs to longer-range reports. There are *far* more efficient methods than those presently provided for extracting information from vast quantities of data.

Once periodic statistical products are available, they can simply be fetched over and over again and slotted into DSpace pages to provide tolerably up-to-date views of activity quickly and cheaply. We just don't do that yet.

Once periodic statistical products are available, we don't have to keep twenty years of event data in Solr; we can purge old cases to dead storage and combine precalculated summaries with live statistics over only the latest events to keep the numbers fresh without having responsiveness suffer more and more over time. We just don't do that yet.

Once we have a well-designed way to get cases out of DSpace for use with other tools, we can produce as many streams as we wish, selected any way that makes sense. We can cheaply provide custom-tailored data products to individual contributors and other consumers for their own analysis. We just don't do that yet.

There's still an important place for ad-hoc query, but how often would something less expensive do just as well? ALL cases are historical; they're not going to change. We only need to recalculate when we change our view of the cases.

--
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Asking whether markets are efficient is like asking whether people are smart.
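[Editorial illustration] Mark's point about combining precalculated summaries with live statistics over only the latest events can be made concrete with a tiny sketch. This is hypothetical Python, not DSpace code: closed months are assumed to be stored as precomputed totals, and only the still-open month is counted from raw events.

```python
def total_hits(monthly_summaries, live_events, current_month):
    """Combine precalculated totals for closed months with a live count
    over only the current (still open) month, so old raw events can be
    purged to dead storage without losing the overall numbers."""
    archived = sum(hits for month, hits in monthly_summaries.items()
                   if month != current_month)   # avoid double counting
    live = sum(1 for e in live_events if e["month"] == current_month)
    return archived + live

summaries = {"2011-08": 1200, "2011-09": 1350}   # precomputed, immutable
live = [{"month": "2011-10"}, {"month": "2011-10"}, {"month": "2011-10"}]
print(total_hits(summaries, live, "2011-10"))  # 2553
```

Because closed months are historical and will not change, their totals never need recomputation; only the small live window is queried on demand, which is exactly why responsiveness stops degrading as the archive grows.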
Re: [Dspace-tech] Regenerate monthly reports
Hi Alan,

That's right, Tomcat won't need to be restarted. The stats scripts do access the database to get the titles of items, but they don't alter anything, so there should be no side effects. I'd say good luck, but you won't need it, big smile.

B--

On 10/18/2011 at 1:11 AM, in message CAKKdN4VFfghmNyeB6pK8TC2JxcOGLTQnp=pjc0ugacmz7le...@mail.gmail.com, Alan Orth alan.o...@gmail.com wrote:

> Brian, Thanks for the vote of confidence. I'll give this a try tonight when our servers aren't as busy. Another question: nothing gets modified in the database when this happens, so I shouldn't need to restart Tomcat, right? [...]
Re: [Dspace-tech] Thoughts about statistics
+1 to what Mark Wood says.

An additional (parallel) thought: when I was at U of Illinois, we ran into similar scalability issues with one of the older statistics add-ons we were using (the one initially built by U of Rochester that stored stats in the DSpace database). The way we got around it was the following: we made a distinct decision to aggregate our data and actively purge older event data. This resulted in an *immediate* increase in scalability.

To better explain: essentially this older U of Rochester stats engine worked similarly to the new Solr statistics engine, except that it used the DB instead of Solr. So, it tracked each statistical event, including IP address, what the event was, etc. Over time the stats queries became rather expensive as the tables grew and grew. The tables were also full of IP address info that we really didn't need to keep around forever, and also information about old web spiders that we really didn't care about. (As you can tell, this is all very parallel to the current Solr statistics issues.)

So, as I said, we aggregated things. We decided to only keep IP addresses and full statistical events for a period of *one month*. After that, all non-spider hits were aggregated/totaled into a monthly totals table (we threw out anything that was a web spider, as that data was not useful and just made the tables larger and queries more complex). Although I don't think we went this far at U of Illinois, you could do a secondary aggregation and aggregate/total stats again at a *yearly* level.

The idea here is that you make conscious decisions about what information is important and aggregate it. Stuff that is not important to keep forever (e.g. exact IP addresses for all hits, information from known spiders) can just be discarded during the aggregation process. The aggregation also simplifies larger queries (especially ones for yearly/monthly info, as you no longer need to perform complex calculations -- it's just a simple lookup).

If we brought this same sort of idea forward into Solr, I think you'd be less likely to encounter such performance issues. We'd only keep full event details for a limited period of time (a month / 6 months), after which we'd discard information that was not necessary to generate the reports and aggregate everything else.

Just an idea -- I've never tried this before with the Solr statistics engine. But a Solr-savvy person could likely figure out a way to implement this for the benefit of all of us.

- Tim

On 10/18/2011 7:52 AM, Mark H. Wood wrote:

> This points out a problem that I think we (and many other contemporary projects) have all over the place: our application is expected to grow steadily and without limit, yet we assume over and over again that the problem is small and bounded. [...]
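[Editorial illustration] Tim's secondary, yearly-level aggregation can be sketched in a few lines. This is a hypothetical Python sketch of the rollup step only; the monthly-totals structure here stands in for whatever monthly totals table or index an installation actually keeps.

```python
from collections import Counter

def rollup_yearly(monthly_totals):
    """Second-stage aggregation: collapse per-(item, month) totals into
    per-(item, year) totals once a year has closed, shrinking the stats
    store a further order of magnitude."""
    yearly = Counter()
    for (item_id, month), hits in monthly_totals.items():
        year = month[:4]            # "2010-11" -> "2010"
        yearly[(item_id, year)] += hits
    return yearly

monthly = {
    (123, "2010-11"): 40,
    (123, "2010-12"): 60,
    (123, "2011-01"): 25,
}
print(rollup_yearly(monthly))
```

The trade-off is deliberate: after the rollup, monthly resolution for closed years is gone, which is exactly the "conscious decision about what information is important" that Tim describes.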
Re: [Dspace-tech] Java fatal error on dspace import
Hi Andrea, Jose and Mark. Thank you!

I tried Jose's suggestion of importing one by one, but the error seemed to be random. I switched to Sun Java 1.6, but at the same time I reset the database (it was on our test installation), and the problem was gone. The problem could have been Java 7, but it could also have been the database; I should have tested them separately. But at least these messages could be a good tip for someone who runs into the same problem in the future.

Thanks again,
André Assada

Em 14 de outubro de 2011 17:15, Andrea Bollini boll...@cilea.it escreveu:

Hi André,

I noted that you use Java 7. I have no direct experience with this, but there are a lot of posts on the web reporting issues using Java 7 with Lucene/Solr. See for example: http://www.infoq.com/news/2011/08/java7-hotspot

Hope this helps,
Andrea

Il 14/10/2011 19:44, André ha scritto:

Dear all,

I'm trying to import 157 records into DSpace 1.6.2 by calling:

[dspace]/bin/dspace import --add --eperson=andre.ass...@usp.br --collection=123456789/32 --source=/home/andre/xImpAleph/impTeste111014/xvi_fd --mapfile=./xvi_fd --workflow

It starts the process OK, but in the middle I get the following error message:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fea376dc440, pid=20001, tid=140644013197072
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(Lorg/apache/lucene/document/Fieldable;Lorg/apache/lucene/analysis/Analyzer;I)V
#
# Core dump written. Default location: /dspace/bin/core or core.20001 (max size 1 kB).
# To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /dspace/bin/hs_err_pid20001.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
./dspace: line 69: 20001 Aborted  java $JAVA_OPTS -classpath $FULLPATH $LOG org.dspace.app.launcher.ScriptLauncher $@

If I retry the import with the --resume option, it restarts very slowly, and in dspace.log I get the following message:

2011-10-14 14:01:26,342 ERROR org.dspace.search.DSIndexer @ Lock obtain timed out: SimpleFSLock@/dspace/search/write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/dspace/search/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:691)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:452)
    at org.dspace.search.DSIndexer.openIndex(DSIndexer.java:781)
    at org.dspace.search.DSIndexer.writeDocument(DSIndexer.java:853)
    at org.dspace.search.DSIndexer.buildDocument(DSIndexer.java:1138)
    at org.dspace.search.DSIndexer.indexContent(DSIndexer.java:299)
    at org.dspace.search.DSIndexer.updateIndex(DSIndexer.java:584)
    at org.dspace.search.DSIndexer.main(DSIndexer.java:545)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:212)

Searching the archive of this list, I found some people solved this by deleting the write.lock and afterwards forcing reindexation by running:

./dsrun org.dspace.search.DSIndexer -c

This solves the slowdown problem but doesn't solve the import problem.
I tried stopping Tomcat before importing, to guarantee no one was accessing the index at the same time, but this didn't solve the problem. I also gave Java more memory with JAVA_OPTS=-Xmx512m and also -Xmx1024m, but that didn't do the trick either.

Has anyone had this problem? Could you share any ideas?

Thanks in advance,
Andre Assada

--
Dott. Andrea Bollini boll...@cilea.it
ph. +39 06 59292853 - mob. +39 348 8277525 - fax +39 06 5913770
CILEA - Consorzio Interuniversitario
http://www.cilea.it/disclaimer
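[Editorial illustration] The manual fix mentioned in this thread -- delete a leftover write.lock after a crash, then force a reindex -- can be sketched like this. This is a hedged Python sketch, not a DSpace tool; the age threshold is an arbitrary safety margin, and deleting a lock that a live IndexWriter still holds would corrupt the index, so only stale locks should ever be removed.

```python
import os
import time

def clear_stale_lock(index_dir, max_age_seconds=3600):
    """Remove a leftover Lucene write.lock if it is older than
    max_age_seconds (e.g. left behind by a crashed JVM). Returns True
    if a lock file was removed, False otherwise."""
    lock = os.path.join(index_dir, "write.lock")
    if os.path.exists(lock) and time.time() - os.path.getmtime(lock) > max_age_seconds:
        os.remove(lock)
        return True
    return False

# Afterwards, force a full reindex as described in the thread, e.g.:
#   ./dsrun org.dspace.search.DSIndexer -c
```

As the thread notes, this clears the "Lock obtain timed out" slowdown but does not address whatever crashed the importer in the first place.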
[Dspace-tech] Approval of submitted items by site/collection admin
Hi everyone,

I have a question regarding item submission and policies in DSpace 1.7.1. I set up COLLECTION_x_WORKFLOW_STEP_2 and COLLECTION_x_ADMIN groups for all of the collections within my instance of DSpace and only included myself in the group. I then set up COLLECTION_x_SUBMIT groups including all of our users. My goal is to be notified when users submit items so I can review the submission, make additions if necessary, and approve/reject the submissions.

However, I found that when I submit items myself, I am made to go through the approval process for my own submissions, even though I am the admin for the site as well as the collection. Does anyone know how to get around this so that submissions from the collection admin group (or the site admin) are automatically archived and do not need approval? The extra step is proving to be quite tedious!

Here is an example of my current collection policies:

114209  ADMIN                   COLLECTION_8_ADMIN (includes only my account)
90255   ADD                     COLLECTION_8_WORKFLOW_STEP_2 (includes only my account)
90254   ADD                     COLLECTION_8_SUBMIT (includes group of all users)
78      DEFAULT_BITSTREAM_READ  BBC (group includes all users)
77      DEFAULT_ITEM_READ       BBC
76      READ                    BBC

Thanks,
Alicia Verno
Information Services Manager, Boston Biomedical Consultants
Re: [Dspace-tech] Regenerate monthly reports
Hi Alan,

> Another question: there's nothing that gets modified in the database when this happens, so I shouldn't need to restart Tomcat, right?

Yes, that is correct. Everything happens on disk (not in the DB):

.log files -> .dat files -> .html reports

The .html reports are then loaded when required.

Thanks,
Stuart Lewis
Digital Development Manager
Te Tumu Herenga / The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928
Re: [Dspace-tech] solr statistics
Hi Tint,

> Our repository is running on 1.6.2 and we have been using Solr for a few months now. There seems to be some problem with Solr statistics. Bitstreams for some items were downloaded more than a few thousand times within a month from the same place. How can I filter out such systematic access (by bots/spiders etc.)?

Take a look at the following tool:

/dspace/bin/dspace stats-util -h
usage: StatisticsClient
 -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have the bundle name
 -r,--remove-deleted-bitstreams   While indexing the bundle names remove the statistics about deleted bitstreams
 -u,--update-spider-files         Update Spider IP Files from internet into /dspace/config/spiders
 -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
 -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
 -m,--mark-spiders                Update isBot Flag in Solr
 -h,--help                        help
 -o,--optimize                    Run maintenance on the SOLR index

You might need to first register the IP addresses of the bots in /dspace/config/spiders/.

I hope that helps,
Stuart Lewis
Digital Development Manager
Te Tumu Herenga / The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928
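[Editorial illustration] Conceptually, the spider filtering behind `stats-util` comes down to matching client IPs against the address lists under /dspace/config/spiders. The sketch below is hypothetical Python, not DSpace's actual matcher; in particular, the prefix-with-trailing-dot convention here is an assumption made for illustration, not the real file format.

```python
def is_spider(ip, spider_entries):
    """Return True if the client IP matches a spider-list entry.
    Entries are assumed to be either exact addresses or prefixes
    ending in '.' (e.g. '66.249.' to cover a crawler's range)."""
    for entry in spider_entries:
        if entry.endswith("."):
            if ip.startswith(entry):   # prefix/range match
                return True
        elif ip == entry:              # exact match
            return True
    return False

# hypothetical entries: a crawler /16-style prefix and one exact address
spiders = ["66.249.", "72.30.142.85"]
print(is_spider("66.249.72.141", spiders))  # True
print(is_spider("192.168.1.10", spiders))   # False
```

Hits matching the list get flagged (isBot) rather than silently dropped, which is what lets `-f`/`-i` delete them from Solr later.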
[Dspace-tech] Export Authority Key in Metadata
I'm using the authority control tool to associate canonical university identifiers with authors in DSpace 1.7.1, and I would like to export metadata containing the authority key. I was hoping that the metadata export would contain an authority key field delimited in the same way that the author field is exported. Is there any way to do this other than querying the database? If not, might any of you have done this and have advice?

Thanks,
jt
--
Jim Tuttle
Digital Repository Program Coordinator
Duke University Libraries
919.613.6831
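For the database route: since DSpace 1.6 the metadatavalue table carries `authority` and `confidence` columns alongside `text_value`, so the keys can be pulled directly. A sketch of such a query (a sketch against the stock 1.7 schema; verify the table and column names against your own database before relying on it):

```sql
-- Author values with their authority keys, one row per metadata value
SELECT mv.item_id, mv.text_value AS author, mv.authority, mv.confidence
FROM metadatavalue mv
JOIN metadatafieldregistry mfr
  ON mv.metadata_field_id = mfr.metadata_field_id
WHERE mfr.element = 'contributor'
  AND mfr.qualifier = 'author'
ORDER BY mv.item_id, mv.place;
```

Joining on metadatafieldregistry by element/qualifier avoids hard-coding the numeric field ID, which can differ between installations.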
Re: [Dspace-tech] Unable to send mail
Hi,

On 19/10/11 16:25, Justin A. Diana wrote:
> Unfortunately, that causes me even more confusion as it successfully sent the email and I successfully received it externally. It honestly looks like the app is never even attempting to send the email (nothing in the messages, maillog or dspace.log when I get that error in the UI).

Very strange. Have you tried other situations in which DSpace normally sends e-mails (i.e. not registration)? Could you subscribe to a collection, then add a new item to that collection and run the sub-daily script -- do you get an e-mail then?

[dspace]/bin/dspace sub-daily

-- see https://wiki.duraspace.org/display/DSDOC/Installation#Installation-%27cron%27Jobs

cheers,
Andrea
--
Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
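Another check worth running from the command line is DSpace's built-in mail test, which exercises the mail.server settings from dspace.cfg outside the webapp entirely (a usage sketch, assuming a standard [dspace] installation):

```
# Sends a test message to the configured mail.admin address using
# the mail.server settings from dspace.cfg
[dspace]/bin/dspace test-email
```

If this succeeds but the UI still fails silently, the problem is likely in the webapp's configuration or event handling rather than in the mail setup itself.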
[Dspace-tech] solr statistics: Internal Server Error
Hello,

We are running DSpace 1.6.2 JSPUI with tomcat-6.0.32 and we have been using SOLR for the last 5 months. However, I just realised no statistics were generated for the current month. I've tried tools such as reindex-update, stats-util, and stats-log-importer. Any help would be greatly appreciated.

My SOLR conf:

solr.log.server = https://(mydspace.edu)/solr/statistics
solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat
statistics.item.authorization.admin = false
solr.spiderips.urls = http://iplists.com/google.txt, \
 http://iplists.com/inktomi.txt, \
 http://iplists.com/lycos.txt, \
 http://iplists.com/infoseek.txt, \
 http://iplists.com/altavista.txt, \
 http://iplists.com/excite.txt, \
 http://iplists.com/misc.txt, \
 http://iplists.com/non_engines.txt

tomcat server.xml:

<Context path="/solr" docBase="/usr/local/dspace/app/webapps/solr" debug="0" reloadable="true" cachingAllowed="false" allowLinking="true"/>

Errors in my dspace.log:

2011-10-19 14:20:15,390 ERROR org.dspace.statistics.SolrLogger @ Internal Server Error
Internal Server Error
request: https://mydspace.edu/solr/statistics/update?wt=javabin&version=2.2
org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: https://mydspace.edu/solr/statistics/update?wt=javabin&version=2.2
 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
 at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
 at org.dspace.statistics.SolrLogger.post(SolrLogger.java:245)
 at org.dspace.statistics.SolrLoggerUsageEventListener.receiveEvent(SolrLoggerUsageEventListener.java:41)
 at org.dspace.services.events.SystemEventService.fireLocalEvent(SystemEventService.java:154)
 at org.dspace.services.events.SystemEventService.fireEvent(SystemEventService.java:97)
 at org.dspace.app.webui.servlet.HandleServlet.doDSGet(HandleServlet.java:259)
 at org.dspace.app.webui.servlet.DSpaceServlet.processRequest(DSpaceServlet.java:151)
 at org.dspace.app.webui.servlet.DSpaceServlet.doGet(DSpaceServlet.java:99)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:112)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
 at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
 at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:776)
 at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:705)
 at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:898)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
 at java.lang.Thread.run(Thread.java:662)

Regards,
OA
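When statistics silently stop being recorded, it helps to establish exactly when the SolrLogger errors began (and whether they coincide with an index or server change). A minimal sketch of a log scan (a hypothetical helper, not part of DSpace; the sample lines are illustrative):

```python
import re

# Hypothetical helper: collect the timestamps of SolrLogger ERROR
# entries from dspace.log, in the order they appear.
ERROR_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR org\.dspace\.statistics\.SolrLogger"
)

def solr_error_timestamps(log_lines):
    """Return the timestamps of all SolrLogger ERROR entries."""
    return [m.group(1) for line in log_lines if (m := ERROR_RE.match(line))]

sample_log = [
    "2011-10-19 14:20:15,390 ERROR org.dspace.statistics.SolrLogger @ Internal Server Error",
    "2011-10-19 14:20:16,001 INFO  org.dspace.app.webui.servlet.HandleServlet @ view_item",
    "2011-10-19 15:02:44,123 ERROR org.dspace.statistics.SolrLogger @ Internal Server Error",
]
stamps = solr_error_timestamps(sample_log)
print(f"first error: {stamps[0]}, last error: {stamps[-1]}, total: {len(stamps)}")
```

The first timestamp tells you when posting to the statistics core started failing; the Solr-side cause (full disk, locked or corrupt index, webapp redeploy) should then be visible in Tomcat's own logs around that time.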