Re: [Dspace-tech] item counter failed
Hi,

I just tried but it doesn't work.

Thanks,
Sisay

-Original Message-
From: ivan.ma...@gmail.com [mailto:ivan.ma...@gmail.com] On Behalf Of helix84
Sent: Wednesday, October 12, 2011 4:11 PM
To: Webshet, Sisay (ILRI)
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] item counter failed

AFAICT you did everything correctly. The numbers may stop updating because Cocoon can serve you the whole page from cache. You should try to clean the Cocoon cache:

1.) First, don't forget to shut down Tomcat.
2.) cd ${tomcat6.home}/work/Catalina/{appropriate.domain.dir}/_/
3.) rm -rf cache-dir
4.) Start Tomcat again.

Regards,
~~helix84

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
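The four cache-clearing steps quoted above could be wrapped in a small script. This is only a sketch under stated assumptions: the function name clear_cocoon_cache, its arguments, and the idea of passing the Tomcat stop and start commands in as strings are hypothetical and not part of DSpace or Tomcat. Only the cache-dir name and the work directory layout come from the message.

```shell
# Sketch only: clear the Cocoon cache under a Tomcat work directory.
# clear_cocoon_cache and its arguments are hypothetical, for illustration.
#   $1 = webapp work dir, e.g. ${tomcat6.home}/work/Catalina/{appropriate.domain.dir}/_
#   $2 = command that stops Tomcat  (site-specific assumption)
#   $3 = command that starts Tomcat (site-specific assumption)
clear_cocoon_cache() {
    work_dir=$1
    stop_cmd=$2
    start_cmd=$3

    # Refuse to run against a missing directory rather than guessing a path.
    if [ ! -d "$work_dir" ]; then
        echo "work directory not found: $work_dir" >&2
        return 1
    fi

    # 1.) Shut down Tomcat first so nothing rewrites the cache mid-delete.
    sh -c "$stop_cmd" || return 1

    # 2.) and 3.) Remove only cache-dir inside the work directory.
    rm -rf "$work_dir/cache-dir"

    # 4.) Start Tomcat again.
    sh -c "$start_cmd"
}
```

Passing the stop and start commands in keeps the sketch independent of how Tomcat is managed on a given host (init script, service wrapper, or catalina.sh).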
Re: [Dspace-tech] item counter failed
Are you running the following command to refresh the strength cache?

 - [dspace]/bin/dspace itemcounter

Perhaps run it every 10 minutes as a cron job.

Thanks,

Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928

On 13/10/2011, at 7:04 PM, Webshet, Sisay (ILRI) wrote:

Hi, I just tried but it doesn't work
Thanks
sisay
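A cron entry for the suggestion above might look like the output of the helper below. This is a sketch: itemcounter_cron_line is a hypothetical function, the install directory is whatever [dspace] is on your system, and the */10 step syntax assumes a cron implementation that supports it (most current ones do).

```shell
# Sketch only: print a crontab entry that runs itemcounter every 10 minutes.
# itemcounter_cron_line is a hypothetical helper; $1 is your [dspace] install dir.
itemcounter_cron_line() {
    dspace_dir=$1
    # Fields: minute hour day-of-month month day-of-week command
    printf '*/10 * * * * %s/bin/dspace itemcounter\n' "$dspace_dir"
}

# Example (not run here): append the entry to the current user's crontab.
# ( crontab -l 2>/dev/null; itemcounter_cron_line /dspace ) | crontab -
```

The entry should be installed for the user that owns the DSpace installation, so that the command runs with the same permissions as other DSpace command-line tasks.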
Re: [Dspace-tech] Memory issues on Java
Hi Sue,

Thank you for the wonderful reply. I will try my best to implement this and come back with positive feedback.

Regards,
Lewatle

From: Thornton, Susan M. (LARC-B702)[LITES] [mailto:susan.m.thorn...@nasa.gov]
Sent: 12 October 2011 04:35 PM
To: Lewatle Phaladi; dspace-tech@lists.sourceforge.net
Subject: RE: Memory issues on Java

I periodically get this error, and sometimes stopping and restarting the web server will at least temporarily solve the problem. What memory size are you using? We use the following and rarely see this error anymore:

JAVA_OPTS=-Xms2048m -Xmx3072m -Xss512k -Dfile.encoding=UTF-8

Hope this helps.

Best regards,
Sue

Sue Walker-Thornton
(757) 864-2368

From: Lewatle Phaladi [mailto:lewatle.phal...@wits.ac.za]
Sent: Wednesday, October 12, 2011 4:59 AM
To: dspace-tech@lists.sourceforge.net
Subject: [Dspace-tech] Memory issues on Java

Dear DSpace Team,

I have tried increasing the memory size in the Java configuration file, but I still get this error when I want to create image thumbnails. Any idea what I can do to get rid of this error for good?
-bash-3.00$ ./dspace filter-media
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.awt.image.DataBufferByte.<init>(DataBufferByte.java:58)
        at java.awt.image.ComponentSampleModel.createDataBuffer(ComponentSampleModel.java:397)
        at java.awt.image.Raster.createWritableRaster(Raster.java:938)
        at javax.imageio.ImageTypeSpecifier.createBufferedImage(ImageTypeSpecifier.java:1056)
        at javax.imageio.ImageReader.getDestination(ImageReader.java:2879)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:980)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:948)
        at javax.imageio.ImageIO.read(ImageIO.java:1422)
        at javax.imageio.ImageIO.read(ImageIO.java:1326)
        at org.dspace.app.mediafilter.JPEGFilter.getDestinationStream(JPEGFilter.java:67)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:737)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:561)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:511)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:479)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:414)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:333)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
-bash-3.00$

Regards,
Lewatle
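The trace in this message ends in org.dspace.app.launcher.ScriptLauncher, so the heap ran out in the JVM started by the ./dspace command line, not in the web server's JVM. A memory setting that only reaches Tomcat would therefore not affect filter-media. The sketch below shows one way to pass a larger heap to a single command-line run. It assumes that your bin/dspace script hands the JAVA_OPTS environment variable to the java command; please check the script, since that is an assumption here. run_dspace_with_heap is a hypothetical wrapper, not a DSpace tool.

```shell
# Sketch only: run a DSpace command-line task with a larger heap.
# run_dspace_with_heap is a hypothetical wrapper.
# ASSUMPTION: [dspace]/bin/dspace passes $JAVA_OPTS through to the java command.
#   $1 = path to the dspace launcher script
#   $2 = maximum heap, e.g. 1024m
#   remaining args = the dspace command, e.g. filter-media
run_dspace_with_heap() {
    dspace_bin=$1
    max_heap=$2
    shift 2
    # Set JAVA_OPTS for this one invocation only.
    JAVA_OPTS="-Xmx$max_heap -Dfile.encoding=UTF-8" "$dspace_bin" "$@"
}

# Example (not run here):
# run_dspace_with_heap /dspace/bin/dspace 1024m filter-media
```

The value to use depends on the size of the images being decoded and on how much memory the machine can spare, so the 1024m in the example is only a placeholder.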
Re: [Dspace-tech] Memory issues on Java
Hi Alan,

Thanks for your positive inputs.

Regards,
Lewatle

From: Alan Orth [mailto:alan.o...@gmail.com]
Sent: 12 October 2011 04:49 PM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Memory issues on Java

Sue,

What are the specs of the machine you are running DSpace on? It's always good to compare notes... :)

I find that every few months, as our capacity grows, I have to allocate a few hundred more megabytes to Tomcat. Right now Tomcat's defaults look like this for us:

JAVA_OPTS=-Djava.awt.headless=true -Xmx768m -Xms768m -XX:MaxPermSize=256m

Our machine is only 32 bit, and only has 2 gigs of RAM, so I'm starting to plan for a migration soon. On a semi-related note, I'll also plan to move to a 64-bit OS and possibly a multi-core system.

Thanks,
Alan

--
Alan Orth
alan.o...@gmail.com
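For the Tomcat side, the JAVA_OPTS values quoted in this thread contain spaces, so they need quoting wherever they are set from a shell file. The sketch below writes such a setting to a setenv.sh-style file. write_tomcat_setenv is a hypothetical helper, and where the file belongs is an assumption: a stock Tomcat start script reads bin/setenv.sh, while distribution packages often use a different defaults file.

```shell
# Sketch only: write Tomcat heap settings to a setenv.sh-style file.
# write_tomcat_setenv is a hypothetical helper; the target path is site-specific.
#   $1 = file to write, e.g. $CATALINA_BASE/bin/setenv.sh (assumption: your
#        Tomcat start script sources this file; packaged Tomcats may differ)
#   $2 = initial heap (-Xms), $3 = maximum heap (-Xmx)
write_tomcat_setenv() {
    target=$1
    xms=$2
    xmx=$3
    # The value contains spaces, so it must be quoted in the generated file.
    cat > "$target" <<EOF
JAVA_OPTS="-Djava.awt.headless=true -Xms$xms -Xmx$xmx -XX:MaxPermSize=256m"
export JAVA_OPTS
EOF
}

# Example (not run here):
# write_tomcat_setenv "$CATALINA_BASE/bin/setenv.sh" 768m 768m
```

Tomcat has to be restarted before a changed heap size takes effect.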
[Dspace-tech] solr statistics
Hello,

Our repository is running on 1.6.2 and we have been using Solr for a few months now. There seems to be a problem with the Solr statistics: bitstreams for some items were downloaded more than a few thousand times within a month from the same place. How can I filter out such systematic access (by bots, spiders, etc.)?

Thanks.

Best regards,
Tint Hla Hla Htoo
Librarian
NTU Libraries
Re: [Dspace-tech] Strange problem with searching
On Wed, Oct 12, 2011 at 20:25, George S Kozak g...@cornell.edu wrote:

I have tried running index-init and deleting the extracted text and re-running filter-media, but still no luck with the searches for this collection.

Hi,

just to make sure - did you run filter-media before or after index-init/index-update? Because filter-media creates text files from media and index-* indexes them. So in case you didn't run index-init or index-update after filter-media, they won't be indexed.

Regards,
~~helix84
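The ordering described above (extract text first, index second) can be made explicit in a small wrapper. This is a sketch: reindex_after_filter is a hypothetical function, and only the filter-media, index-update and index-init command names come from the thread.

```shell
# Sketch only: extract text first, then index it, stopping if extraction fails.
# reindex_after_filter is a hypothetical wrapper; $1 is the path to bin/dspace.
reindex_after_filter() {
    dspace_bin=$1
    # filter-media writes the extracted text that the indexer reads afterwards.
    "$dspace_bin" filter-media || return 1
    # index-update picks up the new text; use index-init for a full rebuild.
    "$dspace_bin" index-update
}

# Example (not run here):
# reindex_after_filter /dspace/bin/dspace
```

Stopping when filter-media fails avoids indexing a half-populated set of extracted text files.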
Re: [Dspace-tech] Strange problem with searching
Hi, helix84:

Yes, I did run things in the correct order. That's what has stumped me. I can't figure out why these specific records are not searchable while other records are searchable.

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924
Re: [Dspace-tech] item counter failed
This is a caching bug in the XMLUI; we've dealt with it recently by disabling caching in Cocoon. This and other cache-refresh issues emerged when Larry Stone's enhancements began returning last-modified headers based only on the DSpaceObject's last-modified field, and not on the combined state of all the presentation elements being added to the view. We see it in recently added, we see it in item counter, we see it in discovery.

To repair: open your WEB-INF/sitemap.xmap file and replace the class of your caching pipe with the class found in our noncaching pipe. Then clear all caching by deleting the contents of your Tomcat work directory, and restart your Tomcat server.

Mark

--
@mire Inc.
Mark Diggory
2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
http://www.atmire.com
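The sitemap change described in this message could be scripted roughly as follows. This is a sketch under assumptions that may not hold for your sitemap.xmap: that each map:pipe element sits on a single line, that its name attribute comes before its src attribute, and that the pipes are named "caching" and "noncaching". use_noncaching_pipe is a hypothetical helper. It reads the class from the file itself rather than hard-coding one.

```shell
# Sketch only: point the "caching" pipe at the class used by the "noncaching" pipe.
# use_noncaching_pipe is a hypothetical helper. ASSUMPTIONS: each <map:pipe>
# element sits on one line and lists name="..." before src="...".
#   $1 = path to WEB-INF/sitemap.xmap
use_noncaching_pipe() {
    sitemap=$1

    # Read the implementation class from the existing noncaching pipe.
    cls=$(sed -n 's/.*<map:pipe[^>]*name="noncaching"[^>]*src="\([^"]*\)".*/\1/p' "$sitemap" | head -n 1)
    if [ -z "$cls" ]; then
        echo "no noncaching pipe found in $sitemap" >&2
        return 1
    fi

    # Keep a backup, then rewrite only the src of the pipe named "caching".
    cp "$sitemap" "$sitemap.bak" || return 1
    sed "/<map:pipe[^>]*name=\"caching\"/ s|src=\"[^\"]*\"|src=\"$cls\"|" \
        "$sitemap.bak" > "$sitemap"
}
```

A line-based sed edit is fragile for XML in general, so it is worth comparing the result against the .bak copy before clearing the Tomcat work directory and restarting.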
Re: [Dspace-tech] item counter failed
Just a brief note on clearing the XMLUI cache.

In the upcoming DSpace 1.8.0, it will be possible to clear the cache from the XMLUI Administrator UI. All you do is:

1.) Log in as an Administrator.
2.) Visit your Control Panel.
3.) Click on the Java Information tab.

On that tab you'll see a link to Clear Cache Immediately under the Cocoon Information section.

Essentially, this new 1.8 feature allows you to clear out the XMLUI/Cocoon cache without having to restart Tomcat. It may not be a *permanent* fix for these types of caching issues, but at least you no longer have to restart Tomcat when you just want to refresh your cache.

I'd still encourage us all to try and get rid of any caching issues that are actually bugs. So please feel free to open tickets in our Issue Tracker so that we can get specific issues fixed, rather than relying on temporarily clearing the cache.

https://jira.duraspace.org/browse/DS

- Tim
Re: [Dspace-tech] Strange problem with searching
On Thu, Oct 13, 2011 at 15:36, George S Kozak g...@cornell.edu wrote:

Yes, I did run things in the correct order. That's what has stumped me. I can't figure out why these specific records are not searchable while other records are searchable.

I'm not sure how to help you further. Can you check if the text file in the TEXT bundle has READ access for the Anonymous group? The TEXT bundle itself also has READ access for the Anonymous group by default.

Regards,
~~helix84
[Dspace-tech] Strange problem with searching - More and disturbing information!
Hi, everyone:

After a bit of digging, what I have discovered is that for any item that has multiple PDF bitstreams, only the first bitstream added is searchable. The other bitstreams in the item seem to be ignored by the indexer. I have checked and the extracted texts are there, so it is not an issue with the filter-media program. We (at Cornell) have many items with multiple PDF bitstreams, and so far all of my testing indicates that only the first bitstream of the item is being indexed by the DSpace search engine.

Is this a known issue? Is there something wrong in my configuration files that may be causing this?

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924
Re: [Dspace-tech] Strange problem with searching - More and disturbing information!
Hi George,

Hmm.. that's a bit odd. It's definitely not a known issue. In fact, looking at the DSIndexer class (which is the class that creates/updates the Lucene search index), it should be doing what you expect. The 'buildDocumentForItem()' method is the one that takes care of indexing all Item content into a Lucene Document.

https://fisheye3.atlassian.com/browse/~br=trunk/dspace/dspace/trunk/dspace-api/src/main/java/org/dspace/search/DSIndexer.java?hb=true#to1040

Specifically, it should be doing the following:

1. Initialize the Lucene Document for the Item
2. Index all Item Metadata
3. Add in all various sort options (so you can sort search results)
4. Locate the TEXT Bundle in the Item and index *all* Bitstreams in that Bundle.

If you turn on Debugging you should actually see the DSIndexer report *every* Bitstream that it adds to the index.

So, I'm a bit at a loss as to what may be happening. It sounds like your TEXT bundle is getting all the right Bitstreams added (by filter-media). I'm assuming there is only *one* TEXT Bundle, right? (If there are multiple, that may be the issue -- but DSpace itself should only be generating one TEXT bundle.)

The only other thing I can think of is that your 'search.maxfieldlength' setting is too small. In your dspace.cfg you should see:

# Maximum number of terms indexed for a single field in Lucene.
# Default is 10,000 words - often not enough for full-text indexing.
# If you change this, you'll need to re-index for the change
# to take effect on previously added items.
# -1 = unlimited (Integer.MAX_VALUE)
search.maxfieldlength = 10000

So, it could be possible that these PDFs are larger, and Lucene just stops indexing content after 10,000 words. You can set this to -1 if you want to disable any word-based limit.

Not sure if that helps or not! :)

- Tim
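Changing the search.maxfieldlength setting discussed in this message could be scripted as below. This is a sketch: set_maxfieldlength is a hypothetical helper and the path to dspace.cfg is whatever applies to your installation. Only the property name, the -1 value and the need to re-index come from the message.

```shell
# Sketch only: set search.maxfieldlength in a dspace.cfg-style file.
# set_maxfieldlength is a hypothetical helper.
#   $1 = path to dspace.cfg, $2 = new value (-1 means unlimited)
set_maxfieldlength() {
    cfg=$1
    value=$2

    # Only edit a file that already defines the property.
    if ! grep -q '^[[:space:]]*search\.maxfieldlength[[:space:]]*=' "$cfg"; then
        echo "search.maxfieldlength not found in $cfg" >&2
        return 1
    fi

    # Keep a backup, then rewrite just that one property line.
    cp "$cfg" "$cfg.bak" || return 1
    sed "s|^[[:space:]]*search\.maxfieldlength[[:space:]]*=.*|search.maxfieldlength = $value|" \
        "$cfg.bak" > "$cfg"
}

# Afterwards, re-index so existing items pick up the new limit (not run here):
# [dspace]/bin/dspace index-init
```

The re-index step matters because, as the configuration comment says, the new limit only applies to previously added items once they are indexed again.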
Re: [Dspace-tech] Strange problem with searching - More and disturbing information!
What version of DSpace are you running? I just tested something completely unrelated this morning, but it involved adding a second document to an Item, then running filter-media, then doing a search to see if the text in the second document was found - it WAS. We are running DSpace 1.7.1, JSPUI.

Sue

Sue Walker-Thornton
(757) 864-2368

-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org]
Sent: Thursday, October 13, 2011 12:50 PM
To: George S Kozak
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Strange problem with searching - More and disturbing information!

Hi George,

Hmm... that's a bit odd. It's definitely not a known issue. In fact, looking at the DSIndexer class (which is the class that creates/updates the Lucene search index), it should be doing what you expect. The 'buildDocumentForItem()' method is the one that takes care of indexing all Item content into a Lucene Document.

https://fisheye3.atlassian.com/browse/~br=trunk/dspace/dspace/trunk/dspace-api/src/main/java/org/dspace/search/DSIndexer.java?hb=true#to1040

Specifically, it should be doing the following:

1. Initialize the Lucene Document for the Item
2. Index all Item Metadata
3. Add in all various sort options (so you can sort search results)
4. Locate the TEXT Bundle in the Item and index *all* Bitstreams in that Bundle.

If you turn on Debugging you should actually see the DSIndexer report *every* Bitstream that it adds to the index.

So, I'm a bit at a loss as to what may be happening. It sounds like your TEXT bundle is getting all the right Bitstreams added (by filter-media). I'm assuming there is only *one* TEXT Bundle, right? (If there are multiple, that may be the issue -- but DSpace itself should only be generating one TEXT bundle.)

The only other thing I can think of is that your 'search.maxfieldlength' setting is too small. In your dspace.cfg you should see:

# Maximum number of terms indexed for a single field in Lucene.
# Default is 10,000 words - often not enough for full-text indexing.
# If you change this, you'll need to re-index for the change
# to take effect on previously added items.
# -1 = unlimited (Integer.MAX_VALUE)
search.maxfieldlength = 10000

So, it could be possible that these PDFs are larger, and Lucene just stops indexing content after 10,000 words. You can set this to -1 if you want to disable any word-based limit.

Not sure if that helps or not! :)

- Tim

On 10/13/2011 11:28 AM, George S Kozak wrote:

Hi, everyone:

After a bit of digging, what I have discovered is that for any item that has multiple PDF bitstreams, only the first bitstream added is searchable. The other bitstreams in the item seem to be ignored by the indexer. I have checked and the extracted texts are there, so it is not an issue with the filter-media program. We (at Cornell) have many items with multiple PDF bitstreams, and so far all of my testing indicates that only the first bitstream of the item is being indexed by the DSpace search engine. Is this a known issue? Is there something wrong in my configuration files that may be causing this?

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924
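The four indexing steps Tim lists can be pictured with a small toy sketch. This is not the DSIndexer code: `build_document_for_item`, the dictionary layout, and the choice of sortable fields are all hypothetical, picked only to mirror the order of the steps in the message.

```python
def build_document_for_item(item):
    """Toy model of the four steps described for buildDocumentForItem().

    The item and document are plain dicts; every key name here is an
    assumption made for illustration, not DSpace's real data model.
    """
    # 1. Initialize the document for the Item.
    doc = {"handle": item["handle"], "metadata": {}, "sort": {}, "fulltext": []}

    # 2. Index all Item metadata.
    for field, values in item["metadata"].items():
        doc["metadata"][field] = list(values)

    # 3. Add sort options (assumed here: first value of each sortable field).
    for field in ("title", "dateissued"):
        values = item["metadata"].get(field)
        if values:
            doc["sort"][field] = values[0]

    # 4. Locate the TEXT bundle and index *all* bitstreams in it.
    for bundle in item["bundles"]:
        if bundle["name"] == "TEXT":
            for bitstream in bundle["bitstreams"]:
                doc["fulltext"].append(bitstream["text"])

    return doc


example_item = {
    "handle": "123456789/1",
    "metadata": {"title": ["Annual report"], "dateissued": ["2011"]},
    "bundles": [
        {"name": "ORIGINAL", "bitstreams": [{"text": "(binary PDF)"}]},
        {"name": "TEXT", "bitstreams": [{"text": "part one"}, {"text": "part two"}]},
    ],
}

doc = build_document_for_item(example_item)
print(len(doc["fulltext"]))  # number of TEXT bitstreams that were indexed
```

In this model both extracted-text bitstreams end up in the document, which is the behaviour Tim says to expect when nothing else limits indexing.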
Re: [Dspace-tech] Strange problem with searching - More and disturbing information!
Susan: I am running DSpace 1.7.1... I think Tim may be right about my config settings (thanks for the suggestions, Tim). I am going to test that out and I will let everyone know.

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924
Re: [Dspace-tech] Strange problem with searching - More and disturbing information!
We also have search.maxfieldlength set to -1.

Sue

Sue Walker-Thornton
(757) 864-2368
Re: [Dspace-tech] Strange problem with searching - More and disturbing information!
Tim: You were right! I changed the config file and now my searches are working for the other bitstreams! Thank you very much!! This clears up a problem that I have had for a long time. Now I wonder what other old and now bad settings I have!

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924
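George's result is consistent with a term cap cutting off later bitstreams. The following is a rough sketch of that effect, not Lucene or DSpace code. It assumes that all TEXT bitstreams of an item feed the same full-text field and that the cap counts terms across that whole field; `index_fulltext` and its tokenization are hypothetical.

```python
def index_fulltext(bitstream_texts, max_field_length):
    """Return the set of searchable terms for one item.

    Assumptions (not taken from the DSpace source): every bitstream's text
    goes into one shared field, and the cap applies to the field as a whole.
    A cap of -1 means unlimited, as in the dspace.cfg comment.
    """
    indexed = []
    for text in bitstream_texts:
        for term in text.lower().split():
            if max_field_length != -1 and len(indexed) >= max_field_length:
                # Cap reached: the rest of this text and all later
                # bitstreams are silently ignored.
                return set(indexed)
            indexed.append(term)
    return set(indexed)


first_pdf = "alpha " * 10000  # first bitstream alone uses up 10,000 terms
second_pdf = "needle in the second bitstream"

capped = index_fulltext([first_pdf, second_pdf], 10000)
unlimited = index_fulltext([first_pdf, second_pdf], -1)

print("needle" in capped)     # False
print("needle" in unlimited)  # True
```

Under those assumptions a large first PDF exhausts the limit, so words that occur only in later bitstreams never reach the index. As the dspace.cfg comment notes, previously added items still need a re-index after the setting is changed.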
[Dspace-tech] Style news-xmlui.xml
Hi folks.

Is it possible to add formatting or style to 'news-xmlui.xml'? The 'b' and 'i' tags are not working. I only found this:

http://www.dspace.org/1_6_0Documentation/ch07.html#N15659

Thank you for your help.

--
Antonio Calderón - Calderón Cardona Ltda.
http://calderoncardona.com | http://ventura-systems.net
*Proudly running Debian GNU/Linux.
Re: [Dspace-tech] Style news-xmlui.xml
Antonio - Others may be more familiar than myself, but I believe I'd read that the news-xmlui.xml file is being phased out of existence. I don't have access to my work machine at the moment so can't give specific details, but I can say that you can edit the xsl files to ignore the content from the news-xmlui.xml file and instead write something into the xsl files that will include the content that you desire while providing the ability to mark up the content in any (well-formed / valid) way that you see fit. If you need specific details, I'm happy to help when I'm back in front of my machine tomorrow.

- Patrick E.

--
Patrick K. Etienne
Systems Analyst
Georgia Institute of Technology
Library Information Center
(404) 385-8121
[Dspace-tech] remove or hide the browse by links
Hi,

I am working on DSpace 1.7 (XMLUI). I want to remove or hide the top-level community's "browse by" links while I am inside a lower-level community or sub-community. We have been using different themes for different communities, and I have been using messages.xml to add some of our links. Is navigation.xsl the right file to change, or is it some other file?

* Browse By Issue Date http://dspacetest.cgiar.org/browse?type=dateissued
* Browse By Authors http://dspacetest.cgiar.org/browse?type=author
* Browse By Titles http://dspacetest.cgiar.org/browse?type=title

Is there anyone who can assist with this?

Thanks,
sisay