Hi Peter,

Thanks for trying the code - it is possibly the first time it has been tested with a large amount of production data.

> But then tossed a million java errors:
> solroutput.log.2009-06-15
> Processing file: /dspace/log/solroutput.log.2009-06-15
> java.net.SocketTimeoutException
> at org.xbill.DNS.Client.blockUntil(Client.java:43)
> at org.xbill.DNS.UDPClient.recv(UDPClient.java:43)
> at org.xbill.DNS.UDPClient.sendrecv(UDPClient.java:70)
> at org.xbill.DNS.SimpleResolver.send(SimpleResolver.java:256)
> at org.xbill.DNS.ExtendedResolver$Resolution.start(ExtendedResolver.java:93)
> at org.xbill.DNS.ExtendedResolver.send(ExtendedResolver.java:359)
> at org.dspace.statistics.util.DnsLookup.reverseDns(DnsLookup.java:36)
> at org.dspace.statistics.util.StatisticsImporter.load(StatisticsImporter.java:191)
> at org.dspace.statistics.util.StatisticsImporter.main(StatisticsImporter.java:75)
>
> I'm wondering if it just skips over entries that cause this, or if some
> SocketTimeout constant needs to be bumped up?

I might add a new command line option to skip the DNS lookups, as performing so many thousands of reverse lookups will be quite slow. It also doesn't have a cache - I'll add one of those.

> Here are a few lines from solroutput.log.2009.06-15
> 20090616000000288,view_item,6177,2009-06-16T00:00:00,anonymous,66.249.68.172
> 20090616000001115,view_item,38737,2009-06-16T00:00:01,anonymous,66.249.68.68
> 20090616000001939,view_bitstream,77155,2009-06-16T00:00:01,anonymous,66.249.68.68
> 20090616000003457,view_item,5127,2009-06-16T00:00:03,anonymous,66.249.68.172
> 20090616000004278,view_item,16976,2009-06-16T00:00:04,anonymous,66.249.68.172

(These are all from Googlebot - we need to make sure they don't get counted, see http://jira.dspace.org/jira/browse/DS-440)

> @stuartlewis There is a small typo in that comment (StatistcisImporter)

Which comment?

> If anyone wants, I can email solroutput.log.2009.06-15 it is 2.2M. However my
> dspace.log.2009-06-15 is 30M.

That would be great! Would you be happy to attach it to the JIRA entry?
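A minimal sketch of what the skip option and the cache might look like. It assumes the actual lookup is passed in as a function. The `Resolver` interface, the `CachingReverseDns` class and its parameters are hypothetical and are not existing DSpace code. The sketch also shows one possible answer to the timeout question: a failed lookup such as a `SocketTimeoutException` is caught, and the entry is imported without a host name instead of aborting the run.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical wrapper around a reverse DNS lookup (not existing DSpace code).
 * Adds an option to skip lookups, a bounded LRU cache, and a fallback for
 * failed lookups such as java.net.SocketTimeoutException.
 */
class CachingReverseDns {

    /** The real lookup, e.g. one built on dnsjava; may throw on timeout. */
    @FunctionalInterface
    interface Resolver {
        String resolve(String ip) throws IOException;
    }

    private final Resolver resolver;
    private final boolean skipLookups;
    private final Map<String, String> cache;
    private int resolverCalls = 0;

    CachingReverseDns(Resolver resolver, boolean skipLookups, final int maxEntries) {
        this.resolver = resolver;
        this.skipLookups = skipLookups;
        // An access-ordered LinkedHashMap that evicts its eldest entry acts as an LRU cache.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the host name for ip, or null if lookups are skipped or the lookup failed. */
    String reverseDns(String ip) {
        if (skipLookups) {
            return null; // e.g. set from a hypothetical command line option
        }
        if (cache.containsKey(ip)) {
            return cache.get(ip); // may be a cached null from an earlier failure
        }
        String host;
        try {
            resolverCalls++;
            host = resolver.resolve(ip);
        } catch (IOException e) {
            // SocketTimeoutException is an IOException: drop the host name instead of aborting.
            host = null;
        }
        cache.put(ip, host); // failures are cached too, so each IP times out at most once
        return host;
    }

    /** Number of times the underlying resolver was actually called. */
    int resolverCallCount() {
        return resolverCalls;
    }
}
```

Caching failures as well as successes means an unreachable DNS server costs at most one timeout per distinct IP rather than one per log line. Whether a failed IP should be retried later instead is a judgement call. The wiring to the existing `DnsLookup.reverseDns` is left out because its signature isn't shown here.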
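For DS-440, one simple approach is to drop log lines before import if their IP falls inside a configured list of crawler address ranges. The sketch below is hypothetical and is not the DS-440 implementation. It assumes the comma-separated layout of the sample lines above, with the client IP as the last field. The `SpiderIpFilter` class is an illustration only. The 66.249.64.0/19 range used in the test was chosen because it covers the sample IPs; it is not an authoritative list of Googlebot addresses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical filter (not the DS-440 implementation) that drops solroutput.log
 * lines whose client IP falls inside a configured IPv4 CIDR range.
 * Assumes the comma-separated layout of the sample lines, with the IP last.
 */
class SpiderIpFilter {
    /** Each entry is {network address, netmask} as unsigned 32-bit values held in a long. */
    private final List<long[]> ranges = new ArrayList<>();

    SpiderIpFilter(List<String> cidrs) {
        for (String cidr : cidrs) {
            String[] parts = cidr.split("/");
            int prefixBits = parts.length > 1 ? Integer.parseInt(parts[1]) : 32; // bare IP = /32
            if (prefixBits < 0 || prefixBits > 32) {
                throw new IllegalArgumentException("Bad prefix length: " + cidr);
            }
            // e.g. /19 -> 0xFFFFE000; for /0 the shift by 32 leaves only high bits, masked to 0
            long mask = (0xFFFFFFFFL << (32 - prefixBits)) & 0xFFFFFFFFL;
            ranges.add(new long[] {ipToLong(parts[0]) & mask, mask});
        }
    }

    /** Converts a dotted-quad IPv4 address to an unsigned 32-bit value. */
    static long ipToLong(String ip) {
        String[] octets = ip.trim().split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet);
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException("Not an IPv4 address: " + ip);
            }
            value = (value << 8) | n;
        }
        return value;
    }

    /** True if ip lies in any configured range. */
    boolean isSpider(String ip) {
        long address = ipToLong(ip);
        for (long[] range : ranges) {
            if ((address & range[1]) == range[0]) {
                return true;
            }
        }
        return false;
    }

    /** True if the log line should be imported, i.e. its IP is not in a crawler range. */
    boolean shouldImport(String logLine) {
        String[] fields = logLine.split(",");
        return !isSpider(fields[fields.length - 1]);
    }
}
```

Matching on IP ranges is cheap and needs no DNS lookup. The trade-off is that the range list has to be maintained by hand as crawlers change addresses.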
(We'll need to write a small script to convert the handles in the log into handles that exist on the test machine it runs on, as the importer has to look up the DSpace object from each handle.)

> Lastly, execution halted with this (machine has 4GB ram):
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
> at java.lang.Class.getDeclaredField(Class.java:1880)
> at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:181)
> at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:65)
> at java.sql.SQLException.<clinit>(SQLException.java:353)
> at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1295)
> at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
> at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:258)
> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
> at org.dspace.storage.rdbms.DatabaseManager.queryTable(DatabaseManager.java:239)
> at org.dspace.content.Item.retrieveMetadata(Item.java:202)
> at org.dspace.content.Item.<init>(Item.java:148)
> at org.dspace.content.Bundle.getItems(Bundle.java:358)
> at org.dspace.statistics.SolrLogger.storeParents(SolrLogger.java:295)
> at org.dspace.statistics.util.StatisticsImporter.load(StatisticsImporter.java:239)
> at org.dspace.statistics.util.StatisticsImporter.main(StatisticsImporter.java:75)

Maybe we need to call
Context.complete() from time to time to clear its cache? I'll look at that one too.

Many thanks for the feedback - it is invaluable for getting the code working well.

Thanks,

Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: 64 9 373-7599 x81928
http://www.library.auckland.ac.nz/

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

