Hi Peter,

Thanks for trying the code - it is possibly the first time it has been tested 
with a large amount of production data.

> But then tossed a million java errors:
> solroutput.log.2009-06-15
> Processing file: /dspace/log/solroutput.log.2009-06-15
> java.net.SocketTimeoutException
>       at org.xbill.DNS.Client.blockUntil(Client.java:43)
>       at org.xbill.DNS.UDPClient.recv(UDPClient.java:43)
>       at org.xbill.DNS.UDPClient.sendrecv(UDPClient.java:70)
>       at org.xbill.DNS.SimpleResolver.send(SimpleResolver.java:256)
>       at org.xbill.DNS.ExtendedResolver$Resolution.start(ExtendedResolver.java:93)
>       at org.xbill.DNS.ExtendedResolver.send(ExtendedResolver.java:359)
>       at org.dspace.statistics.util.DnsLookup.reverseDns(DnsLookup.java:36)
>       at org.dspace.statistics.util.StatisticsImporter.load(StatisticsImporter.java:191)
>       at org.dspace.statistics.util.StatisticsImporter.main(StatisticsImporter.java:75)
> 
> I'm wondering if it just skips over entries that cause this, or if some 
> SocketTimeout constant needs to be bumped up?

I might add a new command-line option to skip the DNS lookups, as performing 
so many thousands of reverse lookups will be quite slow. The importer also 
doesn't have a cache for the lookup results - I'll add one.
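
To make those two ideas concrete, here is a rough sketch of what a skip flag 
plus a small cache could look like. It is not the importer's actual code: the 
class name, the constructor arguments and the pluggable resolver function are 
all made up for illustration, and the real lookup would need to be wrapped to 
fit. Failed lookups fall back to the raw IP address and are cached too, so an 
address that times out is only tried once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not part of DSpace): caches reverse-DNS results. */
class CachingReverseDns {
    private final Function<String, String> resolver; // the real lookup, supplied by the caller
    private final boolean skipLookups;               // would be set from a command-line option
    private final Map<String, String> cache;

    CachingReverseDns(Function<String, String> resolver,
                      boolean skipLookups, final int maxEntries) {
        this.resolver = resolver;
        this.skipLookups = skipLookups;
        // An access-ordered LinkedHashMap gives a simple bounded LRU cache.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the host name for an IP, or the IP itself if skipped or failed. */
    String lookup(String ip) {
        if (skipLookups) {
            return ip;
        }
        String cached = cache.get(ip);
        if (cached != null) {
            return cached;
        }
        String name;
        try {
            name = resolver.apply(ip);
        } catch (RuntimeException e) {
            name = ip; // e.g. a timeout: fall back to the raw address
        }
        if (name == null) {
            name = ip;
        }
        cache.put(ip, name); // failures are cached as well, so they are not retried
        return name;
    }
}
```

With crawler traffic like the lines quoted below, where the same handful of 
addresses repeats over and over, even a small cache should remove most of the 
lookups.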

> Here are a few lines from solroutput.log.2009.06-15
> 20090616000000288,view_item,6177,2009-06-16T00:00:00,anonymous,66.249.68.172
> 20090616000001115,view_item,38737,2009-06-16T00:00:01,anonymous,66.249.68.68
> 20090616000001939,view_bitstream,77155,2009-06-16T00:00:01,anonymous,66.249.68.68
> 20090616000003457,view_item,5127,2009-06-16T00:00:03,anonymous,66.249.68.172
> 20090616000004278,view_item,16976,2009-06-16T00:00:04,anonymous,66.249.68.172

(These are all from Googlebot - we need to make sure they don't get counted - 
see http://jira.dspace.org/jira/browse/DS-440)

> @stuartlewis There is a small typo in that comment (StatistcisImporter)

Which comment?

> If anyone wants, I can email solroutput.log.2009.06-15 it is 2.2M. However my 
> dspace.log.2009-06-15 is 30M.

That would be great! Would you be happy to attach it to the JIRA entry? (We'll 
need to write a little script to convert the handles into ones that exist on 
whichever test machine the import is run on, as the importer has to look up 
the DSpace object from the handle.)

> Lastly, execution halted with this (machine has 4GB ram):
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>       at java.lang.Class.getDeclaredFields0(Native Method)
>       at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
>       at java.lang.Class.getDeclaredField(Class.java:1880)
>       at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:181)
>       at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:65)
>       at java.sql.SQLException.<clinit>(SQLException.java:353)
>       at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1295)
>       at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>       at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
>       at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
>       at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:258)
>       at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
>       at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
>       at org.dspace.storage.rdbms.DatabaseManager.queryTable(DatabaseManager.java:239)
>       at org.dspace.content.Item.retrieveMetadata(Item.java:202)
>       at org.dspace.content.Item.<init>(Item.java:148)
>       at org.dspace.content.Bundle.getItems(Bundle.java:358)
>       at org.dspace.statistics.SolrLogger.storeParents(SolrLogger.java:295)
>       at org.dspace.statistics.util.StatisticsImporter.load(StatisticsImporter.java:239)
>       at org.dspace.statistics.util.StatisticsImporter.main(StatisticsImporter.java:75)

Maybe we need to call some context 'completes' from time to time to clear its 
cache? I'll look at that one too.
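
The general shape of that idea, as a sketch only: process the log records in 
a loop and run a cleanup hook every so many records. The class name, the 
method name and the releaseCache hook below are placeholders I have made up; 
whatever call actually frees the cached objects would go inside the hook, and 
I have not yet checked which call that should be.

```java
import java.util.function.Consumer;

/** Hypothetical helper (not DSpace code): runs a cleanup hook every N records. */
class PeriodicFlush {
    /**
     * Handles each record in turn and invokes releaseCache after every
     * `interval` records. Returns how many times the hook was invoked.
     */
    static <T> int process(Iterable<T> records, Consumer<T> handler,
                           int interval, Runnable releaseCache) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int seen = 0;
        int flushes = 0;
        for (T record : records) {
            handler.accept(record);
            seen++;
            if (seen % interval == 0) {
                releaseCache.run(); // placeholder for the real cache-clearing call
                flushes++;
            }
        }
        return flushes;
    }
}
```

The interval is a trade-off: a small value keeps memory use down, while a 
large one avoids re-reading objects that have only just been dropped.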

Many thanks for the feedback - it is invaluable for getting the code working 
well.

Thanks,


Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: 64 9 373-7599 x81928
http://www.library.auckland.ac.nz/


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
