Re: [Dspace-tech] Fwd: SOLR/Discovery Date Parsing
Hi Matthew,

interesting challenge. I'm not sure how it can be addressed without modifying the Java or the dates in your metadata.

When looking at:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L1439

it seems like the date format is guessed purely from the String length. Maybe this date guessing can be made more robust by doing proper regex matching, like the example here:
http://stackoverflow.com/a/3390252

Note that this example code would also need some additional matches to add timezone support.

To make sure this doesn't get lost, I added this as a JIRA ticket:
https://jira.duraspace.org/browse/DS-1775

best regards,

Bram

--
Bram Luyten
@mire
2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
www.atmire.com

On Thu, Nov 7, 2013 at 7:36 PM, Matthew McKinley matthewjamesmckin...@gmail.com wrote:

Whoops! Sent this to the wrong list.

Matthew McKinley
Digital Project Specialist, University of California, Irvine
http://www.uci.edu/
http://www.about.me/matthewmckinley

-- Forwarded message --
From: Matthew McKinley matthewjamesmckin...@gmail.com
Date: Thu, Nov 7, 2013 at 10:20 AM
Subject: SOLR/Discovery Date Parsing
To: dspace-de...@lists.sourceforge.net

Hi all,

We're running DSpace 1.8.2 on Tomcat 6 on a RedHat server. We're trying to make the switch to Discovery and have most of the kinks worked out except indexing dates. Many of our dates are of the simple MM-DD-YYYY variety, but some include a timestamp as well, and these are not being indexed correctly by update-discovery-index.
An example of an error encountered is below:

2013-11-07 09:28:26,156 ERROR org.dspace.discovery.SolrServiceImpl @ Unable to parse date format
java.text.ParseException: Unparseable date: 1998-03-05T07:11:44PST
    at java.text.DateFormat.parse(DateFormat.java:337)
    at org.dspace.discovery.SolrServiceImpl.toDate(SolrServiceImpl.java:1017)
    at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:737)
    at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:153)
    at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:297)
    at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:262)
    at org.dspace.discovery.IndexClient.main(IndexClient.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)

From manually editing the dates and re-updating the discovery index, it seems the problem is either the time zone or the lack thereof. Looking at the Java file (org.dspace.discovery.SolrServiceImpl), it looks like Discovery/SOLR will accept

yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
or
yyyy-MM-dd'T'HH:mm:ss'Z'

but will NOT accept either a time zone such as PST at the end of the date string or no time zone at all (i.e. yyyy-MM-dd'T'HH:mm:ss).

Is there a way to get around this issue and have Discovery/SOLR index these date values without modifying the Java? We have a lot of DSpace objects in this (pretty standard UTC) date + time + timezone format, and I'd hate to have to remove information just to make them index nicely.

Thanks!
Matthew

Matthew McKinley
Digital Project Specialist, University of California, Irvine

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
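
For readers following the thread, here is a minimal sketch of what a slightly more tolerant parser could look like. It is not the DSpace code: the class name TolerantDateParser and the last two patterns are illustrative assumptions, and treating a value without a zone as UTC is a policy choice rather than something the thread confirms. Each pattern must consume the whole string, so a trailing zone name such as PST is still rejected instead of being silently dropped.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of DSpace.
final class TolerantDateParser {

    // Patterns are tried in order. The first two are the ones quoted in the
    // thread; the last two are assumptions added for this sketch.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "yyyy-MM-dd'T'HH:mm:ss", // assumption: a missing zone means UTC
        "yyyy-MM-dd"
    };

    // Returns the parsed date, or null if no pattern matches the full string.
    static Date parse(String value) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            format.setLenient(false);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(value, position);
            // Accept only if the pattern consumed every character of the input.
            if (parsed != null && position.getIndex() == value.length()) {
                return parsed;
            }
        }
        return null;
    }
}
```

With this sketch, 1998-03-05T07:11:44 and 1998-03-05T07:11:44Z resolve to the same instant, while 1998-03-05T07:11:44PST still yields null, because none of the patterns knows about zone names.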
Re: [Dspace-tech] Fwd: SOLR/Discovery Date Parsing
Bram,

Thanks for this. I figured there wasn't an easy fix, but wanted to ask to make sure. And it's good that this has been turned into a JIRA ticket. From what I can tell, Discovery can't handle time zones at all: it accepts only UTC/Zulu time and can't handle an offset either. That's understandable, because date parsing is kind of a nightmare, but even making it a little more robust will go a long way.

Matthew McKinley
Digital Project Specialist, University of California, Irvine

On Fri, Nov 8, 2013 at 5:54 AM, Bram Luyten b...@atmire.com wrote:

Hi Matthew,

interesting challenge. I'm not sure how it can be addressed without modifying the Java or the dates in your metadata.

When looking at:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L1439

it seems like the date format is guessed purely from the String length. Maybe this date guessing can be made more robust by doing proper regex matching, like the example here:
http://stackoverflow.com/a/3390252

Note that this example code would also need some additional matches to add timezone support.

To make sure this doesn't get lost, I added this as a JIRA ticket:
https://jira.duraspace.org/browse/DS-1775

best regards,

Bram

On Thu, Nov 7, 2013 at 7:36 PM, Matthew McKinley matthewjamesmckin...@gmail.com wrote:

Whoops! Sent this to the wrong list.
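
The regex idea from Bram's message, extended with the zone and offset handling discussed in the reply, might be sketched as below. Everything here is an assumption made for illustration: the class name ZoneAwareDateParser, the regular expression, and the decision to resolve zone names through java.util.TimeZone are not taken from DSpace or from the linked Stack Overflow answer. Three-letter zone names such as PST are ambiguous in general and their use as TimeZone IDs is deprecated, so a real fix would probably want an explicit mapping.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of DSpace.
final class ZoneAwareDateParser {

    // Group 1: date and time. Group 2: optional milliseconds.
    // Group 3: optional zone as Z, a numeric offset (-08:00 or -0800),
    // or a three- or four-letter zone name such as PST.
    private static final Pattern ISO_LIKE = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})(\\.\\d{3})?"
        + "(Z|[+-]\\d{2}:?\\d{2}|[A-Za-z]{3,4})?");

    static Date parse(String value) {
        Matcher matcher = ISO_LIKE.matcher(value.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unrecognised date layout: " + value);
        }
        String millis = matcher.group(2) == null ? ".000" : matcher.group(2);
        SimpleDateFormat format =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT);
        format.setLenient(false);
        format.setTimeZone(resolveZone(matcher.group(3)));
        Date parsed = format.parse(matcher.group(1) + millis, new ParsePosition(0));
        if (parsed == null) {
            throw new IllegalArgumentException("Invalid date value: " + value);
        }
        return parsed;
    }

    private static TimeZone resolveZone(String suffix) {
        // Assumption: a missing zone is treated as UTC, like an explicit Z.
        if (suffix == null || suffix.equals("Z")) {
            return TimeZone.getTimeZone("UTC");
        }
        char sign = suffix.charAt(0);
        if (sign == '+' || sign == '-') {
            String digits = suffix.substring(1).replace(":", "");
            // Custom ID such as GMT-08:00. Note that TimeZone silently falls
            // back to GMT for out-of-range offsets.
            return TimeZone.getTimeZone(
                "GMT" + sign + digits.substring(0, 2) + ":" + digits.substring(2));
        }
        // Reject unknown names, because getTimeZone would silently return GMT.
        if (!Arrays.asList(TimeZone.getAvailableIDs()).contains(suffix)) {
            throw new IllegalArgumentException("Unknown time zone: " + suffix);
        }
        return TimeZone.getTimeZone(suffix);
    }
}
```

Under these assumptions the failing value from the stack trace, 1998-03-05T07:11:44PST, resolves to the same instant as 1998-03-05T15:11:44Z, so the zone information is converted rather than discarded before the value is handed on for indexing.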