Re: [Dspace-tech] Fwd: SOLR/Discovery Date Parsing

2013-11-08 Thread Bram Luyten
Hi Matthew,

interesting challenge. I'm not sure how it can be addressed without
modifying the Java or the dates in your metadata.

When looking at:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L1439

It seems like the date is guessed purely on String length. Maybe this date
guessing can be made more robust by doing proper regex matching, like the
example here:

http://stackoverflow.com/a/3390252

Note that this example code also requires some additional matches to add
timezone support. To make sure this doesn't get lost, I added this as a
JIRA ticket: https://jira.duraspace.org/browse/DS-1775
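
A minimal sketch of the regex idea, not the actual DSpace code: each regular expression is paired with the SimpleDateFormat pattern that should parse a matching value, so the format is chosen by shape rather than by string length. The class name DateGuesser, the guess method and the FORMATS table are hypothetical names made up for this illustration. Treating values without a zone as UTC is also an assumption, and three- or four-letter abbreviations such as PST are left to SimpleDateFormat's 'z' pattern letter.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of DSpace.
class DateGuesser {
    // Regex -> SimpleDateFormat pattern, checked in insertion order.
    private static final Map<Pattern, String> FORMATS = new LinkedHashMap<Pattern, String>();
    static {
        String date = "\\d{4}-\\d{2}-\\d{2}";
        String time = "T\\d{2}:\\d{2}:\\d{2}";
        FORMATS.put(Pattern.compile("^\\d{4}$"), "yyyy");
        FORMATS.put(Pattern.compile("^\\d{4}-\\d{2}$"), "yyyy-MM");
        FORMATS.put(Pattern.compile("^" + date + "$"), "yyyy-MM-dd");
        FORMATS.put(Pattern.compile("^" + date + time + "$"), "yyyy-MM-dd'T'HH:mm:ss");
        FORMATS.put(Pattern.compile("^" + date + time + "Z$"), "yyyy-MM-dd'T'HH:mm:ss'Z'");
        FORMATS.put(Pattern.compile("^" + date + time + "\\.\\d{3}Z$"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        // Zone abbreviation such as PST, parsed by the 'z' pattern letter.
        FORMATS.put(Pattern.compile("^" + date + time + "[A-Z]{3,4}$"), "yyyy-MM-dd'T'HH:mm:ssz");
    }

    /** Returns the parsed date, or null if no known layout matches. */
    static Date guess(String value) {
        for (Map.Entry<Pattern, String> entry : FORMATS.entrySet()) {
            if (entry.getKey().matcher(value).matches()) {
                SimpleDateFormat fmt = new SimpleDateFormat(entry.getValue(), Locale.US);
                // Assumption: values without an explicit zone are taken as UTC.
                fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
                try {
                    return fmt.parse(value);
                } catch (ParseException e) {
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(guess("1998-03-05T07:11:44PST"));
        System.out.println(guess("1998-03-05"));
    }
}
```

Numeric offsets such as -08:00 are not covered here; they would need an extra entry, which is the "additional matches" part mentioned above.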

best regards,

Bram

-- 
*Bram Luyten* *@mire*
*2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010*
*Esperantolaan 4, Heverlee 3001, Belgium*
www.atmire.com


On Thu, Nov 7, 2013 at 7:36 PM, Matthew McKinley 
matthewjamesmckin...@gmail.com wrote:

 Whoops! Sent this to the wrong list.



 *Matthew McKinley, Digital Project Specialist, University of California,
 Irvine (http://www.uci.edu/)*
 about.me: http://www.about.me/matthewmckinley


 -- Forwarded message --
 From: Matthew McKinley matthewjamesmckin...@gmail.com
 Date: Thu, Nov 7, 2013 at 10:20 AM
 Subject: SOLR/Discovery Date Parsing
 To: dspace-de...@lists.sourceforge.net


 Hi all,

 We're running DSpace 1.8.2 on Tomcat 6 on a RedHat server.

 Trying to make the switch to discovery and have most of the kinks worked
 out except indexing dates. Many of our dates are of the simple MM-DD-YYYY
 variety, but some include a timestamp as well and these are not being
 indexed correctly by update-discovery-index. An example of an error
 encountered is below:


 2013-11-07 09:28:26,156 ERROR org.dspace.discovery.SolrServiceImpl @ Unable to parse date format
 java.text.ParseException: Unparseable date: 1998-03-05T07:11:44PST
     at java.text.DateFormat.parse(DateFormat.java:337)
     at org.dspace.discovery.SolrServiceImpl.toDate(SolrServiceImpl.java:1017)
     at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:737)
     at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:153)
     at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:297)
     at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:262)
     at org.dspace.discovery.IndexClient.main(IndexClient.java:113)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)


 From manually editing the dates and re-updating the discovery index, it
 seems the problem is either the time zone or lack thereof. Looking at the
 java file (org.dspace.discovery.SolrServiceImpl), it looks like
 Discovery/SOLR will accept

 yyyy-MM-dd'T'HH:mm:ss.SSS'Z'


 or

 yyyy-MM-dd'T'HH:mm:ss'Z'

 But will NOT accept either a timezone such as PST at the end of the date
 string or no time zone at all (i.e. yyyy-MM-dd'T'HH:mm:ss)
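
The behaviour described above can be reproduced outside DSpace with a small sketch. In a SimpleDateFormat pattern, the quoted 'Z' is a literal character, not a time zone field, so only values that literally end in Z match. The class name ZuluLiteralDemo and the parses helper are hypothetical and exist only for this illustration. The second pattern, ending in the unquoted 'z' letter, is an assumption about one way to accept abbreviations such as PST.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of DSpace.
class ZuluLiteralDemo {
    /** Returns true if the value parses under the given SimpleDateFormat pattern. */
    static boolean parses(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            fmt.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String strict = "yyyy-MM-dd'T'HH:mm:ss'Z'"; // quoted Z: literal character
        String zoned = "yyyy-MM-dd'T'HH:mm:ssz";    // unquoted z: time zone field

        System.out.println(parses(strict, "1998-03-05T07:11:44Z"));   // true
        System.out.println(parses(strict, "1998-03-05T07:11:44PST")); // false
        System.out.println(parses(strict, "1998-03-05T07:11:44"));    // false
        System.out.println(parses(zoned, "1998-03-05T07:11:44PST"));  // true
    }
}
```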

 Is there a way to get around this issue and have Discovery/SOLR index
 these date values without modifying the java? We have a lot of dspace
 objects in this (pretty standard UTC) date + time + timezone format and I'd
 hate to have to remove information just to make them index nicely.

 Thanks!
 Matthew







 ___
 DSpace-tech mailing list
 DSpace-tech@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dspace-tech
 List Etiquette:
 https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette


Re: [Dspace-tech] Fwd: SOLR/Discovery Date Parsing

2013-11-08 Thread Matthew McKinley
Bram,

Thanks for this. I figured there wasn't an easy fix, but wanted to ask to
make sure.

And it's good this has been translated to a JIRA ticket. From what I can
tell, discovery can't handle time zones at all: it only accepts UTC/Zulu
time and has no way to handle an offset time. It's understandable because
date parsing is kind of a nightmare, but even making it a little more
robust will go a long way.
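
Until the parser is more robust, one workaround on the metadata side is to rewrite offset timestamps into the UTC form that Discovery already accepts. The following is a minimal sketch under several assumptions: it needs Java 8 or later for java.time (newer than the JVM in the stack trace above), the input must already carry a numeric offset such as -08:00 rather than an abbreviation like PST, and the class name UtcNormalizer and the method toSolrUtc are hypothetical names for this illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of DSpace. Requires Java 8+.
class UtcNormalizer {
    /** Converts an ISO-8601 timestamp with a numeric offset to the UTC "...Z" form. */
    static String toSolrUtc(String isoWithOffset) {
        // Throws DateTimeParseException (unchecked) if the value has no offset.
        OffsetDateTime parsed = OffsetDateTime.parse(isoWithOffset);
        return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant());
    }

    public static void main(String[] args) {
        // 07:11:44 at UTC-8 is 15:11:44 UTC.
        System.out.println(toSolrUtc("1998-03-05T07:11:44-08:00")); // 1998-03-05T15:11:44Z
    }
}
```

The conversion keeps the instant but drops the original offset, so the local time of day is no longer visible in the stored value.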




*Matthew McKinley, Digital Project Specialist, University of California,
Irvine (http://www.uci.edu/)*
about.me: http://www.about.me/matthewmckinley

