[ 
https://issues.apache.org/jira/browse/SOLR-12561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548094#comment-16548094
 ] 

David Smiley commented on SOLR-12561:
-------------------------------------

Patch:
* Use java.time thoroughly; no java.text remnants nor use of Date.  Always 
Instant.
* Enhanced TestExtractionDateUtil a lot so that it tests more of the supported 
patterns and their idiosyncrasies.  I want to ensure we don't break 
back-compat here!  These better tests helped uncover some issues during 
development of this switch.
* Two of the default patterns had a lowercase "hh" for hour of AM/PM instead of 
"HH" for hour of day.  SimpleDateFormat seemed to deal with this but I think 
they are fundamentally invalid without an AM/PM qualifier.  I switched them to 
HH.  If someone custom configures the patterns in their solr config, they'll 
need to use the correct designator.
* Use parsed DateTimeFormatter instances instead of Strings in 
SolrContentHandler and its factory.  Since order might be significant or might 
be used for performance reasons, I also switched to LinkedHashSet from HashSet 
for the impl in ExtractingRequestHandler's config parser.
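
To illustrate the "hh" vs. "HH" point above, here is a minimal sketch (not code 
from the patch).  The class name, helper method, and sample patterns are made up 
for illustration.  It shows that java.time refuses to build a LocalDateTime from 
a clock-hour "hh" pattern that has no AM/PM field, while the "HH" variant of the 
same pattern parses the same text.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class HourPatternDemo {

  /** Returns true if the text resolves to a LocalDateTime under the given pattern. */
  public static boolean parses(String pattern, String text) {
    // Locale.ENGLISH is an assumption so that "AM"/"PM" text is recognized.
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
    try {
      LocalDateTime.parse(text, fmt);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Hypothetical sample input; not one of the Solr Cell default patterns.
    String text = "2018-07-18 13:45:00";
    // "hh" is clock-hour-of-am-pm (1-12); with no AM/PM field it cannot be resolved.
    System.out.println("hh: " + parses("yyyy-MM-dd hh:mm:ss", text));
    // "HH" is hour-of-day (0-23) and needs no qualifier.
    System.out.println("HH: " + parses("yyyy-MM-dd HH:mm:ss", text));
    // "hh" becomes usable once an AM/PM field ("a") is part of the pattern.
    System.out.println("hh a: " + parses("yyyy-MM-dd hh:mm:ss a", "2018-07-18 01:45:00 PM"));
  }
}
```

Anyone with custom patterns in their solr config could run their own pattern 
strings through a check like this before upgrading.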

This seems safe for 7.x; any break would seem to be very obscure IMO.  On the 
other hand, 8.0 will be out this fall or so.

> Port ExtractionDateUtil to java.time API
> ----------------------------------------
>
>                 Key: SOLR-12561
>                 URL: https://issues.apache.org/jira/browse/SOLR-12561
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: master (8.0)
>
>         Attachments: SOLR-12561.patch
>
>
> The ExtractionDateUtil class in the extraction contrib uses 
> SimpleDateFormat.  The Java 8 java.time API is superior; you can find 
> articles out there explaining why.  One thing that comes to mind is less 
> timezone bugginess – SOLR-10243.  That said, the API may be a bit baroque 
> IMO (over-engineered).  Here I'd like to switch over to the java.time API 
> and furthermore have the patterns be pre-parsed so that at runtime we don't 
> need to re-parse the patterns.
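
The pre-parsing idea in the description could look roughly like the sketch 
below.  This is an illustration under stated assumptions, not the code in the 
attached patch: the class name, the parseDate method, the two sample patterns, 
and the choice to interpret zone-less input as UTC are all hypothetical.  Each 
pattern string is compiled to a DateTimeFormatter once, the formatters are kept 
in a LinkedHashSet so they are tried in insertion order, and the result is 
always an Instant.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class PreParsedPatterns {

  // Hypothetical patterns for illustration; not the Solr Cell defaults.
  private static final String[] PATTERNS = {
    "yyyy-MM-dd'T'HH:mm:ss'Z'",
    "yyyy-MM-dd HH:mm:ss"
  };

  // Compiled once up front; LinkedHashSet preserves the configured order.
  public static final Set<DateTimeFormatter> FORMATTERS = new LinkedHashSet<>();

  static {
    for (String pattern : PATTERNS) {
      FORMATTERS.add(DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }
  }

  /** Tries each formatter in order and returns the first successful parse. */
  public static Instant parseDate(String text, Collection<DateTimeFormatter> formatters) {
    for (DateTimeFormatter formatter : formatters) {
      try {
        // Assumption: input without zone information is interpreted as UTC.
        return LocalDateTime.parse(text, formatter).toInstant(ZoneOffset.UTC);
      } catch (DateTimeParseException e) {
        // Not this pattern; fall through to the next formatter.
      }
    }
    throw new IllegalArgumentException("Unparseable date: " + text);
  }

  public static void main(String[] args) {
    System.out.println(parseDate("2018-07-18 13:45:00", FORMATTERS));
  }
}
```

Because no pattern string is compiled inside parseDate, the per-document cost 
is limited to the parse attempts themselves.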



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
