[ 
https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604635#comment-16604635
 ] 

David Smiley commented on TIKA-2722:
------------------------------------

[~thetaphi] we agree there is a JDK bug, and we agree on its nature (I was 
looking at the same code you allude to while I debugged).  I reported it to 
Oracle through their normal bug-reporting channel.  I didn't reach out to Rory 
directly; it never crossed my mind that it would be appropriate to contact 
Oracle engineers directly on such matters when they have a form and internal 
triage processes.  Based on my past experience reporting a JDK bug, I expect to 
get an email with a bug ID.  Until then, I don't have a reference number to 
share here, though I will post it when I get it!  If you want to reproduce 
this bug for yourself, here is the code (run with assertions enabled):

{code:java}
import java.util.Locale;
import java.util.TimeZone;

public class TestJdkBug {
  public static void main(String[] args) {
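    // The default locale is Arabic, but the display name is requested for Locale.US.
    Locale.setDefault(Locale.forLanguageTag("ar"));
    TimeZone zi = TimeZone.getTimeZone("Etc/GMT-5");
    final String displayName = zi.getDisplayName(false, TimeZone.SHORT, Locale.US);
    // The explicit Locale.US argument should win over the default locale.
    assert "GMT+05:00".equals(displayName) : displayName;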
  }
}
{code}
This passes on JDK 8 and 9 but fails on JDK 11.
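
Separately from the JDK bug, the issue description quoted below suggests that Tika avoid Date.toString() for machine-readable metadata. As a rough sketch of what a locale-independent alternative could look like (this is not Tika's actual code or fix; the class name {{IsoDateSketch}} and helper {{toIso8601}} are hypothetical), a Date can be converted to an Instant and formatted as ISO-8601, which uses ASCII digits and UTC regardless of the default locale:

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

public class IsoDateSketch {
  // Hypothetical helper: formats a Date as an ISO-8601 instant in UTC.
  // The output does not depend on the default locale or time zone.
  public static String toIso8601(Date date) {
    return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
  }

  public static void main(String[] args) {
    // Same locale as in the failing test environment.
    Locale.setDefault(Locale.forLanguageTag("ar-EG"));
    // 1226583351000 ms since the epoch is 2008-11-13 18:35:51 at GMT+05:00,
    // the timestamp shown in the issue description.
    Date created = new Date(1226583351000L);
    System.out.println(toIso8601(created)); // prints 2008-11-13T13:35:51Z
  }
}
```

The time zone offset is normalized to UTC here; whether Tika should preserve the original offset instead is a separate design question.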

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".  
>            Reporter: David Smiley
>            Priority: Major
>
> I'm troubleshooting [a test failure in the Apache 
> Lucene/Solr|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/] 
> "extracting" contrib that occurs in JDK 11 with locale "ar-EG".  JDK 8 & 9 
> pass; I don't know about JDK 10. It has to do with extracting date metadata 
> from a PDF, particularly the created date but perhaps others too.
> I stepped through the code into Tika and I think I've found out where the 
> troublesome code is.  First note PDFParser line 271: {{addMetadata(metadata, 
> "created", info.getCreationDate());}}.  That addMetadata overload variant 
> will call toString on a Date.  IMO that's asking for trouble since the output 
> of that is Locale-dependent.  I think that's okay to show to a user but not 
> for machine-to-machine information exchange.  In the case of the test, it 
> yielded this odd looking date string:
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
> I pasted that in and it looks consistent with what I see in IntelliJ and in 
> Jenkins logs; hopefully will post correctly to JIRA.  The odd part is the 
> hour & minutes relative to GMT.  I won't be certain until after I click 
> "Create".
> Perhaps this problem is also indicative of a JDK 11 bug?  Nevertheless I 
> think Tika should avoid calling Date.toString().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
