David Pilato created TIKA-3493:
----------------------------------
Summary: dcterms:created date depends on the current TimeZone in
RTF documents
Key: TIKA-3493
URL: https://issues.apache.org/jira/browse/TIKA-3493
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 2.0.0
Reporter: David Pilato
I'm migrating an existing project to Tika 2.0.0 and I'm seeing strange behavior.
TL;DR: the created date of the document changes depending on the JVM's time zone.
Long story:
I have a unit test which extracts content and metadata from an [RTF
document|https://github.com/dadoonet/fscrawler/raw/master/test-documents/src/main/resources/documents/test.rtf].
When using Tika 1.27, whatever time zone is set for my JVM, I always
get the same value for "dcterms:created": "2016-07-07T13:38:00Z".
When running the same test with Tika 2.0.0, the date changes depending on the
time zone.
For example:
* Asia/Sakhalin gives dcterms:created=2016-07-06T23:38:00Z
* Asia/Colombo gives dcterms:created=2016-07-07T05:08:00Z
* Europe/Stockholm gives dcterms:created=2016-07-07T08:38:00Z
I don't know whether this is a bug or expected behavior. Maybe the RTF format
does not specify the time zone.
I'm surprised that I don't see the same behavior for Office documents.
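The three values above are each what you get when the wall-clock time 2016-07-07 10:38 is interpreted in the listed zone and then converted to UTC. The following is only a sketch of that arithmetic, not Tika's actual parsing code: the class name, the method name, and the 10:38 value are assumptions made for illustration (10:38 is inferred from the reported results, not read from the RTF file), and the Asia/Sakhalin result assumes a JDK with reasonably current time zone data.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical demo class; it is not part of Tika.
class ZonelessDateDemo {
    // Assumed wall-clock value with no zone attached (inferred from the reported results).
    static final LocalDateTime WALL_CLOCK = LocalDateTime.of(2016, 7, 7, 10, 38, 0);

    // Interpret a zone-less date-time in the given zone and render the resulting instant in UTC.
    static String toUtcString(LocalDateTime local, String zoneId) {
        return local.atZone(ZoneId.of(zoneId)).toInstant().toString();
    }

    public static void main(String[] args) {
        for (String zone : new String[] {"Asia/Sakhalin", "Asia/Colombo", "Europe/Stockholm"}) {
            System.out.println(zone + " gives " + toUtcString(WALL_CLOCK, zone));
        }
    }
}
```

If this is what happens, the stored date carries no zone and the parser falls back to the JVM default, which would explain why the output follows the time zone setting.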