Hi all,

I have now created reproducible tests to illustrate the problem I'm
having with POI HSMF's handling of dates in Outlook 2007 files.   I've
posted the tests in the bug report I filed:

  https://issues.apache.org/bugzilla/show_bug.cgi?id=53784

Nick Burch kindly added some comments to the bug report suggesting the
path to a solution.  I'd welcome any assistance - and if you'd like to
take this on for pay, please contact me off list with an estimate.

Thanks,
Joe

p.s. If there are other forums besides this for reaching talented POI
developers who would be willing to send an estimate, please point me
there!
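
For anyone sizing up the work: below is a minimal sketch, in plain Java, of
the kind of parsing Nick describes in the quoted message further down. It is
not POI's actual code and does not use PropertiesChunk. It assumes, based on
my reading of the public .msg format documentation, that the fixed-size
properties live in a stream with a 32-byte header (for the top-level message)
followed by 16-byte entries: a 2-byte type, a 2-byte property id, 4 bytes of
flags, and an 8-byte value. It also assumes that dates use type 0x0040 (a
64-bit count of 100ns ticks since 1601-01-01 UTC) and that the "sent" date
is property id 0x0039. The class and method names are made up for
illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

class FixedPropertiesSketch {
    // Assumed layout constants; check them against the format documentation.
    static final int HEADER_SIZE = 32;            // top-level message header
    static final int ENTRY_SIZE = 16;             // one fixed-size property entry
    static final int PTYP_TIME = 0x0040;          // property type for date/time values
    static final int CLIENT_SUBMIT_TIME_ID = 0x0039; // assumed id of the "sent" date

    // Milliseconds between 1601-01-01 and 1970-01-01 (both UTC).
    static final long EPOCH_DIFF_MILLIS = 11_644_473_600_000L;

    /** Converts a count of 100ns ticks since 1601-01-01 UTC to an Instant. */
    static Instant filetimeToInstant(long filetime) {
        long millis = filetime / 10_000L - EPOCH_DIFF_MILLIS;
        long extraNanos = (filetime % 10_000L) * 100L;
        return Instant.ofEpochMilli(millis).plusNanos(extraNanos);
    }

    /** Walks the 16-byte entries and collects every date property by id. */
    static Map<Integer, Instant> readTimeProperties(byte[] stream) {
        Map<Integer, Instant> times = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        for (int pos = HEADER_SIZE; pos + ENTRY_SIZE <= stream.length; pos += ENTRY_SIZE) {
            int type = buf.getShort(pos) & 0xFFFF;     // low half of the property tag
            int id = buf.getShort(pos + 2) & 0xFFFF;   // high half of the property tag
            // The 4 bytes of flags at pos + 4 are ignored in this sketch.
            if (type == PTYP_TIME) {
                times.put(id, filetimeToInstant(buf.getLong(pos + 8)));
            }
        }
        return times;
    }

    /** Builds a synthetic stream holding a single "sent" date entry. */
    static byte[] buildSampleStream(Instant sent) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + ENTRY_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        long filetime = (sent.toEpochMilli() + EPOCH_DIFF_MILLIS) * 10_000L;
        buf.putShort(HEADER_SIZE, (short) PTYP_TIME);
        buf.putShort(HEADER_SIZE + 2, (short) CLIENT_SUBMIT_TIME_ID);
        buf.putInt(HEADER_SIZE + 4, 0x06);             // placeholder flags
        buf.putLong(HEADER_SIZE + 8, filetime);
        return buf.array();
    }

    public static void main(String[] args) {
        // Synthetic data only: the date is the one ruby-msg reported for my file.
        byte[] stream = buildSampleStream(Instant.parse("2012-06-22T12:11:00Z"));
        readTimeProperties(stream).forEach((id, when) ->
                System.out.println(String.format("0x%04X -> %s", id, when)));
    }
}
```

The sample in main() is synthetic; it only round-trips the date that ruby-msg
reported, so it says nothing about whether a real .msg file matches the
assumed layout. The real fix would still need to go into PropertiesChunk and
MAPIMessage as Nick outlines.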


On Tue, Aug 21, 2012 at 8:50 PM, Joe Wicentowski <[email protected]> wrote:
> Hi Dave,
>
> I would happily accept quotes for the job; please send quotes to me off list.
>
> Thanks,
> Joe
>
> Sent from my iPad
>
> On Aug 21, 2012, at 8:12 PM, Dave Fisher <[email protected]> wrote:
>
>> Hi Joe,
>>
>> Are you looking to pay this person to help or are you looking for someone 
>> with the same "itch" as you?
>>
>> (Not that I am volunteering either way - it's not my area.)
>>
>> Regards,
>> Dave
>>
>> On Aug 21, 2012, at 2:33 PM, Joe Wicentowski wrote:
>>
>>> Hi all,
>>>
>>> I hadn't heard from anyone about the question I posed last week --
>>> regarding POI/HSMF's problems identifying dates in Outlook .msg files.
>>> Is there a better forum for me to post this?  Should I file a bug?
>>> Ideally, I'd like to find someone who can help complete the fix that
>>> Nick Burch began in POI's SVN trunk.
>>>
>>> Thanks for any pointers about the best way to proceed,
>>> Joe
>>>
>>> On Thu, Aug 16, 2012 at 6:52 PM, Joe Wicentowski <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> Hello!  This is my message to the list.  I'm building an application
>>>> that relies on Tika to extract text from Outlook 2007 .msg files.
>>>> Tika relies on POI's HSMF libraries, which is why I'm writing to this
>>>> list about a problem: HSMF is not pulling out the date of many of my
>>>> Outlook messages.
>>>>
>>>> For example, when I look at one of my message files (.msg) in Outlook,
>>>> it says that the message was sent on "Fri 6/22/2012 8:11 AM", but when
>>>> I process the same message with Tika, no date appears in the output
>>>> [1].
>>>>
>>>> In comparison, I tried using a different tool, ruby-msg
>>>> (http://code.google.com/p/ruby-msg/), to process the same message, and
>>>> ruby-msg did pull out the date [2].  This experiment shows that the
>>>> date *is* in the .msg file, and that Tika is failing to pick it up.
>>>>
>>>> Nick Burch from the Tika mailing list took a close, hands-on look at
>>>> my .msg file, determined the cause, and outlined a path to the fix:
>>>>
>>>>> I think I've figured out what's wrong. It looks like Outlook stores
>>>>> properties with a fixed size of 0-8 bytes in a different chunk in the
>>>>> file, which we weren't processing.
>>>>>
>>>>> If you wanted to tackle it, that'd be great! You'll want to take a look at
>>>>> PropertiesChunk, and fill in the TODOs for readProperties and
>>>>> writeProperties, then add unit tests. See:
>>>>>
>>>>> http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hsmf/datatypes/PropertiesChunk.java?view=markup
>>>>>
>>>>> When that's all done and working, then the final step is to update
>>>>> MAPIMessage to read some of the values as needed out of the
>>>>> properties.
>>>>>
>>>>> The info I've been working with comes from this blog post:
>>>>> http://blogs.msdn.com/b/openspecification/archive/2009/11/06/msg-file-format-part-1.aspx
>>>>>
>>>>> (That links into suitable bits of the public documentation)
>>>>>
>>>>> I suspect it's under a day's work. I've put in place the basics; it
>>>>> just needs someone to flesh it out.
>>>>
>>>> While Nick kindly tracked down the cause, unfortunately I lack the
>>>> Java chops to complete the solution.
>>>>
>>>> Would anyone here be kind enough to assist me with this?
>>>>
>>>> I'm happy to test any attempted fixes, and I'm happy to provide more
>>>> info, like sample Outlook files (.msg files).  My hope is that this
>>>> fix will allow POI to "just work" for more users who are evaluating
>>>> it.
>>>>
>>>> Thank you in advance,
>>>> Joe
>>>>
>>>>
>>>> [1] Tika output showing no date, retrieved via the following command:
>>>>
>>>>  java -jar tika-app-1.1.jar "Inquiry.msg" > inquiry.html
>>>>
>>>> <html xmlns="http://www.w3.org/1999/xhtml">
>>>>   <head>
>>>>       <meta name="Message-Bcc" content="" />
>>>>       <meta name="subject" content="Inquiry" />
>>>>       <meta name="Content-Length" content="40960" />
>>>>       <meta name="Message-Recipient-Address" content="[email protected]" />
>>>>       <meta name="Message-From" content="History Mailbox" />
>>>>       <meta name="Author" content="History Mailbox" />
>>>>       <meta name="Message-To" content="'Snip'" />
>>>>       <meta name="Message-Cc" content="" />
>>>>       <meta name="Content-Type" content="application/vnd.ms-outlook" />
>>>>       <meta name="resourceName" content="RE  Inquiry.msg" />
>>>>   </head>
>>>>   <body>
>>>>       <h1>RE: Inquiry</h1>
>>>>       <dl>
>>>>           <dt>From</dt>
>>>>           <dd>History Mailbox</dd>
>>>>           <dt>To</dt>
>>>>           <dd>'Snip'</dd>
>>>>           <dt>Recipients</dt>
>>>>           <dd>[email protected]</dd>
>>>>       </dl>
>>>>       <p>Dear Snip</p>
>>>> ...
>>>>
>>>> [2] The ruby-msg output -- notice the "Date:" line:
>>>>
>>>> From: "History Mailbox" <[email protected]>
>>>> To: "Snip" <[email protected]>
>>>> Subject: RE: Inquiry
>>>> Date: Fri, 22 Jun 2012 12:11:00 -0000
>>>> Message-ID: 
>>>> <[email protected]>
>>>> In-Reply-To: 
>>>> <CAJ4nNe1FPo7Q=10dbk8sdzprarzykjv6skv3nyg5l2li13b...@mail.gmail.com>
>>>> Priority: 0
>>>> Thread-Topic: Inquiry
>>>> Content-Type: multipart/alternative;
>>>> boundary="----_=_NextPart_001_8149ed38.4fec8c61"

