Hi Dave,

I would happily accept quotes for the job; please send quotes to me off list.

Thanks,
Joe

Sent from my iPad

On Aug 21, 2012, at 8:12 PM, Dave Fisher <[email protected]> wrote:

> Hi Joe,
>
> Are you looking to pay this person to help or are you looking for someone
> with the same "itch" as you?
>
> (Not that I am volunteering either way - it's not my area.)
>
> Regards,
> Dave
>
> On Aug 21, 2012, at 2:33 PM, Joe Wicentowski wrote:
>
>> Hi all,
>>
>> I hadn't heard from anyone about the question I posed last week --
>> regarding POI/HSMF's problems identifying dates in Outlook .msg files.
>> Is there a better forum for me to post this? Should I file a bug?
>> Ideally, I'd like to find someone who can help complete the fix that
>> Nick Burch began in POI's SVN trunk.
>>
>> Thanks for any pointers about the best way to proceed,
>> Joe
>>
>> On Thu, Aug 16, 2012 at 6:52 PM, Joe Wicentowski <[email protected]> wrote:
>>> Hi all,
>>>
>>> Hello! This is my message to the list. I'm building an application
>>> that relies on Tika to extract text from Outlook 2007 .msg files.
>>> Tika relies on POI's HSMF libraries, which is why I'm writing to this
>>> list about a problem: HSMF is not pulling out the date of many of my
>>> Outlook messages.
>>>
>>> For example, when I look at one of my message files (.msg) in Outlook,
>>> it says that the message was sent on "Fri 6/22/2012 8:11 AM", but when
>>> I process the same message with Tika, no date appears in the output
>>> [1].
>>>
>>> In comparison, I tried using a different tool, ruby-msg
>>> (http://code.google.com/p/ruby-msg/), to process the same message, and
>>> ruby-msg did pull out the date [2]. This experiment shows that the
>>> date *is* in the .msg file, and that Tika is failing to pick it up.
>>>
>>> Nick Burch from the Tika mailing list took a close, hands-on look at
>>> my .msg file, determined the cause, and outlined a path to the fix:
>>>
>>>> I think I've figured out what's wrong. It looks like Outlook stores
>>>> properties with a fixed size of 0-8 bytes in a different chunk in the
>>>> file, which we weren't processing.
>>>>
>>>> If you wanted to tackle it, that'd be great! You'll want to take a look at
>>>> PropertiesChunk, and fill in the TODOs for readProperties and
>>>> writeProperties, then add unit tests. See:
>>>>
>>>> http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hsmf/datatypes/PropertiesChunk.java?view=markup
>>>>
>>>> When that's all done and working, the final step is to update
>>>> MAPIMessage to read some of the values as needed out of the properties.
>>>>
>>>> The info I've been working with comes from this blog post:
>>>> http://blogs.msdn.com/b/openspecification/archive/2009/11/06/msg-file-format-part-1.aspx
>>>>
>>>> (That links into suitable bits of the public documentation.)
>>>>
>>>> I suspect it's under a day's work. I've put the basics in place; it just
>>>> needs someone to flesh it out.
>>>
>>> While Nick kindly tracked down the cause, unfortunately I lack the
>>> Java chops to complete the solution.
>>>
>>> Would anyone here be kind enough to assist me with this?
>>>
>>> I'm happy to test any attempted fixes, and I'm happy to provide more
>>> info, like sample Outlook files (.msg files). My hope is that this
>>> fix will allow POI to "just work" for more users who are evaluating
>>> it.
>>>
>>> Thank you in advance,
>>> Joe
>>>
>>>
>>> [1] Tika output showing no date, retrieved via the following command:
>>>
>>>   java -jar tika-app-1.1.jar "Inquiry.msg" > inquiry.html
>>>
>>>   <html xmlns="http://www.w3.org/1999/xhtml">
>>>   <head>
>>>     <meta name="Message-Bcc" content="" />
>>>     <meta name="subject" content="Inquiry" />
>>>     <meta name="Content-Length" content="40960" />
>>>     <meta name="Message-Recipient-Address" content="[email protected]" />
>>>     <meta name="Message-From" content="History Mailbox" />
>>>     <meta name="Author" content="History Mailbox" />
>>>     <meta name="Message-To" content="'Snip'" />
>>>     <meta name="Message-Cc" content="" />
>>>     <meta name="Content-Type" content="application/vnd.ms-outlook" />
>>>     <meta name="resourceName" content="RE Inquiry.msg" />
>>>   </head>
>>>   <body>
>>>     <h1>RE: Inquiry</h1>
>>>     <dl>
>>>       <dt>From</dt>
>>>       <dd>History Mailbox</dd>
>>>       <dt>To</dt>
>>>       <dd>'Snip'</dd>
>>>       <dt>Recipients</dt>
>>>       <dd>[email protected]</dd>
>>>     </dl>
>>>     <p>Dear Snip</p>
>>>     ...
>>>
>>> [2] The ruby-msg output -- notice the "Date:" line:
>>>
>>>   From: "History Mailbox" <[email protected]>
>>>   To: "Snip" <[email protected]>
>>>   Subject: RE: Inquiry
>>>   Date: Fri, 22 Jun 2012 12:11:00 -0000
>>>   Message-ID:
>>>     <[email protected]>
>>>   In-Reply-To:
>>>     <CAJ4nNe1FPo7Q=10dbk8sdzprarzykjv6skv3nyg5l2li13b...@mail.gmail.com>
>>>   Priority: 0
>>>   Thread-Topic: Inquiry
>>>   Content-Type: multipart/alternative;
>>>     boundary="----_=_NextPart_001_8149ed38.4fec8c61"
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
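For anyone picking this up, the fixed-size properties Nick describes can be sketched in Java roughly as follows. This is a minimal, hypothetical helper, not POI's PropertiesChunk code: the class and method names are invented for illustration, and the layout is taken from the public [MS-OXMSG] documentation that the blog post links to, not from POI. It assumes the bytes of the top-level message's "__properties_version1.0" stream have already been read out of the .msg file: a 32-byte header, followed by 16-byte entries (4-byte property tag, 4-byte flags, 8-byte value field). It looks up PR_CLIENT_SUBMIT_TIME (property ID 0x0039, type PT_SYSTIME 0x0040) and converts its FILETIME value, which counts 100-nanosecond ticks since 1601-01-01 UTC, to a java.time.Instant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Apache POI): reads fixed-length property
 * entries from the bytes of an .msg "__properties_version1.0" stream, using
 * the layout described in the public [MS-OXMSG] documentation.
 */
final class FixedPropertyReader {
    // Header size for the top-level message's property stream
    // (embedded messages use 24 bytes; recipient/attachment storages use 8).
    static final int TOP_LEVEL_HEADER_SIZE = 32;
    // Each entry: 4-byte property tag, 4-byte flags, 8-byte value/size field.
    static final int ENTRY_SIZE = 16;

    static final int PT_SYSTIME = 0x0040;            // FILETIME property type
    static final int PR_CLIENT_SUBMIT_TIME = 0x0039; // property ID of the "sent" time

    // Seconds between the FILETIME epoch (1601-01-01Z) and the Unix epoch (1970-01-01Z).
    private static final long FILETIME_EPOCH_OFFSET_SECONDS = 11_644_473_600L;
    private static final long TICKS_PER_SECOND = 10_000_000L; // 100-ns ticks

    private FixedPropertyReader() {}

    /** Maps each full tag (ID in the high 16 bits, type in the low 16) to its raw 8-byte field. */
    static Map<Integer, Long> readEntries(byte[] stream, int headerSize) {
        Map<Integer, Long> entries = new LinkedHashMap<>();
        if (stream.length < headerSize) {
            return entries; // truncated stream: nothing to read
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(headerSize);
        while (buf.remaining() >= ENTRY_SIZE) {
            int tag = buf.getInt();
            buf.getInt(); // flags, ignored in this sketch
            // Fixed-size types keep the value itself here (smaller values in the
            // low-order bytes); variable-length types keep a size instead.
            long field = buf.getLong();
            entries.put(tag, field);
        }
        return entries;
    }

    /** Converts a FILETIME (100-ns ticks since 1601-01-01 UTC) to an Instant. */
    static Instant fileTimeToInstant(long fileTime) {
        long seconds = Math.floorDiv(fileTime, TICKS_PER_SECOND) - FILETIME_EPOCH_OFFSET_SECONDS;
        long nanos = Math.floorMod(fileTime, TICKS_PER_SECOND) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    /** Returns the client-submit ("sent") time, or null if no such entry exists. */
    static Instant findSentTime(byte[] topLevelStream) {
        int tag = (PR_CLIENT_SUBMIT_TIME << 16) | PT_SYSTIME; // 0x00390040
        Long raw = readEntries(topLevelStream, TOP_LEVEL_HEADER_SIZE).get(tag);
        return raw == null ? null : fileTimeToInstant(raw);
    }
}
```

If the sample message stores its sent date in that property, the reader would be expected to return 2012-06-22T12:11:00Z, which is the same instant as ruby-msg's "Fri, 22 Jun 2012 12:11:00 -0000"; Outlook's "8:11 AM" would then be that instant shown in a UTC-4 zone (presumably US Eastern Daylight Time, which is an assumption). Which property MAPIMessage should actually prefer (client-submit time versus delivery time) is a choice for whoever completes the fix.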
