Thanks Nick. WRT test files, there's a test file attached to the Bugzilla entry; did you mean something more targeted than this?
I've added some extra information from the Microsoft open specifications to the Bugzilla item, which actually seems to make the use of this header for date metadata extraction in POI a bit suspect. It might be better to stop using the message submission chunk for date extraction altogether.

As for moving HSMF forward, I'll have a think about what you said. My knowledge of the current code base is pretty slack; I only really know the message submission chunk! Is the current code built upon HPSF or POIFS? If so, I might start having a play with the Microsoft specs...

Adrian

-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: 18 March 2015 12:47
To: POI Developers List
Subject: Re: Bugzilla item #57678 (Incorrect year parsing in message submission chunk)

On Tue, 17 Mar 2015, Adrian Conlon wrote:
> I submitted a bug report + patch a week or so ago, and I was wondering
> whether one of the committers could take a look and see whether it
> looks OK or not.
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=57678

Are you sure the logic for when to switch from 19xx to 20xx is right? If you could produce a test file and/or a reference in the spec, that'd help!

> I realise that it isn't a major bug, but I'm using this as "testing
> the water" for making other bug fixes for HSMF. With that in mind, I'd
> appreciate pointers on how to make my changes as painless as possible
> for committers to accept.

HSMF started life before Microsoft released the file format specs, and is based around what we could figure out easily from hex dumps. It turns out that we got some key parts back-to-front. As such, pretty much only "variable length" properties are supported. While we do have some support for fixed length properties now (which actually cover most of them), we don't have a link between the properties in the property table and the variable length chunks that hold their values.
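The missing link between the property table and the value chunks can be illustrated with a minimal Java sketch. It assumes the layout in Microsoft's [MS-OXMSG] spec as understood here: the `__properties_version1.0` stream holds 16-byte entries (4-byte property tag, 4-byte flags, 8 bytes of value). For variable length types, those last 8 bytes carry a size field plus reserved bytes, and the actual value lives in a stream named `__substg1.0_` followed by the tag in hex. Fixed length values such as a `PtypTime` FILETIME sit inline, which is one way the date could be read without parsing the message submission chunk. The class and method names (`PropertyEntrySketch`, `describe`, `entry`, and so on) are hypothetical and are not part of POI's HSMF API. Multi-valued properties and the per-storage header that precedes the entries are deliberately ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

/**
 * Hypothetical sketch, not POI code: decodes one 16-byte entry from an
 * MS-OXMSG property stream ("__properties_version1.0"), assuming the
 * layout tag (4 bytes) + flags (4 bytes) + value or size/reserved (8 bytes).
 */
public final class PropertyEntrySketch {

    // Types assumed to be variable length: the value is stored in its own stream/storage.
    static final int PTYP_OBJECT  = 0x000D;
    static final int PTYP_STRING8 = 0x001E;
    static final int PTYP_STRING  = 0x001F;
    static final int PTYP_BINARY  = 0x0102;
    // Fixed length 64-bit FILETIME, stored inline in the entry.
    static final int PTYP_TIME    = 0x0040;

    // Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
    static final long FILETIME_EPOCH_OFFSET_SECONDS = 11_644_473_600L;

    private PropertyEntrySketch() {}

    // Property tag: ID in the high 16 bits, type in the low 16 bits.
    static int propertyId(int tag)   { return tag >>> 16; }
    static int propertyType(int tag) { return tag & 0xFFFF; }

    static boolean isVariableLength(int type) {
        return type == PTYP_OBJECT || type == PTYP_STRING8
            || type == PTYP_STRING || type == PTYP_BINARY;
    }

    /** Name of the stream holding a variable length value, e.g. "__substg1.0_0037001F". */
    static String valueStreamName(int tag) {
        // %08X prints negative ints (IDs >= 0x8000) as unsigned hex, which is what we want.
        return String.format("__substg1.0_%08X", tag);
    }

    /** Converts a FILETIME (100 ns ticks since 1601-01-01 UTC) to an Instant. */
    static Instant fromFileTime(long ticks) {
        long seconds = Math.floorDiv(ticks, 10_000_000L) - FILETIME_EPOCH_OFFSET_SECONDS;
        long nanos = Math.floorMod(ticks, 10_000_000L) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    /** Builds a 16-byte little-endian entry, for trying out describe(). */
    static byte[] entry(int tag, int flags, long value) {
        return ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(tag).putInt(flags).putLong(value).array();
    }

    /** Describes one entry: where its value lives, or the decoded inline value. */
    static String describe(byte[] entry) {
        if (entry.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte entry");
        }
        ByteBuffer buf = ByteBuffer.wrap(entry).order(ByteOrder.LITTLE_ENDIAN);
        int tag = buf.getInt(0);
        int type = propertyType(tag);
        if (isVariableLength(type)) {
            // Size field (see MS-OXMSG for its exact relation to the stream length);
            // the following 4 bytes are reserved.
            int size = buf.getInt(8);
            return valueStreamName(tag) + " (" + size + " bytes)";
        }
        if (type == PTYP_TIME) {
            return String.format("0x%04X = %s", propertyId(tag), fromFileTime(buf.getLong(8)));
        }
        return String.format("0x%04X: fixed-length type 0x%04X", propertyId(tag), type);
    }

    public static void main(String[] args) {
        // Illustrative entries: PR_SUBJECT (Unicode string) and a PtypTime property
        // with ID 0x0039 (PR_CLIENT_SUBMIT_TIME), using the FILETIME for 1970-01-01.
        System.out.println(describe(entry(0x0037001F, 0x06, 12)));
        System.out.println(describe(entry(0x00390040, 0x06, 116_444_736_000_000_000L)));
    }
}
```

The design choice here is to let the property table drive everything: each entry either points at a named value stream or carries its value inline, rather than starting from whatever value chunks happen to exist.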
What it really needs is someone to spend some time with the spec, work out exactly how a variable length property in the properties chunk maps to a value chunk, and code up some logic to do that. With that in place, we can deprecate much of the current code driven by the value chunks, and replace it with a "proper" way of going via the properties list. That will also mean we can expose and use a lot more properties than we currently do, and possibly also avoid some hacky things like parsing string message headers to try to find dates.

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

____________________________________________________________
Electronic mail messages entering and leaving Arup business systems are scanned for acceptability of content and viruses
