Hi all, I've just reviewed some mime4j code about parsing and I find the current behaviour not so intuitive.
1) We have a "Field" interface, a RawField and a ParsedField. Most code deal with generic Fields but knows when it is a parsedfield or a rawfield. Nowhere we check the Field implementation to understand if it is already parsed or not, so it seems we always know when it is a parsedfield and when it is a rawfield. Some code calling getName does a trim and a lowercase, some other code simply lowercase without trimming. Why don't we simply canonicalize things in getName and publish a clear contract about what getName returns? 2) We have a FieldParser interface (that I just generified to FieldParser<T extends ParsedField>) that create ParsedField instances starting from String name, String body, ByteSequence raw. This seems "weird" to me. name and body are already a parsing step beyond raw. "FieldParser" is implemented by fields implementation and always call the constructor for the field with the same 3 parameters. All of the fields, in facts, simply parse "body" and bring around "raw" only as convenience way to avoid recreating (reencoding, refolding) stuff when writing to a mime stream. We have 3 callers for FieldParser.parse method: A. Fields.parse(name, body): this method generate a raw sequence before parsing (ContentUtil.encode(MimeUtil.fold(fieldname+": "+fieldbody))). B. DefaultFieldParser.parse(ByteSequence raw, String rawStr): this method, instead, generate name and body starting from raw (So, the same functionality as RawField, but implemented differently). C. DelegatingFieldParser.parse : this is simply a delegator, called in the above 2 places (A&B). The DefaultFieldParser.parse method is always called via DefaultFieldParser.parse(ByteSequence) method, and this method is called by MessageBuilder and SimpleContentHandler field methods (they are our 2 implementations of ContentHandler). So, the ContentHandler.field(Field) method is called by the MimeStreamParser.parse(InputStream) method. Another weird thing: the MimeStreamParser already do some parsing, because it checks for the ":" and build a RawField from the ByteArrayBuffer and the colon position. So: - AbstractEntity.parseField parse a ByteArrayBuffer to find the ":" and build a RawField. - MimeStreamParser calls the ContentHandler passing the RawField as a generic "Field" - Both ContentHandlers we distribute runs a DefaultFieldParser.parse(field.getRaw) looking again for the ":" from "raw" data and returning a ParsedField under the Field interface. Isn't it weird that we work with Field interface, RawField implementation but we simply "drop" any information but the ByteSequence and parse the field again from scratch (getRaw) in order to return "again" a Field (that this time is ParsedField but we don't tell this to anyone)? Also AbstractField tries to be optimized as it use final name/body/raw fields. So AbstractFields is an immutable bean, but in practice, every extension of AbstractField uses lazy parsing (exept the ContentTransferEncodingField) so that it is functionally immutable but not really immutable for the jvm. IMO we should make a choice and embrace it. In order to alter the headers currently an user is expected to parse it and then to create a new raw representation of the new header and parse it to generate the new header to replace the old one. Either we make the fields mutable so to make it easier to alter them, or we make them immutable for real, so the JVM optimize their use, don't you agree? (I mean that we could introduce an interface for each field, and have 2 implementations for each of them: one parsed and one raw, maybe referencing the underlying RawField. The raw one could even be a simply proxy that encapsulate the RawField and create an instance of the parsed field only when a method is requested). 3) I fail to understand what is the contract between mime4j library and its users wrt to fields and fieldsparsing. Let say I have to parse a ContentType: am I expected to call DefaultFieldParser.parse ? Fields.contentType and other methods? As I fail to see the current "idea" maybe there is no idea and simply this is the result of too many hands and refactorings done in the years, so before being the next hand and applying the next refactoring I'd like to collect some thought. Do you think all I've written are foolish thoughts or do you think we should try to sort this stuff out before releasing mime4j 1.0 ?
