On Wed, Feb 4, 2009 at 1:26 PM, Oleg Kalnichevski <[email protected]> wrote: > Markus Wiederkehr wrote: >> >> On Sun, Feb 1, 2009 at 10:34 PM, Robert Burrell Donkin >> <[email protected]> wrote: >>> >>> On Fri, Jan 30, 2009 at 7:37 PM, Markus Wiederkehr >>> <[email protected]> wrote: >>>> >>>> When Mime4j is used to build a DOM some header fields are parsed more >>>> than once.. >>> >>> ...i suspected that this might be the case... >>> >>>> This is what happens: >>>> * AbstractEntity.parseField parses the raw string into a name-value >>>> pair >>>> * AbstractEntity.parseField calls MutableBodyDescriptor.addField for >>>> valid fields >>>> * DefaultBodyDescriptor parses header fields such as Content-Type >>>> * MaximalBodyDescriptor parses additional header fields such as >>>> Mime-Version >>>> * eventually MimeStreamParser.parse retrieves the raw field from >>>> AbstractEntity and notifies a ContentHandler >>>> * the ContentHandler (again) has to parse the raw string into name and >>>> value >>>> * the ContentHandler (again) has to parse certain structural fields >>>> already parsed by a body descriptor >>> >>> IIRC the descriptor stuff was added to satisfy requirements for SAX >>> and pull parsing downstream (in particular IMAP which uses a lot of >>> MIME meta-data) >>> >>>> In case of building a DOM the latter two items are done by >>>> MessageBuilder by calling Field.parse(). >>>> >>>> There are several issues here: >>>> * the ContentHandler has to do a lot of work that has already been >>>> done before by AbstractEntity. >>> >>> well, probably done before (dependent of parser settings) >>> >>>> * Field.parse() uses a JavaCC-generated parser whereas the body >>>> descriptors have their own parsing code. >>> >>> yes - the body descriptors support a wider variety of fields than the >>> Field stuff >>> >>>> * we have unnecessary duplicate code and the two parsers are very >>>> likely inconsistent with each other. >>> >>> yes - i trust the code in AbstractEntity more highly then that in Field >> >> I'm not so sure.. For example AbstractEntity uses >> MimeUtil.getHeaderParams() to parse Content-Type and >> Content-Disposition. It looks like getHeaderParams does not correctly >> handle comments that are allowed in quoted-strings.. Consider this >> example: >> >> String test = "text/plain; key=value; foo= (test) \"bar\""; >> Map<String, String> headerParams = MimeUtil.getHeaderParams(test); >> System.out.println(headerParams); >> >> The result is {=text/plain, foo=(test), key=value} which is wrong. The >> Field equivalent.. >> >> ContentTypeField f = Fields.contentType(test); >> System.out.println(f.getParameters()); >> >> ..gives the correct result: {foo=bar, key=value}. >> >> But of course we could fix this and use MimeUtil.getHeaderParams() for >> parsing Fields, too. In fact I would not mind if most of the javacc >> parsers were replaced by handcrafted equivalents.. >> > > Folks > > For what is it worth to you. HttpComponents Core provides an extensive > support for parsing and formatting of MIME headers. The code has been around > for quite some time, has been relatively well tested, has been extensively > optimized for performance and memory footprint and is able to operate with > virtually zero intermediate garbage and no intermediate memory copying. > > The parsing/formatting framework is based on the CharArrayBuffer class, > which is used by MimeTokenStream for low level parsing operations. > > http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/HeaderValueParser.java > http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/BasicHeaderValueParser.java > http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/HeaderValueFormatter.java > http://svn.apache.org/repos/asf/httpcomponents/httpcore/trunk/module-main/src/main/java/org/apache/http/message/BasicHeaderValueFormatter.java > > Feel free to take a look at the code and see if it might make sense > replacing MimeUtil.getHeaderParams(value) with some bits from HC.
sounds good - it's good to see that we have quite a few options for fix this properly in 0.7 - robert
