On Fri, May 15, 2009 at 8:54 PM, Alejandro Valdez <[email protected]> wrote:
> On 5/15/09, Markus Wiederkehr <[email protected]> wrote:
>> On Fri, May 15, 2009 at 12:02 AM, Alejandro Valdez
>> <[email protected]> wrote:
>>> Hi list, I'm using mime4j to extract the text content from the
>>> e-mail's text/html parts. I found that sometimes there are
>>> non-standard MIME parts that use iso-8859-1 characters (e.g.
>>> accented vowels) but don't declare any charset in the part's MIME
>>> header.
>>>
>>> In those cases I found that mime4j creates a Reader that uses
>>> us-ascii as the charset (which is what should be done when there is
>>> no charset declaration in the header). Reading the content from that
>>> Reader produces a char[] with the Unicode U+FFFD replacement
>>> character in place of the non-us-ascii characters.
>>>
>>> Does anyone know a way to use the mime4j API to return a Reader with
>>> the iso-8859-1 charset set, or some other solution to this (maybe
>>> common) problem?
>>
>> It looks indeed like this is not possible.
>>
>> For Mime4j 0.7 I would propose that we pull up getInputStream() from
>> BinaryBody to SingleBody so that TextBody gets this method too.
>>
>> If that's okay I can open a JIRA and fix the issue.
>>
>>> This is the way I'm reading a TextPart's content:
>>>
>>> TextBody textBody = (TextBody) part.getBody();
>>> Reader reader = textBody.getReader();
>>> char[] buffer = new char[16000];
>>> StringBuilder sb = new StringBuilder();
>>>
>>> int bytesReaded = 1;
>>> while (bytesReaded != -1) {
>>>     bytesReaded = reader.read(buffer, 0, buffer.length);
>>>     if (bytesReaded != -1) {
>>>         sb.append(buffer, 0, bytesReaded);
>>>     }
>>> }
>>> return sb.toString();
>>
>> Looks like you want to convert the TextBody to a String.
>> How about this:
>>
>> TextBody textBody = (TextBody) part.getBody();
>> ByteArrayOutputStream baos = new ByteArrayOutputStream();
>> textBody.writeTo(baos);
>> return new String(baos.toByteArray(), "iso-8859-1");
>>
>> hth
>> Markus
>>
>
> Hello Markus, thank you (very much) for your help. Your snippet works
> great: it creates a String with all the characters (bytes) in the MIME
> TextPart.
>
> I'm curious about how the writeTo() method actually works. I looked at
> the mime4j 0.6 source code, SingleBody.java and TextPart.java (at
> src\main\java\org\apache\james\mime4j\message), but I couldn't find the
> implementation of this method. Could you please point me in the right
> direction?

The method is implemented in StorageTextBody and StringTextBody. Your
TextBody is probably an instance of StorageTextBody, so that is the
class you want to look at.

Cheers,
Markus

PS: If you work with Eclipse you can open the Type Hierarchy (F4) to
figure out things like that.
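
For reference, here is a minimal sketch of why decoding the raw bytes with an explicit fallback charset helps. It uses only the JDK and leaves mime4j out; the class name CharsetFallbackDemo and the decode() helper are hypothetical, for illustration only. In the thread's scenario the byte array would be what textBody.writeTo(baos) produced, and the declared charset would be whatever the part's header says (null when it says nothing).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    /**
     * Hypothetical helper (not part of mime4j): decodes raw body bytes
     * with the declared charset, falling back to ISO-8859-1 when the
     * charset is missing, malformed, or unsupported by this JVM.
     */
    public static String decode(byte[] rawBytes, String declaredCharset) {
        Charset charset = StandardCharsets.ISO_8859_1; // assumed fallback
        if (declaredCharset != null) {
            try {
                charset = Charset.forName(declaredCharset);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the fallback.
            }
        }
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9, which is 'é' in ISO-8859-1.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        // Decoding as US-ASCII replaces the non-ASCII byte with U+FFFD.
        String asAscii = new String(raw, StandardCharsets.US_ASCII);
        System.out.println(asAscii.charAt(3) == '\uFFFD');

        // Decoding with the fallback keeps the accented character.
        String asLatin1 = decode(raw, null);
        System.out.println(asLatin1.equals("caf\u00E9"));
    }
}
```

ISO-8859-1 maps every byte value to a character, so the fallback never produces U+FFFD. It can still yield the wrong characters if the part was actually written in some other encoding.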
