On Sat, 11 Feb 2012 00:15:57 +0200, Markus Wiederkehr wrote:
> Hi Ioan,
>
> Mime4j's BufferedLineReaderInputStream bridges the gap between byte and
> character streams. It lets you read lines of text from a byte stream into a
> ByteArrayBuffer. Then you can use class ContentUtil to decode the
> ByteArrayBuffer into a String. You can also push back (unread) content.
> Maybe that helps with your project.
>
> Cheers,
> Markus

Thanks for clarifying, Markus.

The only thing I'm not sure of right now is whether an mbox file has a single charset. It should, because multi-charset text files are odd and would be very problematic (I have never heard of one). I am uncertain, though, because each message can declare its own encoding in its headers (the charset parameter of Content-Type, plus Content-Transfer-Encoding).

From what you said, mime4j uses a charset per message because it doesn't assume that all messages are part of a single file with one encoding.

I will update the code to provide a means of creating an iterator for which you can specify:
- the file charset
- the From_ line regex
- sensible defaults otherwise.

After this, I'll find a place to plug it into mime4j.

Thanks,
-- 
Ioan Eugen Stan
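
For illustration only, an iterator with a configurable file charset and From_ line regex could look roughly like the sketch below. It uses only the JDK, not mime4j. The class name MboxIterator, its constructors, and both defaults (ISO-8859-1, chosen because it maps every byte to a character, and the regex) are assumptions made up for this example, not existing mime4j API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

/** Hypothetical sketch: iterates over the raw messages of an mbox stream. */
final class MboxIterator implements Iterator<String> {
    // Assumed defaults; neither value comes from mime4j.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;
    static final Pattern DEFAULT_FROM_LINE = Pattern.compile("^From \\S+.*");

    private final BufferedReader reader;
    private final Pattern fromLine;
    // From_ line that opens the next message, or null when the input is exhausted.
    private String pendingFromLine;

    MboxIterator(InputStream in) {
        this(in, DEFAULT_CHARSET, DEFAULT_FROM_LINE);
    }

    MboxIterator(InputStream in, Charset fileCharset, Pattern fromLine) {
        // The whole file is decoded with one charset, as discussed above.
        this.reader = new BufferedReader(new InputStreamReader(in, fileCharset));
        this.fromLine = fromLine;
        // Skip anything that precedes the first From_ line.
        this.pendingFromLine = readUntilFromLine(null);
    }

    @Override
    public boolean hasNext() {
        return pendingFromLine != null;
    }

    @Override
    public String next() {
        if (pendingFromLine == null) {
            throw new NoSuchElementException();
        }
        StringBuilder message = new StringBuilder(pendingFromLine).append('\n');
        pendingFromLine = readUntilFromLine(message);
        return message.toString();
    }

    /**
     * Reads lines until the next From_ line or end of input. Lines that are
     * not separators are appended to sink (when sink is not null). Returns
     * the From_ line that was found, or null at end of input.
     */
    private String readUntilFromLine(StringBuilder sink) {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (fromLine.matcher(line).matches()) {
                    return line;
                }
                if (sink != null) {
                    sink.append(line).append('\n');
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each call to next() returns one message as a String, starting with its From_ line. Escaped body lines such as ">From what you said" do not match the default regex, so they stay inside the message. Passing a different Charset or Pattern to the three-argument constructor overrides the defaults.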
