I was more or less right. The input string, which you encoded to "Qm9uam91ciwgam95ZXV4IGxpb24K", contains an encoded newline at the end. It's not spurious.
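For a self-contained version of that check, here is a minimal sketch. The helper name decodeAndInspect is mine, purely for illustration. It decodes the string with base64.StdEncoding and reports whether the decoded bytes end in a newline.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeAndInspect is a hypothetical helper: it decodes s as padded
// standard base64 and reports whether the result ends in "\n".
func decodeAndInspect(s string) (decoded string, trailingNewline bool, err error) {
	b, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", false, err
	}
	decoded = string(b)
	return decoded, strings.HasSuffix(decoded, "\n"), nil
}

func main() {
	const in = "Qm9uam91ciwgam95ZXV4IGxpb24K"

	out, nl, err := decodeAndInspect(in)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// 28 encoded characters with no '=' padding decode to 28*3/4 = 21 bytes.
	fmt.Printf("%d encoded -> %d decoded\n", len(in), len(out))
	// %q makes the trailing newline visible as \n.
	fmt.Printf("%q trailing newline: %v\n", out, nl)
}
```

The input is 28 characters long with no padding, so StdEncoding accepts it as it stands, and the 21st decoded byte is the newline.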
Confirmed by the "echo" pipeline I gave above, or in Go itself: https://go.dev/play/p/6kSxiCfCTo4

You can also confirm it by multiplying the length of the input by 3/4:

% echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | wc -c
28

28*3/4 = 21

B o n j o u r , _ j o y e u x _ l i o n \n

On Tuesday, 14 January 2025 at 10:10:22 UTC Brian Candler wrote:

> Sorry ignore that, I hadn't checked your playground link.
>
> On Tuesday, 14 January 2025 at 10:07:53 UTC Brian Candler wrote:
>
>> > AS I wrote earlier, I'm trying to avoid reading the entire email part
>> > into memory to discover if I should use base64.StdEncoding or
>> > base64.RawStdEncoding.
>>
>> As I asked before, why would you ever need to use RawStdEncoding? It just
>> means the MIME part was invalid, most likely corrupted/truncated.
>>
>> > One odd thing is that I'm getting extraneous newlines (shown by stars
>> > in the output), eg:
>>
>> You are feeding two different inputs which do not differ by truncation
>> alone.
>>
>> % echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | base64 -D | hexdump -c
>> 0000000 B o n j o u r , j o y e u x
>> 0000010 l i o n \n
>> 0000015
>>
>> % echo -n "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==" | base64 -D | hexdump -c
>> 0000000 " B o n j o u r , j o y e u x
>> 0000010 l i o n "
>> 0000016
>>
>> The second one has encoded double-quotes before and after the content.
>>
>> On Monday, 13 January 2025 at 22:43:51 UTC Rory Campbell-Lange wrote:
>>
>>> AS I wrote earlier, I'm trying to avoid reading the entire email part
>>> into memory to discover if I should use base64.StdEncoding or
>>> base64.RawStdEncoding.
>>>
>>> The following seems to work reasonably well:
>>>
>>> type B64Translator struct {
>>>     br *bufio.Reader
>>> }
>>>
>>> func NewB64Translator(r io.Reader) *B64Translator {
>>>     return &B64Translator{
>>>         br: bufio.NewReader(r),
>>>     }
>>> }
>>>
>>> // Read reads off the buffered reader expecting base64.StdEncoding bytes
>>> // with (potentially) 1-3 '=' padding characters at the end.
>>> // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
>>> // if the padding is removed.
>>> func (b *B64Translator) Read(p []byte) (n int, err error) {
>>>     h := make([]byte, len(p))
>>>     n, err = b.br.Read(h)
>>>     if err != nil {
>>>         return n, err
>>>     }
>>>     // to be optimised
>>>     c := bytes.Count(h, []byte("="))
>>>     copy(p, h[:n-c])
>>>     // fmt.Println(string(h), n, string(p), n-c)
>>>     return n - c, nil
>>> }
>>>
>>> https://go.dev/play/p/H6ii7Vy-8as
>>>
>>> One odd thing is that I'm getting extraneous newlines (shown by stars in
>>> the output), eg:
>>>
>>> --
>>> raw: Bonjour joyeux lion
>>> Qm9uam91ciwgam95ZXV4IGxpb24K
>>> ok: false
>>> decoded: Bonjour, joyeux lion* <-------------------- e.g. here
>>> --
>>> std: "Bonjour, joyeux lion"
>>> IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
>>> ok: true
>>> decoded: "Bonjour, joyeux lion"
>>> --
>>>
>>> Any thoughts on that would be gratefully received.
>>>
>>> Rory
>>>
>>>
>>> On 13/01/25, Rory Campbell-Lange (ro...@campbell-lange.net) wrote:
>>> > Thanks very much for the playground link and thoughts.
>>> >
>>> > The use case is reading base64 email parts, which could be of a very
>>> > large size. It is unclear when processing these parts if they are base64
>>> > padded or not.
>>> >
>>> > I'm trying to avoid reading the entire email part into memory.
>>> > Consequently I think your earlier idea of adding padding (or removing it)
>>> > in a wrapper could work. Perhaps wrapping the reader with another using a
>>> > bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
>>> > add padding if needed.
>>> >
>>> > Rory
>>> >
>>> > On 13/01/25, Axel Wagner (axel.wa...@googlemail.com) wrote:
>>> > > Just realized: If you twist the idea around, you get something easy to
>>> > > implement and more correct.
>>> > > Instead of stripping padding if it exist, you can ensure that the body *is*
>>> > > padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>>> > > You can then feed that to base64.StdEncoding. If the wrapped Reader returns
>>> > > padded Base64, this does nothing. If it returns unpadded Base64, it adds
>>> > > padding. If it returns incorrect Base64, it will create a padded stream,
>>> > > that will then get rejected by the Base64 decoder.
>>> > >
>>> > > On Mon, 13 Jan 2025 at 10:31, Axel Wagner <axel.wa...@googlemail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > one way to solve your problem is to wrap the body into an io.Reader that
>>> > > > strips off everything after the first `=` it finds. That can then be fed to
>>> > > > base64.RawStdEncoding. This approach requires no extra buffering or copying
>>> > > > and is easy to implement: https://go.dev/play/p/CwcVz7oietI
>>> > > >
>>> > > > The downside is, that this will not verify that the body is *either*
>>> > > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
>>> > > > error if fed something like "AAA=garbage".
>>> > > > That can be remedied by buffering up to four bytes and, when encountering
>>> > > > an EOF, check that there are at most three trailing `=` and that the total
>>> > > > length of the stream is divisible by four. It's more finicky to implement,
>>> > > > but it should also be possible without any extra copies and only requires a
>>> > > > very small extra buffer.
>>> > > >
>>> > > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange <ro...@campbell-lange.net>
>>> > > > wrote:
>>> > > >
>>> > > >> Thanks very much for the links, pointers and possible solution.
>>> > > >>
>>> > > >> Trying to read base64 standard (padded) encoded data with
>>> > > >> base64.RawStdEncoding can produce an error such as
>>> > > >>
>>> > > >>     illegal base64 data at input byte <n>
>>> > > >>
>>> > > >> Reading base64 raw (unpadded) encoded data produces the EOF error.
>>> > > >>
>>> > > >> I'll go with trying to read the standard encoded data up to maybe 1MB and
>>> > > >> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
>>> > > >> problem, maybe with reference to bufio.Reader which has most of the methods
>>> > > >> suggested below.
>>> > > >>
>>> > > >> Yes, the use of a "Rewind" method would be crucial. I guess this would
>>> > > >> need to:
>>> > > >> 1. error if more than one buffer of data has been read
>>> > > >> 2. else re-read from byte 0
>>> > > >>
>>> > > >> Thanks again very much for these suggestions.
>>> > > >>
>>> > > >> Rory
>>> > > >>
>>> > > >> On 12/01/25, robert engels (ren...@ix.netcom.com) wrote:
>>> > > >> > Also, see this
>>> > > >> > https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>>> > > >> > as I expected the error should be reported earlier than the end of stream
>>> > > >> > if the chosen format is wrong.
>>> > > >> >
>>> > > >> > > On Jan 12, 2025, at 2:57 PM, robert engels <ren...@ix.netcom.com> wrote:
>>> > > >> > >
>>> > > >> > > Also, this is what Gemini provided which looks basically correct -
>>> > > >> > > but I think encapsulating it with a Rewind() method would be easier to
>>> > > >> > > understand.
>>> > > >> > >
>>> > > >> > >
>>> > > >> > >
>>> > > >> > > While Go doesn't have a built-in PushbackReader like some other
>>> > > >> > > languages (e.g., Java), you can implement similar functionality using a
>>> > > >> > > custom struct and a buffer.
>>> > > >> > >
>>> > > >> > > Here's an example implementation:
>>> > > >> > >
>>> > > >> > > package main
>>> > > >> > >
>>> > > >> > > import (
>>> > > >> > >     "bytes"
>>> > > >> > >     "io"
>>> > > >> > > )
>>> > > >> > >
>>> > > >> > > type PushbackReader struct {
>>> > > >> > >     reader io.Reader
>>> > > >> > >     buffer *bytes.Buffer
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > func NewPushbackReader(r io.Reader) *PushbackReader {
>>> > > >> > >     return &PushbackReader{
>>> > > >> > >         reader: r,
>>> > > >> > >         buffer: new(bytes.Buffer),
>>> > > >> > >     }
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
>>> > > >> > >     if p.buffer.Len() > 0 {
>>> > > >> > >         return p.buffer.Read(b)
>>> > > >> > >     }
>>> > > >> > >     return p.reader.Read(b)
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > func (p *PushbackReader) UnreadByte() error {
>>> > > >> > >     if p.buffer.Len() == 0 {
>>> > > >> > >         return io.EOF
>>> > > >> > >     }
>>> > > >> > >     lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
>>> > > >> > >     p.buffer.Truncate(p.buffer.Len() - 1)
>>> > > >> > >     p.buffer.WriteByte(lastByte)
>>> > > >> > >     return nil
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > func (p *PushbackReader) Unread(buf []byte) error {
>>> > > >> > >     if p.buffer.Len() == 0 {
>>> > > >> > >         return io.EOF
>>> > > >> > >     }
>>> > > >> > >     p.buffer.Write(buf)
>>> > > >> > >     return nil
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > func main() {
>>> > > >> > >     // Example usage
>>> > > >> > >     r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
>>> > > >> > >     buf := make([]byte, 5)
>>> > > >> > >     r.Read(buf)
>>> > > >> > >     r.UnreadByte()
>>> > > >> > >     r.Read(buf)
>>> > > >> > > }
>>> > > >> > >
>>> > > >> > > Explanation:
>>> > > >> > > PushbackReader struct: This struct holds the underlying io.Reader and
>>> > > >> > > a buffer to store the pushed-back bytes.
>>> > > >> > > NewPushbackReader: This function creates a new PushbackReader from an
>>> > > >> > > existing io.Reader.
>>> > > >> > > Read method: This method reads bytes from either the buffer (if it
>>> > > >> > > contains data) or the underlying reader.
>>> > > >> > > UnreadByte method: This method pushes back a single byte into the
>>> > > >> > > buffer.
>>> > > >> > > Unread method: This method pushes back a slice of bytes into the
>>> > > >> > > buffer.
>>> > > >> > > Important Considerations:
>>> > > >> > > The buffer size is not managed automatically. You may need to adjust
>>> > > >> > > the buffer size based on your use case.
>>> > > >> > > This implementation does not handle pushing back beyond the initially
>>> > > >> > > read data. If you need to support arbitrary pushback, you'll need a more
>>> > > >> > > complex solution.
>>> > > >> > >
>>> > > >> > > Generative AI is experimental.
>>> > > >> > >
>>> > > >> > >> On Jan 12, 2025, at 2:53 PM, Robert Engels <ren...@ix.netcom.com> wrote:
>>> > > >> > >>
>>> > > >> > >> You can see the two pass reader here
>>> > > >> > >> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>>> > > >> > >>
>>> > > >> > >> But yea, the basic premise is that you buffer the data so you can
>>> > > >> > >> rewind if needed
>>> > > >> > >>
>>> > > >> > >> Are you certain it is reading to the end to return EOF? It may be
>>> > > >> > >> returning eof once the parsing fails.
>>> > > >> > >>
>>> > > >> > >> Otherwise I would expect this is being decoded wrong - eg the mime
>>> > > >> > >> type or encoding type should tell you the correct format before you start
>>> > > >> > >> decoding.
>>> > > >> > >>
>>> > > >> > >>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange <ro...@campbell-lange.net> wrote:
>>> > > >> > >>>
>>> > > >> > >>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>>> > > >> > >>>
>>> > > >> > >>> My google fu must be deserting me. I can find PushbackReader
>>> > > >> > >>> implementations in Java, but the only similar thing for Go I could find was
>>> > > >> > >>> https://gitlab.com/osaki-lab/iowrapper. If you have a specific
>>> > > >> > >>> recommendation for a ReadSeeker wrapper to an io.Reader that would be great
>>> > > >> > >>> to know.
>>> > > >> > >>>
>>> > > >> > >>> Since the base64 decoding error I'm looking for is an EOF, I guess
>>> > > >> > >>> the wrapper approach will not work when the EOF byte position is > than the
>>> > > >> > >>> io.ReadSeeker buffer size.
>>> > > >> > >>>
>>> > > >> > >>> Rory
>>> > > >> > >>>
>>> > > >> > >>> On 12/01/25, robert engels (ren...@ix.netcom.com) wrote:
>>> > > >> > >>>> create a ReadSeeker that wraps the Reader providing the buffering
>>> > > >> > >>>> (mark & reset) - normally the buffer only needs to be large enough to
>>> > > >> > >>>> detect the format contained in the Reader.
>>> > > >> > >>>>
>>> > > >> > >>>> You can search Google for PushbackReader in Go and you’ll get a
>>> > > >> > >>>> basic implementation.
>>> > > >> > >>>>
>>> > > >> > >>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange <ro...@campbell-lange.net> wrote:
>>> > > >> > >>> ...
>>> > > >> > >>>>> I'm attempting to rationalise the process [of avoiding reading
>>> > > >> > >>>>> email parts into byte slices] by simply wrapping the provided io.Reader
>>> > > >> > >>>>> with the necessary decoders to reduce memory usage and unnecessary
>>> > > >> > >>>>> processing.
>>> > > >> > >>>>>
>>> > > >> > >>>>> The wrapping strategy seems to work ok. However there is a
>>> > > >> > >>>>> particular issue in detecting base64.StdEncoding versus
>>> > > >> > >>>>> base64.RawStdEncoding, which requires draining the io.Reader using
>>> > > >> > >>>>> base64.StdEncoding and (based on the current implementation) switching to
>>> > > >> > >>>>> base64.RawStdEncoding if an io.ErrUnexpectedEOF is found.
>>> > > >> > >>>>>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> --
>>> > > >> > >> You received this message because you are subscribed to the Google
>>> > > >> > >> Groups "golang-nuts" group.
>>> > > >> > >> To unsubscribe from this group and stop receiving emails from it,
>>> > > >> > >> send an email to golang-nuts...@googlegroups.com
>>> > > >> > >> <mailto:golang-nuts...@googlegroups.com>.
>>> > > >> > >> To view this discussion visit
>>> > > >> > >> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com
>>> > > >> > >> <https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com?utm_medium=email&utm_source=footer>.
>>> > > >> > >
>>> > > >> >
>>> > > >>
>>> > > >> --
>>> > > >> You received this message because you are subscribed to the Google Groups
>>> > > >> "golang-nuts" group.
>>> > > >> To unsubscribe from this group and stop receiving emails from it, send an
>>> > > >> email to golang-nuts...@googlegroups.com.
>>> > > >> To view this discussion visit
>>> > > >> https://groups.google.com/d/msgid/golang-nuts/Z4Q0AFRkkoNH52_B%40campbell-lange.net.
>>> > > >>
>>> > > >
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "golang-nuts" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send
>>> > an email to golang-nuts...@googlegroups.com.
>>> > To view this discussion visit
>>> > https://groups.google.com/d/msgid/golang-nuts/Z4UQYJmuk7Oe6xSG%40campbell-lange.net.
>>>
>>>
>>>
>>

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/a990ab8b-7437-45f3-a0e5-81d9b7cab4a3n%40googlegroups.com.