I'm just doing the reverse of that, I think, by removing the padding. I can't seem to trigger an EOF with this code below:
> >     n, err = b.br.Read(h)
> >     if err != nil {
> >         return n, err
> >     }

On 13/01/25, robert engels (reng...@ix.netcom.com) wrote:
> As has been pointed out, you don’t need to read the whole thing into memory,
> just wrap the data provider with one that adds the padding if it doesn’t exist -
> and always read with the padded decoder.
>
> To add the padding you only need to keep track of the count of characters
> read before EOF to determine how many padding characters to synthetically add
> - if the original data is padded this will be 0 (if it was padded correctly).
>
> > On Jan 13, 2025, at 4:42 PM, Rory Campbell-Lange <r...@campbell-lange.net>
> > wrote:
> >
> > As I wrote earlier, I'm trying to avoid reading the entire email part into
> > memory to discover if I should use base64.StdEncoding or
> > base64.RawStdEncoding.
> >
> > The following seems to work reasonably well:
> >
> > type B64Translator struct {
> >     br *bufio.Reader
> > }
> >
> > func NewB64Translator(r io.Reader) *B64Translator {
> >     return &B64Translator{
> >         br: bufio.NewReader(r),
> >     }
> > }
> >
> > // Read reads off the buffered reader expecting base64.StdEncoding bytes
> > // with (potentially) 1-3 '=' padding characters at the end.
> > // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
> > // if the padding is removed.
> > func (b *B64Translator) Read(p []byte) (n int, err error) {
> >     h := make([]byte, len(p))
> >     n, err = b.br.Read(h)
> >     if err != nil {
> >         return n, err
> >     }
> >     // to be optimised
> >     c := bytes.Count(h, []byte("="))
> >     copy(p, h[:n-c])
> >     // fmt.Println(string(h), n, string(p), n-c)
> >     return n - c, nil
> > }
> >
> > https://go.dev/play/p/H6ii7Vy-8as
> >
> > One odd thing is that I'm getting extraneous newlines (shown by stars in
> > the output), eg:
> >
> > --
> > raw: Bonjour joyeux lion
> > Qm9uam91ciwgam95ZXV4IGxpb24K
> > ok: false
> > decoded: Bonjour, joyeux lion* <-------------------- e.g.
> > here
> > --
> > std: "Bonjour, joyeux lion"
> > IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
> > ok: true
> > decoded: "Bonjour, joyeux lion"
> > --
> >
> > Any thoughts on that would be gratefully received.
> >
> > Rory
> >
> > On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
> >> Thanks very much for the playground link and thoughts.
> >>
> >> The use case is reading base64 email parts, which could be of a very large
> >> size. It is unclear when processing these parts if they are base64 padded
> >> or not.
> >>
> >> I'm trying to avoid reading the entire email part into memory.
> >> Consequently I think your earlier idea of adding padding (or removing it)
> >> in a wrapper could work. Perhaps wrapping the reader with another using a
> >> bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
> >> add padding if needed.
> >>
> >> Rory
> >>
> >> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
> >>> Just realized: If you twist the idea around, you get something easy to
> >>> implement and more correct.
> >>> Instead of stripping padding if it exists, you can ensure that the body
> >>> *is* padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> >>> You can then feed that to base64.StdEncoding. If the wrapped Reader
> >>> returns padded Base64, this does nothing. If it returns unpadded Base64,
> >>> it adds padding. If it returns incorrect Base64, it will create a padded
> >>> stream, that will then get rejected by the Base64 decoder.
> >>>
> >>> On Mon, 13 Jan 2025 at 10:31, Axel Wagner <axel.wagner...@googlemail.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> one way to solve your problem is to wrap the body into an io.Reader that
> >>>> strips off everything after the first `=` it finds. That can then be fed
> >>>> to base64.RawStdEncoding.
> >>>> This approach requires no extra buffering or copying
> >>>> and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> >>>>
> >>>> The downside is, that this will not verify that the body is *either*
> >>>> correctly padded Base64 *or* unpadded Base64. So, it will not report an
> >>>> error if fed something like "AAA=garbage".
> >>>> That can be remedied by buffering up to four bytes and, when encountering
> >>>> an EOF, check that there are at most three trailing `=` and that the
> >>>> total length of the stream is divisible by four. It's more finicky to
> >>>> implement, but it should also be possible without any extra copies and
> >>>> only requires a very small extra buffer.
> >>>>
> >>>> On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange
> >>>> <r...@campbell-lange.net> wrote:
> >>>>
> >>>>> Thanks very much for the links, pointers and possible solution.
> >>>>>
> >>>>> Trying to read base64 standard (padded) encoded data with
> >>>>> base64.RawStdEncoding can produce an error such as
> >>>>>
> >>>>> illegal base64 data at input byte <n>
> >>>>>
> >>>>> Reading base64 raw (unpadded) encoded data produces the EOF error.
> >>>>>
> >>>>> I'll go with trying to read the standard encoded data up to maybe 1MB
> >>>>> and then switch to base64.RawStdEncoding if I hit the "illegal base64
> >>>>> data" problem, maybe with reference to bufio.Reader which has most of
> >>>>> the methods suggested below.
> >>>>>
> >>>>> Yes, the use of a "Rewind" method would be crucial. I guess this would
> >>>>> need to:
> >>>>> 1. error if more than one buffer of data has been read
> >>>>> 2. else re-read from byte 0
> >>>>>
> >>>>> Thanks again very much for these suggestions.
> >>>>>
> >>>>> Rory
> >>>>>
> >>>>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> >>>>>> Also, see this
> >>>>>> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
> >>>>>> as I expected the error should be reported earlier than the end of
> >>>>>> stream if the chosen format is wrong.
> >>>>>>
> >>>>>>> On Jan 12, 2025, at 2:57 PM, robert engels <reng...@ix.netcom.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Also, this is what Gemini provided which looks basically correct -
> >>>>>>> but I think encapsulating it with a Rewind() method would be easier
> >>>>>>> to understand.
> >>>>>>>
> >>>>>>> While Go doesn't have a built-in PushbackReader like some other
> >>>>>>> languages (e.g., Java), you can implement similar functionality
> >>>>>>> using a custom struct and a buffer.
> >>>>>>>
> >>>>>>> Here's an example implementation:
> >>>>>>>
> >>>>>>> package main
> >>>>>>>
> >>>>>>> import (
> >>>>>>>     "bytes"
> >>>>>>>     "io"
> >>>>>>> )
> >>>>>>>
> >>>>>>> type PushbackReader struct {
> >>>>>>>     reader io.Reader
> >>>>>>>     buffer *bytes.Buffer
> >>>>>>> }
> >>>>>>>
> >>>>>>> func NewPushbackReader(r io.Reader) *PushbackReader {
> >>>>>>>     return &PushbackReader{
> >>>>>>>         reader: r,
> >>>>>>>         buffer: new(bytes.Buffer),
> >>>>>>>     }
> >>>>>>> }
> >>>>>>>
> >>>>>>> func (p *PushbackReader) Read(b []byte) (n int, err error) {
> >>>>>>>     if p.buffer.Len() > 0 {
> >>>>>>>         return p.buffer.Read(b)
> >>>>>>>     }
> >>>>>>>     return p.reader.Read(b)
> >>>>>>> }
> >>>>>>>
> >>>>>>> func (p *PushbackReader) UnreadByte() error {
> >>>>>>>     if p.buffer.Len() == 0 {
> >>>>>>>         return io.EOF
> >>>>>>>     }
> >>>>>>>     lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
> >>>>>>>     p.buffer.Truncate(p.buffer.Len() - 1)
> >>>>>>>     p.buffer.WriteByte(lastByte)
> >>>>>>>     return nil
> >>>>>>> }
> >>>>>>>
> >>>>>>> func (p *PushbackReader) Unread(buf []byte) error {
> >>>>>>>     if p.buffer.Len() == 0 {
> >>>>>>>         return io.EOF
> >>>>>>>     }
> >>>>>>>     p.buffer.Write(buf)
> >>>>>>>     return nil
> >>>>>>> }
> >>>>>>>
> >>>>>>> func main() {
> >>>>>>>     // Example usage
> >>>>>>>     r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
> >>>>>>>     buf := make([]byte, 5)
> >>>>>>>     r.Read(buf)
> >>>>>>>     r.UnreadByte()
> >>>>>>>     r.Read(buf)
> >>>>>>> }
> >>>>>>>
> >>>>>>> Explanation:
> >>>>>>> PushbackReader struct: This struct holds the underlying io.Reader
> >>>>>>>   and a buffer to store the pushed-back bytes.
> >>>>>>> NewPushbackReader: This function creates a new PushbackReader from
> >>>>>>>   an existing io.Reader.
> >>>>>>> Read method: This method reads bytes from either the buffer (if it
> >>>>>>>   contains data) or the underlying reader.
> >>>>>>> UnreadByte method: This method pushes back a single byte into the
> >>>>>>>   buffer.
> >>>>>>> Unread method: This method pushes back a slice of bytes into the
> >>>>>>>   buffer.
> >>>>>>>
> >>>>>>> Important Considerations:
> >>>>>>> The buffer size is not managed automatically. You may need to
> >>>>>>>   adjust the buffer size based on your use case.
> >>>>>>> This implementation does not handle pushing back beyond the
> >>>>>>>   initially read data. If you need to support arbitrary pushback,
> >>>>>>>   you'll need a more complex solution.
> >>>>>>>
> >>>>>>> Generative AI is experimental.
> >>>>>>>
> >>>>>>>> On Jan 12, 2025, at 2:53 PM, Robert Engels <reng...@ix.netcom.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> You can see the two pass reader here
> >>>>>>>> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
> >>>>>>>>
> >>>>>>>> But yea, the basic premise is that you buffer the data so you can
> >>>>>>>> rewind if needed
> >>>>>>>>
> >>>>>>>> Are you certain it is reading to the end to return EOF? It may be
> >>>>>>>> returning eof once the parsing fails.
> >>>>>>>>
> >>>>>>>> Otherwise I would expect this is being decoded wrong - eg the mime
> >>>>>>>> type or encoding type should tell you the correct format before you
> >>>>>>>> start decoding.
> >>>>>>>>
> >>>>>>>>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange
> >>>>>>>>> <r...@campbell-lange.net> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
> >>>>>>>>>
> >>>>>>>>> My google fu must be deserting me. I can find PushbackReader
> >>>>>>>>> implementations in Java, but the only similar thing for Go I could
> >>>>>>>>> find was https://gitlab.com/osaki-lab/iowrapper. If you have a
> >>>>>>>>> specific recommendation for a ReadSeeker wrapper to an io.Reader
> >>>>>>>>> that would be great to know.
> >>>>>>>>>
> >>>>>>>>> Since the base64 decoding error I'm looking for is an EOF, I guess
> >>>>>>>>> the wrapper approach will not work when the EOF byte position is
> >>>>>>>>> greater than the io.ReadSeeker buffer size.
> >>>>>>>>>
> >>>>>>>>> Rory
> >>>>>>>>>
> >>>>>>>>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
> >>>>>>>>>> create a ReadSeeker that wraps the Reader providing the buffering
> >>>>>>>>>> (mark & reset) - normally the buffer only needs to be large
> >>>>>>>>>> enough to detect the format contained in the Reader.
> >>>>>>>>>>
> >>>>>>>>>> You can search Google for PushbackReader in Go and you’ll get a
> >>>>>>>>>> basic implementation.
> >>>>>>>>>>
> >>>>>>>>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange
> >>>>>>>>>>> <r...@campbell-lange.net> wrote:
> >>>>>>>>> ...
> >>>>>>>>>>> I'm attempting to rationalise the process [of avoiding reading
> >>>>>>>>>>> email parts into byte slices] by simply wrapping the provided
> >>>>>>>>>>> io.Reader with the necessary decoders to reduce memory usage and
> >>>>>>>>>>> unnecessary processing.
> >>>>>>>>>>>
> >>>>>>>>>>> The wrapping strategy seems to work ok.
> >>>>>>>>>>> However there is a particular issue in detecting
> >>>>>>>>>>> base64.StdEncoding versus base64.RawStdEncoding, which requires
> >>>>>>>>>>> draining the io.Reader using base64.StdEncoding and (based on
> >>>>>>>>>>> the current implementation) switching to base64.RawStdEncoding
> >>>>>>>>>>> if an io.ErrUnexpectedEOF is found.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> You received this message because you are subscribed to the Google
> >>>>>>>> Groups "golang-nuts" group.
> >>>>>>>> To unsubscribe from this group and stop receiving emails from it,
> >>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com.
> >>>>>>>> To view this discussion visit
> >>>>>>>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com.
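
For reference, the padding-wrapper idea described in the thread (count the characters read, synthesise any missing '=' at EOF, then always decode with base64.StdEncoding) might be sketched roughly as below. This is a minimal sketch and not the code behind either playground link: the names padReader and decodeAll are hypothetical, and leaving CR/LF out of the count is an assumption made here because email bodies are usually line-wrapped and the standard decoder skips those bytes.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader (hypothetical name) passes its source through unchanged and,
// once the source reports io.EOF, appends the '=' characters needed to
// bring the base64 text up to a multiple of four characters.
type padReader struct {
	r    io.Reader
	n    int    // base64 characters seen so far, modulo 4
	tail []byte // padding still to hand out after the source is drained
	done bool   // true once the source has returned io.EOF
}

func (p *padReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil
	}
	if !p.done {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			// Assumption: CR and LF are line wrapping, which the
			// decoder ignores, so they are left out of the count.
			if c != '\r' && c != '\n' {
				p.n = (p.n + 1) % 4
			}
		}
		if err != io.EOF {
			return n, err
		}
		p.done = true
		if p.n != 0 {
			p.tail = []byte("===")[:4-p.n]
		}
		if n > 0 {
			return n, nil // deliver the data first, padding on the next call
		}
	}
	if len(p.tail) == 0 {
		return 0, io.EOF
	}
	n := copy(b, p.tail)
	p.tail = p.tail[n:]
	return n, nil
}

// decodeAll is a helper for this demonstration only; a real caller would
// stream from the decoder instead of collecting everything in memory.
func decodeAll(s string) (string, error) {
	dec := base64.NewDecoder(base64.StdEncoding, &padReader{r: strings.NewReader(s)})
	out, err := io.ReadAll(dec)
	return string(out), err
}

func main() {
	for _, s := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // padded
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",   // unpadded
		"Qm9uam91ciwgam95ZXV4IGxpb24K",     // length already a multiple of 4
	} {
		out, err := decodeAll(s)
		fmt.Printf("%q err=%v\n", out, err)
	}
}
```

With this sketch a stream whose length is 1 modulo 4 is padded to something like "A===", which the standard decoder rejects, so malformed input still surfaces as an error. Separately, the sample Qm9uam91ciwgam95ZXV4IGxpb24K ends in the group "b24K", whose last byte decodes to a newline, so the extra newline reported for that input appears to be part of the encoded data itself rather than something added by the reader.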