Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

robert engels Mon, 13 Jan 2025 15:44:46 -0800

You wouldn’t get an eof if the data is properly encoded. Not sure what the 
problem is.


You need to be doing something with the Reader - most likely writing to a file, 
streaming to a database record, etc.

I would simplify the code to a single test case that demonstrates the issue you 
are having with the code.
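
A minimal test case along those lines might look like the sketch below. It assumes the padded sample string quoted further down the thread, with the padding trimmed by hand for the second input; decode, decodeStd and decodeRaw are hypothetical helper names, not anything from the original code.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// decode streams s through a base64 decoder for enc and returns the
// decoded text together with any error the decoder reported.
func decode(enc *base64.Encoding, s string) (string, error) {
	b, err := io.ReadAll(base64.NewDecoder(enc, strings.NewReader(s)))
	return string(b), err
}

func decodeStd(s string) (string, error) { return decode(base64.StdEncoding, s) }
func decodeRaw(s string) (string, error) { return decode(base64.RawStdEncoding, s) }

func main() {
	const padded = "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg=="
	unpadded := strings.TrimRight(padded, "=")

	// Try every combination of input and decoder and show what comes back.
	for _, in := range []string{padded, unpadded} {
		out, err := decodeStd(in)
		fmt.Printf("std %-34q -> %q, err=%v\n", in, out, err)
		out, err = decodeRaw(in)
		fmt.Printf("raw %-34q -> %q, err=%v\n", in, out, err)
	}
}
```

Only the two mismatched combinations (padded input with the raw decoder, unpadded input with the standard decoder) should report an error.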

> On Jan 13, 2025, at 5:34 PM, Rory Campbell-Lange <r...@campbell-lange.net> 
> wrote:
> 
> I'm just doing the reverse of that, I think, by removing the padding.
> 
> I can't seem to trigger an EOF with this code below:
> 
>>>       n, err = b.br.Read(h)
>>>       if err != nil {
>>>           return n, err
>>>       }
> 
> 
> On 13/01/25, robert engels (reng...@ix.netcom.com) wrote:
>> As has been pointed out, you don’t need to read the whole thing into 
>> memory; just wrap the data provider with one that adds the padding if it 
>> doesn’t exist, and always read with the padded decoder.
>> 
>> To add the padding you only need to keep track of the count of characters 
>> read before EOF to determine how many padding characters to synthetically 
>> add. If the original data was already padded correctly, this will be 0.
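
The wrapper described above could be sketched roughly as follows. This is only an illustration under my own assumptions: padReader and decodePadded are hypothetical names, CR and LF are assumed to be the only characters that should not count towards the length, and it is not the code from the playground links in this thread.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader is a hypothetical wrapper: it passes base64 text through
// unchanged, counts the significant characters, and at EOF appends the
// '=' needed to reach a multiple of four.
type padReader struct {
	r     io.Reader
	count int    // characters seen so far, ignoring CR and LF
	tail  string // synthetic padding still to be handed out
	done  bool   // true once the wrapped reader has returned io.EOF
}

func (p *padReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil
	}
	if !p.done {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			if c != '\r' && c != '\n' {
				p.count++
			}
		}
		if err != io.EOF {
			return n, err
		}
		// Hold back the EOF until the padding has been delivered.
		p.done = true
		p.tail = strings.Repeat("=", (4-p.count%4)%4)
		if n > 0 {
			return n, nil
		}
	}
	if p.tail == "" {
		return 0, io.EOF
	}
	n := copy(b, p.tail)
	p.tail = p.tail[n:]
	return n, nil
}

// decodePadded always decodes with the padded decoder and streams the
// result to a writer (a strings.Builder stands in for a file here).
func decodePadded(s string) (string, error) {
	pr := &padReader{r: strings.NewReader(s)}
	var sb strings.Builder
	_, err := io.Copy(&sb, base64.NewDecoder(base64.StdEncoding, pr))
	return sb.String(), err
}

func main() {
	for _, in := range []string{"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg"} {
		out, err := decodePadded(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Already padded input gets no extra characters, so both inputs take the same decoding path and nothing beyond the copy buffer is held in memory.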
>> 
>>> On Jan 13, 2025, at 4:42 PM, Rory Campbell-Lange <r...@campbell-lange.net> 
>>> wrote:
>>> 
>>> As I wrote earlier, I'm trying to avoid reading the entire email part into 
>>> memory to discover if I should use base64.StdEncoding or 
>>> base64.RawStdEncoding.
>>> 
>>> The following seems to work reasonably well:
>>> 
>>>   type B64Translator struct {
>>>       br *bufio.Reader
>>>   }
>>> 
>>>   func NewB64Translator(r io.Reader) *B64Translator {
>>>       return &B64Translator{
>>>           br: bufio.NewReader(r),
>>>       }
>>>   }
>>> 
>>>   // Read reads base64 text from the buffered reader and drops any '='
>>>   // padding characters (standard base64 ends with at most two), so that
>>>   // base64.RawStdEncoding can decode both StdEncoded and RawStdEncoded
>>>   // data.
>>>   func (b *B64Translator) Read(p []byte) (n int, err error) {
>>>       n, err = b.br.Read(p)
>>>       // Compact p[:n] in place, skipping '='. Reading straight into p
>>>       // avoids a per-call allocation, and bytes that arrive together
>>>       // with an error are still returned to the caller.
>>>       w := 0
>>>       for _, c := range p[:n] {
>>>           if c != '=' {
>>>               p[w] = c
>>>               w++
>>>           }
>>>       }
>>>       return w, err
>>>   }
>>> 
>>> https://go.dev/play/p/H6ii7Vy-8as
>>> 
>>> One odd thing is that I'm getting extraneous newlines (shown by stars in 
>>> the output), eg:
>>> 
>>>     --
>>>                raw: Bonjour joyeux lion
>>>                             Qm9uam91ciwgam95ZXV4IGxpb24K
>>>                     ok: false
>>>        decoded: Bonjour, joyeux lion* <-------------------- e.g. here
>>>     --
>>>                std: "Bonjour, joyeux lion"
>>>                             IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
>>>                     ok: true
>>>        decoded: "Bonjour, joyeux lion"
>>>     --
>>> 
>>> Any thoughts on that would be gratefully received. 
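
On the extra newline: it appears to come from the encoded sample itself rather than from the translator. Decoding the first sample directly, with no wrapper involved, already yields a trailing "\n" (the final "K" carries it). My guess, and it is only a guess, is that the sample was generated with something that appended a newline before encoding. decodeSample below is a hypothetical helper for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSample decodes s in one go with the standard padded encoding.
func decodeSample(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	return string(b), err
}

func main() {
	out, err := decodeSample("Qm9uam91ciwgam95ZXV4IGxpb24K")
	// %q makes the trailing newline visible instead of printing it.
	fmt.Printf("%q %v\n", out, err)
}
```

This prints "Bonjour, joyeux lion\n" <nil>, so the newline is part of the payload.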
>>> 
>>> Rory
>>> 
>>> 
>>> On 13/01/25, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
>>>> Thanks very much for the playground link and thoughts.
>>>> 
>>>> The use case is reading base64 email parts, which could be of a very large 
>>>> size. It is unclear when processing these parts if they are base64 padded 
>>>> or not.
>>>> 
>>>> I'm trying to avoid reading the entire email part into memory. 
>>>> Consequently I think your earlier idea of adding padding (or removing it) 
>>>> in a wrapper could work. Perhaps wrapping the reader with another using a 
>>>> bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could 
>>>> add padding if needed.
>>>> 
>>>> Rory
>>>> 
>>>> On 13/01/25, Axel Wagner (axel.wagner...@googlemail.com) wrote:
>>>>> Just realized: If you twist the idea around, you get something easy to
>>>>> implement and more correct.
>>>>> Instead of stripping padding if it exists, you can ensure that the body
>>>>> *is* padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>>>>> You can then feed that to base64.StdEncoding. If the wrapped Reader
>>>>> returns padded Base64, this does nothing. If it returns unpadded Base64,
>>>>> it adds padding. If it returns incorrect Base64, it will create a padded
>>>>> stream that will then get rejected by the Base64 decoder.
>>>>> 
>>>>> On Mon, 13 Jan 2025 at 10:31, Axel Wagner <axel.wagner...@googlemail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> one way to solve your problem is to wrap the body into an io.Reader that
>>>>>> strips off everything after the first `=` it finds. That can then be fed
>>>>>> to base64.RawStdEncoding. This approach requires no extra buffering or
>>>>>> copying and is easy to implement: https://go.dev/play/p/CwcVz7oietI
>>>>>> 
>>>>>> The downside is that this will not verify that the body is *either*
>>>>>> correctly padded Base64 *or* unpadded Base64. So, it will not report an
>>>>>> error if fed something like "AAA=garbage".
>>>>>> That can be remedied by buffering up to four bytes and, when encountering
>>>>>> an EOF, checking that there are at most two trailing `=` and that the
>>>>>> total length of the stream is divisible by four. It's more finicky to
>>>>>> implement, but it should also be possible without any extra copies and
>>>>>> only requires a very small extra buffer.
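
A rough sketch of the stricter variant, under my own assumptions: rather than buffering four bytes it keeps two counters, it treats more than two '=' as an error (standard base64 never needs more), and it ignores CR and LF when counting. stripReader, errBadPadding and decodeLenient are hypothetical names, not the code behind the playground link.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"strings"
)

var errBadPadding = errors.New("malformed base64 padding")

// stripReader is a hypothetical wrapper that removes trailing '=' so the
// stream can be fed to base64.RawStdEncoding, while rejecting padding
// that is not well formed.
type stripReader struct {
	r    io.Reader
	data int // base64 characters seen before any '='
	pads int // '=' characters seen
}

func (s *stripReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	w := 0
	for _, c := range p[:n] {
		switch {
		case c == '\r' || c == '\n':
			// Line breaks are dropped and not counted.
		case c == '=':
			s.pads++
		case s.pads > 0:
			// Data after padding, e.g. "AAA=garbage".
			return w, errBadPadding
		default:
			s.data++
			p[w] = c
			w++
		}
	}
	// If padding was present, it must complete a group of four.
	if err == io.EOF && s.pads > 0 && (s.pads > 2 || (s.data+s.pads)%4 != 0) {
		return w, errBadPadding
	}
	return w, err
}

// decodeLenient accepts padded or unpadded input, but not a mixture.
func decodeLenient(s string) (string, error) {
	sr := &stripReader{r: strings.NewReader(s)}
	b, err := io.ReadAll(base64.NewDecoder(base64.RawStdEncoding, sr))
	return string(b), err
}

func main() {
	for _, in := range []string{"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg", "AAA=garbage"} {
		out, err := decodeLenient(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Note that bytes decoded before the error is detected have already been handed to the caller, as with any streaming decoder.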
>>>>>> 
>>>>>> On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange
>>>>>> <r...@campbell-lange.net> wrote:
>>>>>> 
>>>>>>> Thanks very much for the links, pointers and possible solution.
>>>>>>> 
>>>>>>> Trying to read base64 standard (padded) encoded data with
>>>>>>> base64.RawStdEncoding can produce an error such as
>>>>>>> 
>>>>>>>   illegal base64 data at input byte <n>
>>>>>>> 
>>>>>>> Reading base64 raw (unpadded) encoded data produces the EOF error.
>>>>>>> 
>>>>>>> I'll go with trying to read the standard encoded data up to maybe 1MB
>>>>>>> and then switch to base64.RawStdEncoding if I hit the "illegal base64
>>>>>>> data" problem, maybe with reference to bufio.Reader which has most of
>>>>>>> the methods suggested below.
>>>>>>> 
>>>>>>> Yes, the use of a "Rewind" method would be crucial. I guess this would
>>>>>>> need to:
>>>>>>> 1. error if more than one buffer of data has been read
>>>>>>> 2. else re-read from byte 0
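
The two requirements listed above can be sketched with the standard library alone by recording what has been read and replaying it through io.MultiReader. This is an illustration only: rewinder, its limit field and sniffAndRewind are hypothetical names, and a second rewind is not supported in this form.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// rewinder is a hypothetical helper: it remembers up to limit bytes of
// what has been read so that reading can restart from byte 0.
type rewinder struct {
	src   io.Reader
	seen  bytes.Buffer
	limit int
}

func (r *rewinder) Read(p []byte) (int, error) {
	n, err := r.src.Read(p)
	// Record at most limit+1 bytes; the extra byte marks an overflow.
	keep := p[:n]
	if room := r.limit + 1 - r.seen.Len(); len(keep) > room {
		keep = keep[:room]
	}
	r.seen.Write(keep)
	return n, err
}

// Rewind returns a reader that replays the stream from byte 0, or an
// error if more than limit bytes have already been consumed.
func (r *rewinder) Rewind() (io.Reader, error) {
	if r.seen.Len() > r.limit {
		return nil, errors.New("rewind: read past the buffered window")
	}
	return io.MultiReader(bytes.NewReader(r.seen.Bytes()), r.src), nil
}

// sniffAndRewind reads sniff bytes, rewinds, and returns the whole stream.
func sniffAndRewind(s string, sniff, limit int) (string, error) {
	r := &rewinder{src: strings.NewReader(s), limit: limit}
	if _, err := io.ReadFull(r, make([]byte, sniff)); err != nil {
		return "", err
	}
	again, err := r.Rewind()
	if err != nil {
		return "", err
	}
	all, err := io.ReadAll(again)
	return string(all), err
}

func main() {
	fmt.Println(sniffAndRewind("Hello, World!", 5, 64)) // rewind succeeds
	fmt.Println(sniffAndRewind("Hello, World!", 5, 3))  // window too small
}
```

The memory cost is bounded by limit, but as noted elsewhere in the thread this only helps if the wrong format is detected within that window.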
>>>>>>> 
>>>>>>> Thanks again very much for these suggestions.
>>>>>>> 
>>>>>>> Rory
>>>>>>> 
>>>>>>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>>>>>>>> Also, see this
>>>>>>>> https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>>>>>>>> as I expected the error should be reported earlier than the end of
>>>>>>>> stream if the chosen format is wrong.
>>>>>>>> 
>>>>>>>>> On Jan 12, 2025, at 2:57 PM, robert engels <reng...@ix.netcom.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Also, this is what Gemini provided which looks basically correct -
>>>>>>>>> but I think encapsulating it with a Rewind() method would be easier to
>>>>>>>>> understand.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> While Go doesn't have a built-in PushbackReader like some other
>>>>>>>>> languages (e.g., Java), you can implement similar functionality using a
>>>>>>>>> custom struct and a buffer.
>>>>>>>>> 
>>>>>>>>> Here's an example implementation:
>>>>>>>>> 
>>>>>>>>> package main
>>>>>>>>> 
>>>>>>>>> import (
>>>>>>>>>   "bytes"
>>>>>>>>>   "errors"
>>>>>>>>>   "io"
>>>>>>>>> )
>>>>>>>>> 
>>>>>>>>> type PushbackReader struct {
>>>>>>>>>   reader io.Reader
>>>>>>>>>   buffer *bytes.Buffer // pushed-back bytes, served before reader
>>>>>>>>>   last   int           // last byte returned by Read, or -1
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> func NewPushbackReader(r io.Reader) *PushbackReader {
>>>>>>>>>   return &PushbackReader{
>>>>>>>>>       reader: r,
>>>>>>>>>       buffer: new(bytes.Buffer),
>>>>>>>>>       last:   -1,
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // Read serves pushed-back bytes first, then the underlying reader.
>>>>>>>>> func (p *PushbackReader) Read(b []byte) (n int, err error) {
>>>>>>>>>   if p.buffer.Len() > 0 {
>>>>>>>>>       n, err = p.buffer.Read(b)
>>>>>>>>>   } else {
>>>>>>>>>       n, err = p.reader.Read(b)
>>>>>>>>>   }
>>>>>>>>>   if n > 0 {
>>>>>>>>>       p.last = int(b[n-1]) // remember it for UnreadByte
>>>>>>>>>   }
>>>>>>>>>   return n, err
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // UnreadByte pushes back the last byte returned by Read.
>>>>>>>>> func (p *PushbackReader) UnreadByte() error {
>>>>>>>>>   if p.last < 0 {
>>>>>>>>>       return errors.New("pushback: no byte to unread")
>>>>>>>>>   }
>>>>>>>>>   err := p.Unread([]byte{byte(p.last)})
>>>>>>>>>   p.last = -1
>>>>>>>>>   return err
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // Unread pushes buf back so that the next Read returns it first,
>>>>>>>>> // ahead of anything pushed back earlier.
>>>>>>>>> func (p *PushbackReader) Unread(buf []byte) error {
>>>>>>>>>   rest := append([]byte(nil), p.buffer.Bytes()...)
>>>>>>>>>   p.buffer.Reset()
>>>>>>>>>   p.buffer.Write(buf)
>>>>>>>>>   p.buffer.Write(rest)
>>>>>>>>>   return nil
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> func main() {
>>>>>>>>>   // Example usage
>>>>>>>>>   r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
>>>>>>>>>   buf := make([]byte, 5)
>>>>>>>>>   r.Read(buf)    // reads "Hello"
>>>>>>>>>   r.UnreadByte() // pushes back 'o'
>>>>>>>>>   r.Read(buf)    // reads the single byte "o" again
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> Explanation:
>>>>>>>>> PushbackReader struct: This struct holds the underlying io.Reader and
>>>>>>>>> a buffer to store the pushed-back bytes.
>>>>>>>>> NewPushbackReader: This function creates a new PushbackReader from an
>>>>>>>>> existing io.Reader.
>>>>>>>>> Read method: This method reads bytes from either the buffer (if it
>>>>>>>>> contains data) or the underlying reader.
>>>>>>>>> UnreadByte method: This method pushes back a single byte into the
>>>>>>>>> buffer.
>>>>>>>>> Unread method: This method pushes back a slice of bytes into the
>>>>>>>>> buffer.
>>>>>>>>> Important Considerations:
>>>>>>>>> The buffer size is not managed automatically. You may need to adjust
>>>>>>>>> the buffer size based on your use case.
>>>>>>>>> This implementation does not handle pushing back beyond the initially
>>>>>>>>> read data. If you need to support arbitrary pushback, you'll need a more
>>>>>>>>> complex solution.
>>>>>>>>> 
>>>>>>>>> Generative AI is experimental.
>>>>>>>>> 
>>>>>>>>>> On Jan 12, 2025, at 2:53 PM, Robert Engels <reng...@ix.netcom.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> You can see the two pass reader here
>>>>>>>>>> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>>>>>>>>>> 
>>>>>>>>>> But yea, the basic premise is that you buffer the data so you can
>>>>>>>>>> rewind if needed
>>>>>>>>>> 
>>>>>>>>>> Are you certain it is reading to the end to return EOF? It may be
>>>>>>>>>> returning eof once the parsing fails.
>>>>>>>>>> 
>>>>>>>>>> Otherwise I would expect this is being decoded wrong - eg the mime
>>>>>>>>>> type or encoding type should tell you the correct format before you
>>>>>>>>>> start decoding.
>>>>>>>>>> 
>>>>>>>>>>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange
>>>>>>>>>>> <r...@campbell-lange.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>>>>>>>>>>> 
>>>>>>>>>>> My google fu must be deserting me. I can find PushbackReader
>>>>>>>>>>> implementations in Java, but the only similar thing for Go I could find
>>>>>>>>>>> was https://gitlab.com/osaki-lab/iowrapper. If you have a specific
>>>>>>>>>>> recommendation for a ReadSeeker wrapper to an io.Reader that would be
>>>>>>>>>>> great to know.
>>>>>>>>>>> 
>>>>>>>>>>> Since the base64 decoding error I'm looking for is an EOF, I guess
>>>>>>>>>>> the wrapper approach will not work when the EOF byte position is greater
>>>>>>>>>>> than the io.ReadSeeker buffer size.
>>>>>>>>>>> 
>>>>>>>>>>> Rory
>>>>>>>>>>> 
>>>>>>>>>>> On 12/01/25, robert engels (reng...@ix.netcom.com) wrote:
>>>>>>>>>>>> create a ReadSeeker that wraps the Reader providing the buffering
>>>>>>>>>>>> (mark & reset) - normally the buffer only needs to be large enough to
>>>>>>>>>>>> detect the format contained in the Reader.
>>>>>>>>>>>> 
>>>>>>>>>>>> You can search Google for PushbackReader in Go and you’ll get a
>>>>>>>>>>>> basic implementation.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange
>>>>>>>>>>>>> <r...@campbell-lange.net> wrote:
>>>>>>>>>>> ...
>>>>>>>>>>>>> I'm attempting to rationalise the process [of avoiding reading
>>>>>>>>>>>>> email parts into byte slices] by simply wrapping the provided io.Reader
>>>>>>>>>>>>> with the necessary decoders to reduce memory usage and unnecessary
>>>>>>>>>>>>> processing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The wrapping strategy seems to work ok. However there is a
>>>>>>>>>>>>> particular issue in detecting base64.StdEncoding versus
>>>>>>>>>>>>> base64.RawStdEncoding, which requires draining the io.Reader using
>>>>>>>>>>>>> base64.StdEncoding and (based on the current implementation) switching
>>>>>>>>>>>>> to base64.RawStdEncoding if an io.ErrUnexpectedEOF is found.
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>> Groups "golang-nuts" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com.
>>>>>>>>>> To view this discussion visit
>>>>>>>>>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com.
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/9F6FCA2F-9641-41F5-AB0F-42055287BB85%40ix.netcom.com.
