For everyone interested, here is the solution on the Go Playground: https://play.golang.org/p/Rxwcwhai4Gl
On Saturday, October 31, 2020 at 7:01:08 PM UTC+1 Severyn Lisovsky wrote:
> Tamás Gulácsi, wow, I didn't know that passing a bufio.Reader to
> bufio.NewReader doesn't wrap your reader. Looks like this is the solution
> I've been looking for. Thanks!
>
> On Saturday, October 31, 2020 at 6:50:18 PM UTC+1 Tamás Gulácsi wrote:
>
>> Why do you need access to the internal bufio.Reader?
>>
>> If you provide a bufio.Reader to bufio.NewReader, it will NOT create
>> a new reader, but give back your reader.
>> So if you keep your bufio.Reader and give it to csv.NewReader, then you
>> will have the same *bufio.Reader as the csv.Reader's inner r!
>>
>> Severyn Lisovsky wrote (Saturday, October 31, 2020, 18:31:34 UTC+1):
>>
>>> Tamás Gulácsi, this was basically my initial idea, but
>>> unfortunately there is no access to the internal bufio.Reader. See:
>>> https://golang.org/src/encoding/csv/reader.go#L170
>>>
>>> peterGo, my file is ~100GB, so downloading it just for the sake of
>>> splitting doesn't make sense to me. I want each worker to use the
>>> NewRangeReader method
>>> <https://godoc.org/cloud.google.com/go/storage#ObjectHandle.NewRangeReader>
>>> to download only its own piece of the file.
>>>
>>> ren...@ix.netcom.com, a ByteCount reader that wraps the underlying reader
>>> wouldn't help, because csv.Reader doesn't read from the underlying reader
>>> synchronously; it reads from a bufio.Reader, which buffers the bytes. For
>>> example, if you read 1 row from the CSV (e.g. 10 bytes), up to 4096 bytes
>>> will be read from the underlying io.Reader. On the next csv.Reader.Read()
>>> call, no bytes will be read from the underlying io.Reader, because the
>>> next row is taken out of the buffer.
>>>
>>> On Saturday, October 31, 2020 at 6:02:32 PM UTC+1 Tamás Gulácsi wrote:
>>>
>>>> Give csv.NewReader your own *bufio.Reader.
>>>> According to the docs (https://pkg.go.dev/pkg/bufio/#NewReaderSize), if
>>>> the underlying io.Reader is already a *bufio.Reader with a big enough
>>>> size (and csv.NewReader uses the default 4k), then that reader is used
>>>> as is; no new wrapping is introduced.
>>>>
>>>> This way, if you use
>>>>     cr := &countingReader{Reader: r}
>>>>     br := bufio.NewReader(cr)
>>>>     csvR := csv.NewReader(br)
>>>>
>>>> then cr.N - br.Buffered() is the number of bytes read by csv.Reader,
>>>> i.e. the offset of the end of the last line read.
>>>>
>>>> Hope this helps.
>>>>
>>>> Severyn Lisovsky wrote (Saturday, October 31, 2020, 3:17:26 UTC+1):
>>>>
>>>>> Hi,
>>>>>
>>>>> I have difficulty counting the bytes processed by csv.Reader,
>>>>> because it reads from an internally created bufio.Reader. If I pass a
>>>>> counting reader to csv.NewReader, it shows not the number of bytes
>>>>> actually "processed" by csv.Reader to produce the output of a
>>>>> csv.Reader.Read call, but the number of bytes copied into the
>>>>> bufio.Reader's internal buffer (some of those bytes may only be
>>>>> consumed from the buffer during the next csv.Reader.Read call).
>>>>>
>>>>> Is there a way to deal with this without forking the encoding/csv
>>>>> package?
>>>>>
>>>>> To give you a higher-level picture: I want to split a remote CSV file
>>>>> into chunks. Each chunk should be a standalone CSV file, starting at
>>>>> the actual beginning of a line and ending with a newline byte. So I'm
>>>>> trying to do the following: divide the file size by the number of
>>>>> chunks, and for each chunk skip the first bytes up to a newline and
>>>>> read to offset+chunkSize+[number of bytes to the next newline].