For everyone interested, this is the solution in the Go Playground:
https://play.golang.org/p/Rxwcwhai4Gl

On Saturday, October 31, 2020 at 7:01:08 PM UTC+1 Severyn Lisovsky wrote:

> Tamás Gulácsi, wow, I didn't know that providing a bufio.Reader to 
> bufio.NewReader doesn't wrap your reader. Looks like this is the solution 
> I've been looking for. Thanks!
>
> On Saturday, October 31, 2020 at 6:50:18 PM UTC+1 Tamás Gulácsi wrote:
>
>> Why do you need access to the internal bufio.Reader?
>>
>> If you provide a bufio.Reader to bufio.NewReader, then it will NOT create 
>> a new reader, but give back your reader.
>> So if you keep your bufio.Reader and give it to csv.NewReader, then you 
>> will have the same *bufio.Reader as the csv.Reader's inner r!
>>
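A quick way to see that passthrough in action (just an illustrative snippet, not from the thread):

  package main

  import (
      "bufio"
      "fmt"
      "strings"
  )

  func main() {
      br := bufio.NewReader(strings.NewReader("x,y,z\n"))
      br2 := bufio.NewReader(br)
      // br2 is br itself: bufio.NewReader hands back its argument unchanged
      // when it is already a *bufio.Reader with at least the default 4k buffer.
      fmt.Println(br2 == br) // prints: true
  }
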
>> On Saturday, October 31, 2020 at 6:31:34 PM UTC+1, Severyn Lisovsky wrote:
>>
>>> Tamás Gulácsi, this was basically my initial idea, but unfortunately 
>>> there is no access to the internal bufio.Reader. See:
>>> https://golang.org/src/encoding/csv/reader.go#L170
>>>
>>> peterGo, my file is ~100 GB, so downloading it just for the sake of 
>>> splitting doesn't make sense to me. I want each worker to use the 
>>> NewRangeReader method 
>>> <https://godoc.org/cloud.google.com/go/storage#ObjectHandle.NewRangeReader> 
>>> to download only its piece of the file. 
>>>
>>> ren...@ix.netcom.com, a byte-counting reader that wraps the underlying 
>>> reader wouldn't help, because csv.Reader doesn't read from the underlying 
>>> reader directly; it reads from a bufio.Reader, which buffers the bytes. So, 
>>> for example, if you read 1 row from the CSV (e.g. 10 bytes), 4096 bytes 
>>> will be read from the underlying io.Reader. On the next csv.Reader.Read() 
>>> call, no bytes will be read from the underlying io.Reader, because the 
>>> next row is taken out of the buffer.
>>>
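A tiny sketch of that effect (loggingReader is just an illustrative wrapper, not something from the thread):

  package main

  import (
      "encoding/csv"
      "fmt"
      "io"
      "strings"
  )

  // loggingReader reports how many bytes each Read pulls from the wrapped reader.
  type loggingReader struct{ io.Reader }

  func (l loggingReader) Read(p []byte) (int, error) {
      n, err := l.Reader.Read(p)
      fmt.Println("underlying Read returned", n, "bytes")
      return n, err
  }

  func main() {
      // 11-byte rows, well over 4096 bytes in total.
      data := strings.Repeat("aaa,bbb,cc\n", 1000)

      r := csv.NewReader(loggingReader{strings.NewReader(data)})
      _, _ = r.Read() // prints: underlying Read returned 4096 bytes
      _, _ = r.Read() // prints nothing: the second row comes out of the buffer
  }
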
>>> On Saturday, October 31, 2020 at 6:02:32 PM UTC+1 Tamás Gulácsi wrote:
>>>
>>>> Give csv.NewReader your own *bufio.Reader. 
>>>> According to bufio.NewReaderSize (https://pkg.go.dev/pkg/bufio/#NewReaderSize), 
>>>> if the underlying io.Reader is already a *bufio.Reader with a big enough 
>>>> buffer (and csv.NewReader uses the default 4k), then the underlying 
>>>> reader is used and no new wrapping is introduced.
>>>>
>>>> This way if you use 
>>>>   cr := &countingReader{Reader: r}
>>>>   br := bufio.NewReader(cr)
>>>>   csvR := csv.NewReader(br)
>>>>
>>>> then cr.N - br.Buffered() is the number of bytes read by csv.Reader, 
>>>> the end of the last line read.
>>>>
>>>> Hope this helps.
>>>>
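The countingReader above isn't spelled out in the thread; a minimal, self-contained sketch of the whole idea might look like this (the names are my own, and the actual Playground solution may differ):

  package main

  import (
      "bufio"
      "encoding/csv"
      "fmt"
      "io"
      "strings"
  )

  // countingReader counts the bytes read from the wrapped reader.
  type countingReader struct {
      io.Reader
      N int64
  }

  func (c *countingReader) Read(p []byte) (int, error) {
      n, err := c.Reader.Read(p)
      c.N += int64(n)
      return n, err
  }

  func main() {
      data := "a,b,c\nd,e,f\ng,h,i\n"

      cr := &countingReader{Reader: strings.NewReader(data)}
      br := bufio.NewReader(cr) // default 4k buffer
      // csv.NewReader wraps its argument with bufio.NewReader, which hands br
      // back unchanged because br already is a *bufio.Reader with a big enough
      // buffer, so no extra buffering layer is introduced.
      r := csv.NewReader(br)

      for {
          rec, err := r.Read()
          if err == io.EOF {
              break
          }
          if err != nil {
              panic(err)
          }
          // cr.N is what has been pulled (and buffered) from the source;
          // subtracting br.Buffered() gives what csv.Reader actually consumed.
          fmt.Printf("%v ends at byte offset %d\n", rec, cr.N-int64(br.Buffered()))
      }
  }
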
>>>> On Saturday, October 31, 2020 at 3:17:26 AM UTC+1, Severyn Lisovsky wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have difficulty counting the bytes processed by csv.Reader, because 
>>>>> it reads from an internally created bufio.Reader. If I pass some 
>>>>> counting reader to csv.NewReader, it will show not the actual number of 
>>>>> bytes "processed" by csv.Reader to produce the output of a 
>>>>> csv.Reader.Read call, but the number of bytes copied into bufio.Reader's 
>>>>> internal buffer (some of those bytes may only be consumed from the 
>>>>> buffer on a later csv.Reader.Read call).
>>>>>
>>>>> Is there a way I can deal with this issue without forking the 
>>>>> encoding/csv package?
>>>>>
>>>>> To give you a more high-level picture: I want to split a remote CSV 
>>>>> file into chunks. Each chunk should be a standalone CSV file, starting 
>>>>> at the actual beginning of a line and ending with a newline byte. So 
>>>>> I'm trying to do the following: divide the file size by the number of 
>>>>> chunks, and for each chunk, skip the first bytes up to the newline 
>>>>> symbol and read to offset+chunkSize+[number of bytes to the next 
>>>>> newline symbol].
>>>>>
>>>>
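And for the chunking plan in the original message, a rough sketch of the boundary adjustment (openRange is a hypothetical callback; with GCS it could be backed by ObjectHandle.NewRangeReader):

  package main

  import (
      "bufio"
      "fmt"
      "io"
      "strings"
  )

  // chunkStarts returns start offsets so that every chunk begins right after a
  // newline (or at offset 0). openRange is a hypothetical range-open callback;
  // length -1 is assumed to mean "read to the end".
  func chunkStarts(size int64, nChunks int,
      openRange func(offset, length int64) (io.Reader, error)) ([]int64, error) {

      starts := []int64{0}
      for i := 1; i < nChunks; i++ {
          nominal := size * int64(i) / int64(nChunks)
          r, err := openRange(nominal, -1)
          if err != nil {
              return nil, err
          }
          // Skip the (possibly partial) line the nominal offset landed in.
          line, err := bufio.NewReader(r).ReadString('\n')
          if err != nil && err != io.EOF {
              return nil, err
          }
          starts = append(starts, nominal+int64(len(line)))
      }
      return starts, nil
  }

  func main() {
      data := "a,b,c\nd,e,f\ng,h,i\nj,k,l\n"
      open := func(off, _ int64) (io.Reader, error) {
          return strings.NewReader(data[off:]), nil
      }
      starts, err := chunkStarts(int64(len(data)), 3, open)
      if err != nil {
          panic(err)
      }
      fmt.Println(starts) // [0 12 18]: every chunk starts at a line boundary
  }

Each chunk i would then cover the range [starts[i], starts[i+1]), with the last chunk running to the end of the file.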
