[go-nuts] Re: building pipelines in golang

clementauger888 Wed, 13 Feb 2019 09:51:18 -0800

hi,

In recent days I have been working on fixing some issues with the code 
presented above 
(https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go#L38)
Besides the fact that the internal code uses a lot of reflection to 
implement the api, which de facto puts it on the wrong side of the 
performance line,
there was a bunch of memory management on the user-land side that could be 
improved.


For example, the stage at 
https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go#L68 
allocates a new slice for every input batch it processes.
See also lines 70, 71 and 72.

To take a quick look at the resulting solution, open 
https://github.com/clementauger/st/blob/master/examples/walkfiles/main5.go#L51

There you will find sync.Pool used together with two new kinds of stages, 
PoolBuffered (L53) and PoolRecycler (L62). 
They take care of the allocation and recycling needed to pass values in and 
out of a routine context (L58), by accumulating values into, or consuming 
values from, containers such as slices. 
This solves lines 68 and 70 of the original demonstration code.
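
To illustrate the idea outside of the st api, here is a minimal sketch that 
uses only the standard library. The names buffer, recycle and countBatched 
are made up for this example and are not part of st; PoolBuffered and 
PoolRecycler may well work differently internally. It only shows the 
pattern: one stage takes batches from a sync.Pool, the consumer hands them 
back once it is done with them.

```go
package main

import (
	"fmt"
	"sync"
)

// batchPool recycles batches of paths. It stores *[]string rather than
// []string so that Put does not allocate when boxing the slice header.
var batchPool = sync.Pool{
	New: func() interface{} {
		b := make([]string, 0, 4)
		return &b
	},
}

// buffer (hypothetical name) groups incoming paths into batches of at most
// size items, taking each batch from the pool instead of allocating one.
func buffer(in <-chan string, size int) <-chan *[]string {
	out := make(chan *[]string)
	go func() {
		defer close(out)
		batch := batchPool.Get().(*[]string)
		for p := range in {
			*batch = append(*batch, p)
			if len(*batch) >= size {
				out <- batch
				batch = batchPool.Get().(*[]string)
			}
		}
		if len(*batch) > 0 {
			out <- batch // flush the last, partially filled batch
		} else {
			recycle(batch)
		}
	}()
	return out
}

// recycle (hypothetical name) truncates a batch to length zero and returns
// it to the pool so that a later Get can reuse its backing array.
func recycle(b *[]string) {
	*b = (*b)[:0]
	batchPool.Put(b)
}

// countBatched pushes paths through buffer and recycles every batch after
// use. It returns the number of items and the number of batches seen.
func countBatched(paths []string, size int) (items, batches int) {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, p := range paths {
			in <- p
		}
	}()
	for b := range buffer(in, size) {
		items += len(*b)
		batches++
		recycle(b)
	}
	return items, batches
}

func main() {
	items, batches := countBatched([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(items, batches)
}
```

The consumer must not keep a reference to a batch after recycling it, 
since the same backing array may be handed out again by the pool.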

You will also find self-contained functions introduced within the 
Concurrently call (L58). 
Those functions return the iterable function instead of being it. They can 
also return instances of st.Mapper directly, as demonstrated.
This helps to solve lines 71 and 72 of the original demonstration code.
In an earlier version, this problem was handled with a sync.Pool 
(https://github.com/clementauger/st/blob/master/examples/walkfiles/main3.go#L69)
However, that is not the best solution available:
because those self-contained functions are executed once for each worker, 
they can allocate within their own goroutine at startup, with no need for 
synchronization.
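
As a rough sketch of that pattern, written against the standard library 
only: the names concurrently, newTagger and run below are assumptions made 
for this example and do not reproduce the st api. Each worker calls the 
factory once, gets its own closure with its own scratch buffer, and then 
reuses that buffer for every item without any lock or pool.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// newTagger (hypothetical) is a self-contained function: it does not process
// items itself, it returns the function that does. The scratch buffer is
// allocated once per call, so once per worker.
func newTagger() func(string) string {
	buf := make([]byte, 0, 64) // private to the returned closure
	return func(s string) string {
		buf = append(buf[:0], '[')
		buf = append(buf, s...)
		buf = append(buf, ']')
		return string(buf)
	}
}

// concurrently (hypothetical) starts n workers. Each worker calls factory
// exactly once, then applies the resulting function to the items it reads.
func concurrently(n int, in <-chan string, factory func() func(string) string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fn := factory() // per-worker state, no synchronization needed
			for s := range in {
				out <- fn(s)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// run feeds items to the workers and returns the results sorted, because the
// order in which workers emit values is not deterministic.
func run(items []string, workers int) []string {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, it := range items {
			in <- it
		}
	}()
	var res []string
	for s := range concurrently(workers, in, newTagger) {
		res = append(res, s)
	}
	sort.Strings(res)
	return res
}

func main() {
	fmt.Println(run([]string{"b", "a", "c"}, 2))
}
```

Compared with a sync.Pool, the state here never crosses goroutines, so 
there is nothing to get or put back on each item.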

ftr, I have kept the successive versions I edited since the original I 
posted in the previous email:
https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go
https://github.com/clementauger/st/blob/master/examples/walkfiles/main1.go
etc...
https://github.com/clementauger/st/blob/master/examples/walkfiles/main5.go

Finally, note that the last commit is a major change, as it breaks the 
Mapper interface.
It is not tagged yet; that will happen later.

I am looking for ideas about how I could get rid of the reflect 
package.
Static analysis cannot work.
Dynamic Go code generation did not work; it might be made to work, but it 
seems difficult.
Any idea is welcome.

thanks for reading.

> Hi,
>
> I recently faced the problem of building data stream pipelines in 
> golang for some ETL programs.
>
> I found it was not so simple with the usual golang idioms.
> Using them, I noticed that producing correct code required deep care 
> and understanding.
> I also felt they were needlessly verbose and repetitive.
>
> I searched for existing libraries produced by the community 
> but failed to find one suitable for my needs.
> Something like ratchet (https://github.com/dailyburn/ratchet) was close 
> to it, 
> but I am unlikely to use such a complex api.
>
> I ended up writing my own for this purpose, 
> you can find it here https://github.com/clementauger/st
>
> To achieve it I have been using the reflection API extensively, which is 
> really helpful.
> Unfortunately, I have been using interface{} almost everywhere, and that 
> would be difficult to change.
> In exchange, this api provides a lot of flexibility.
>
> For comparison and introduction purposes I have rewritten the pipeline 
> file-walk example
> available on the blog at https://blog.golang.org/pipelines
>
> The version i provide is available at
> https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go
>
> This version implements a slightly more complex flow mechanism, as it 
> buffers paths; 
> however, it is much simpler, shorter (<150 LOC) and, I believe, clearer 
> code than the original version.
> The last property is a matter of personal opinion, yet I encourage 
> you to consider it.
>
> Overall, while this implementation is slower because of the many 
> additional indirections,
> I think it is still worth considering: that cost can be compensated 
> with easier and appropriate buffering and parallelisation wherever 
> possible
> to achieve better performance.
> I intend this library to help make that happen as easily and 
> smoothly as possible.
>
> For your information you can also find a bulk version 
> of the original bounded version provided in the blog 
> at 
> https://github.com/clementauger/x/blob/master/01_pipeline/bounded_bulk.go.
> This last program is more suitable for a full comparison with the version 
> I provide.
>
> I hope to engage in interesting conversations around that,
> thanks for reading.
>
