Hi,

Over the last few days I have been fixing some issues with the code presented above (https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go#L38). Besides the fact that the internal code relies heavily on reflection to implement the API, which de facto puts it on the wrong side of the performance line, there was a fair amount of memory management on the user-land side that could be improved.
For example, this stage https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go#L68 creates as many slices as there are input batches to process. See also lines 70, 71 and 72.

For a quick look at the solution, open https://github.com/clementauger/st/blob/master/examples/walkfiles/main5.go#L51

There you will find sync.Pool used together with two new kinds of stages, PoolBuffered (L53) and PoolRecycler (L62). They take care of the allocation and recycling needed to pass values in and out of a goroutine context (L58), by accumulating values into containers such as slices or by consuming values from them. This solves lines 68 and 70 of the original demonstration code.

You will also find self-contained functions introduced within the Concurrently call (L58). Those functions return the iterable function instead of being it. They can also return instances of st.Mapper directly, as demonstrated. This helps to solve lines 71 and 72 of the original demonstration code.

In an earlier version, this problem was handled with a sync.Pool (https://github.com/clementauger/st/blob/master/examples/walkfiles/main3.go#L69). However, that is not the best solution available: because those self-contained functions are executed once for each worker, they can allocate on their own goroutine at startup without any need for synchronization.

For the record, I have kept the successive versions I edited since the original I posted in the previous email:
https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go
https://github.com/clementauger/st/blob/master/examples/walkfiles/main1.go
etc...
https://github.com/clementauger/st/blob/master/examples/walkfiles/main5.go

Finally, note that the last commit is a major change, as it breaks the Mapper interface. It is not tagged yet; that will happen later.

I am looking for ideas about how I could get rid of the use of the reflect package. Static analysis cannot work. Dynamic Go code generation did not work; it might, but it seems difficult. Any idea is welcome.

Thanks for reading.
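To illustrate the pooling and per-worker allocation ideas described above outside of st, here is a minimal sketch that uses only the standard library. It does not use the st API and is not how PoolBuffered or PoolRecycler are implemented; the names batchPool, produce, recycle, newWorker and totalLen are hypothetical, and the "work" is simply summing path lengths so that the result stays deterministic.

```go
package main

import "sync"

// batchPool hands out reusable []string buffers so the producer does not
// allocate a fresh slice for every batch (hypothetical name, not part of st).
var batchPool = sync.Pool{
	New: func() interface{} {
		s := make([]string, 0, 64)
		return &s
	},
}

// produce groups paths into batches of at most size elements. Every batch is
// backed by a buffer taken from batchPool.
func produce(paths []string, size int, out chan<- *[]string) {
	defer close(out)
	buf := batchPool.Get().(*[]string)
	for _, p := range paths {
		*buf = append(*buf, p)
		if len(*buf) >= size {
			out <- buf
			buf = batchPool.Get().(*[]string)
		}
	}
	if len(*buf) > 0 {
		out <- buf
	} else {
		recycle(buf)
	}
}

// recycle empties a batch and returns its buffer to the pool.
func recycle(buf *[]string) {
	*buf = (*buf)[:0]
	batchPool.Put(buf)
}

// newWorker is called once per goroutine. It allocates its scratch space up
// front and returns the function that processes batches, so the scratch
// buffer needs neither a pool nor any synchronization.
func newWorker() func(batch []string) int {
	scratch := make([]byte, 0, 256) // owned by exactly one goroutine
	return func(batch []string) int {
		total := 0
		for _, p := range batch {
			scratch = append(scratch[:0], p...)
			total += len(scratch)
		}
		return total
	}
}

// totalLen runs the pipeline and returns the summed byte length of all paths.
func totalLen(paths []string, size, workers int) int {
	in := make(chan *[]string)
	results := make(chan int)
	go produce(paths, size, in)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			work := newWorker() // one allocation per worker
			for buf := range in {
				results <- work(*buf)
				recycle(buf) // the batch buffer goes back to the pool
			}
		}()
	}
	go func() { wg.Wait(); close(results) }()

	sum := 0
	for n := range results {
		sum += n
	}
	return sum
}
```

Calling totalLen(paths, 2, 3) from a main function runs three workers over batches of two paths. The batch buffers cross goroutines, so they go through the pool, while each worker's scratch buffer stays private to its goroutine, which is why it can be allocated once with no synchronization.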
> Hi,
>
> I recently faced the problem of building data stream pipelines in Go for some ETL programs.
>
> I found that the Go idioms were not so simple to use. Using them, I noticed that producing correct code required deep care and understanding. I also felt they were needlessly verbose and repetitive.
>
> I searched for existing libraries produced by the community, but I failed to find one suitable for my needs. Something like ratchet (https://github.com/dailyburn/ratchet) was close, but I am unlikely to use such a complex API.
>
> So I came to write my own for this purpose; you can find it here: https://github.com/clementauger/st
>
> To achieve this I have used the reflection API extensively, which is really helpful. Unfortunately, I have been using interface{} almost everywhere, and that would be difficult to change. In exchange, this API provides a lot of flexibility.
>
> For comparison and introduction purposes, I have rewritten the pipeline file-walking example available on the blog at https://blog.golang.org/pipelines
>
> The version I provide is available at https://github.com/clementauger/st/blob/master/examples/walkfiles/main.go
>
> This version implements a slightly more complex flow mechanism, as it buffers paths; however, it is much easier, shorter (<150 LOC) and, I believe, clearer code than the original version. That last property is subject to personal opinion, yet I encourage you to consider it.
>
> Overall, while this implementation is slower because of the many additional indirections, I think it is still worth considering that this can be compensated with easier and more appropriate buffering and parallelization wherever possible, to achieve better performance. My intent is for this library to help make that happen as easily and smoothly as possible.
>
> For your information, you can also find a bulk version of the original bounded version provided in the blog at https://github.com/clementauger/x/blob/master/01_pipeline/bounded_bulk.go. This last program is more suitable for a full comparison with the version I provide.
>
> I hope to engage in interesting conversations around this; thanks for reading.