Thanks, maybe I'll give it a try and include it manually in the repo!

> improve performance and usability on complex apply/map

It will definitely help, but I'm already creating a single loop for each 
formula, no matter how many tensors are involved.

E.g.:

    let df = ... # some DF w/ cols A, B, C, D
    df.mutate(f{"Foo" ~ `A` * `B` - `C` / `D`})

will already be rewritten to:

    var
      col0_47816020 = toTensor(df["A"], float)
      col1_47816021 = toTensor(df["B"], float)
      col2_47816022 = toTensor(df["C"], float)
      col3_47816023 = toTensor(df["D"], float)
      res_47816024 = newTensor[float](df.len)
    for idx in 0 ..< df.len:
      res_47816024[idx] = col0_47816020[idx] * col1_47816021[idx] -
        col2_47816022[idx] / col3_47816023[idx]
    result = toColumn res_47816024

which is indeed a little slower than a manual map_inline, but still pretty 
fast. Compare the first plot from here:

[https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare](https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare)

Not sure where the variations map_inline sees are coming from though. Effects of
OpenMP?

**Small aside about the types**

The data types are determined as floats from the usage of `*`, `/` etc. They can
be overridden by giving type hints:

    f{int -> float: ...}
      ^--- type of involved tensors
             ^---- type of resulting tensor

> AFAIK it would allow combining complex transformations and doing them in a 
> single pass instead of allocating many intermediate dataframes, so 
> performance can be an order of magnitude faster on zip/map/filter chains.

While this is certainly exciting to think about, I think it'd be pretty hard 
(for me in the near future anyways) to achieve while:

  1. keeping it simple to extend the library by adding new procs
  2. still allowing usage of the procs in the normal way, i.e. returning a new 
DF (without having differently named procs for inplace / not-inplace variants).



But this is just me speculating based on the (not exactly simple) code of 
zero-functional. I guess having a custom operator like it does would allow us 
to replace the user-given proc names though.

If you have a better idea of how to do efficient chaining that seems reasonable 
to implement, I'm all ears.
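For illustration, the kind of fusion being discussed can be sketched language-agnostically. A toy Python example (the pipeline itself is hypothetical, not ggplotnim API) of collapsing a map → filter → map chain into one loop instead of materializing an intermediate collection per step:

```python
# Sketch: fusing a map/filter chain into a single pass.
# The toy pipeline below is hypothetical, not ggplotnim code.

def chained(xs):
    # naive version: each step allocates a full intermediate list
    doubled = [x * 2 for x in xs]           # intermediate 1
    kept = [x for x in doubled if x > 4]    # intermediate 2
    return [x + 1 for x in kept]            # final result

def fused(xs):
    # fused version: one loop, no intermediate allocations
    res = []
    for x in xs:
        y = x * 2
        if y > 4:
            res.append(y + 1)
    return res

print(fused([1, 2, 3, 4]))  # [7, 9], same as chained([1, 2, 3, 4])
```

The difficulty mentioned above is doing this rewrite automatically at compile time while user-defined procs remain pluggable.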

**What I'm working on**

Right now I'm more concerned with getting decent performance for group_by and 
inner_join though. I've been looking at 
[https://h2oai.github.io/db-benchmark/](https://h2oai.github.io/db-benchmark/) 
since yesterday. It's a rather brutal reality check, hehe.

Comparing my current code on the first of the 0.5 GB group_by examples to 
pandas and data.table was eye-opening. In my current implementation of 
summarize for grouped data frames I actually return the sub data frames for 
each group and apply a simple reduce operation based on the user's formula. 
Well, what a surprise: that's slow. I haven't dug deep into data.table or 
pandas yet, but as far as I can tell they essentially special-case group_by + 
other operation and handle these by aggregating over all groups in a single 
pass.
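The single-pass strategy boils down to keeping one accumulator per group key while scanning the rows once. A minimal sketch in plain Python (not ggplotnim code, and ignoring the multi-type key problem discussed below):

```python
# Sketch: single-pass grouped sum -- one accumulator per key, one scan over
# the rows, no per-group sub data frames.
from collections import defaultdict

def grouped_sum(keys, values):
    # one dict lookup + one add per row
    acc = defaultdict(float)
    for k, v in zip(keys, values):
        acc[k] += v
    return dict(acc)

print(grouped_sum(["a", "b", "a"], [1.0, 2.0, 3.0]))  # {'a': 4.0, 'b': 2.0}
```

The same pattern generalizes to other reductions (min, max, mean via sum + count) by swapping the accumulator update.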

So I've implemented the same and even for a single key with a single sum I'm 2 
times slower than running the code with pandas on my machine. To be fair, 
performing operations on sub groups individually is a nice 100x slower than 
pandas.

Still, the biggest performance cost I pay is for allowing grouping by columns 
of multiple data types. I need some way to check which subgroup a row belongs 
to. Since I can't create a tuple at runtime in order to just use normal 
comparison operators, I decided to calculate a hash for each row and compare 
that. That works well, but gives me that 2x speed penalty.
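A rough sketch of the row-hash idea in Python: hash each column's value and mix it into a per-row integer, so that group membership checks become integer comparisons regardless of the column types. The mixing scheme here is a placeholder, not what ggplotnim actually uses:

```python
# Sketch of the row-hash idea: reduce a multi-column, mixed-type group key
# to a single integer per row, so grouping only needs integer comparisons.
# The mixing scheme below is a placeholder, not ggplotnim's actual hash.

def row_hashes(columns):
    # columns: equal-length columns of possibly different element types
    n = len(columns[0])
    hashes = [0] * n
    for col in columns:
        for i, v in enumerate(col):
            # hash each column value and mix it into the row's running hash
            hashes[i] = (hashes[i] * 31 + hash(v)) & 0xFFFFFFFFFFFFFFFF
    return hashes

# rows 0 and 1 carry identical key values -> identical hashes;
# row 2 differs -> (almost surely) a different hash
h = row_hashes([["x", "x", "y"], [1, 1, 2]])
assert h[0] == h[1] and h[0] != h[2]
```

The extra hash per row is where the 2x penalty versus a monomorphic key comparison comes from.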

For the time being I think I'm happy with that, unless I have a better idea / 
someone can point me to something that works in a typed language and doesn't 
involve huge amounts of boilerplate code.

So I'm currently working on an implementation that allows using user-defined 
formulas for aggregation without having to call a closure for each row.
