Re: ggplotnim - pretty native plots for us

Vindaar Tue, 24 Mar 2020 11:00:30 -0700

@spip I'll answer your question below as well.

> Is this compatible with other libraries, such as arraymancer, etc? I think 
> that one of the biggest strengths of the python numerical ecosystem is the 
> good inter-operability of most plotting libraries with numpy. So if that is 
> not already the case I would suggest making that your highest priority.


The answer to that is "sort of". Let me explain why, and what I mean by 
"sort of".

**The long answer**

Originally when I started the library I never planned to write a data frame 
library to go with this. I quickly realized however that (at least with a 
library like `ggplot2`) one doesn't work well without the other. In a normal 
plotting library every plotting function is a special case. Essentially each 
kind of plot wants data in a specific form / of a specific data type.

So in the beginning I specifically didn't want to use arraymancer internally. I 
love that library, but given that all I wanted to write was a "plotting 
library", this meant two things specifically for me:

  * The library is essentially a sink for the user's data. It doesn't return 
anything, so there's no reason for the internal data type to conform to any 
standards
  * If a user wants to create a plot, performance will **not** be an important 
consideration (which does not imply performance of a plotting library does not 
matter!). Creating a plot will always be slow (compared to pure number 
crunching anyways). There are use cases for libraries, which can create plots 
at several hundred fps. But to be honest, if I need to create a huge number of 
plots and am thus performance sensitive, the question is if a plotting library 
is the best tool in the first place.



For this reason I decided to avoid having arraymancer as a dependency, because 
all its strengths are mostly useless for the intended purpose, but would mean I 
introduce an unnecessary dependency.

If a user is using arraymancer for calculations, it's easy to convert the 
required data to ggplotnim's data types. I felt the overhead of copying the 
data was not a big deal under the assumption mentioned above.

**But** , things did somewhat change when I started to write the data frame.

My first idea was actually to use `NimData`, since I really like that library. 
However, the (depending on viewpoint) advantage / disadvantage that their type 
is entirely defined via a schema at compile time, didn't appeal to me. I didn't 
want to end up with a ggplot2 clone that was super restrictive, because 
everything had to be known at compile time.

I was actually hoping that @bluenote would pick up his development of Kadro 
again:

[https://github.com/bluenote10/kadro](https://github.com/bluenote10/kadro)

That sounded perfectly suited. But since he didn't, I simply started to hack 
together something that suits the needs of the library.

Originally in fact the `DataFrame` type was generic and my goal was to write 
the code in such a way that the underlying type does not matter. This made 
things complicated though. In fact I even thought about an arraymancer backend 
from the start:

[https://github.com/Vindaar/ggplotnim/blob/master/playground/arraymancer_backend.nim](https://github.com/Vindaar/ggplotnim/blob/master/playground/arraymancer_backend.nim)

which however never progressed from there, mainly because I couldn't figure out 
how to make use of arraymancer's performance when the majority of the data 
frame operations I did ended up copying data around. That is how I ended up 
with @PMunch's implementation of Clojure's persistent vector. It kind of 
allowed me to "copy as much as I want" without the performance penalty.

This is how we got to the current situation. The data frame is okayish fast for 
the simple things needed to prepare a plot. For anything else, I can't 
recommend it (also because it's extremely lenient on types!).

**tl;dr**

Compatibility with the "rest of the ecosystem" isn't there for practical 
reasons.

The thing is I'd love to profit from @mratsim's amazing work on arraymancer and 
laser!

Once I go back and reconsider performance of the data frame, I hope I will end 
up using as much of arraymancer as I can to be honest. I just need to figure 
out how to do it. :)

> Other than that, I didn't see mention of support for contour plots in the 
> docs. It is surprising how often those come in handy in many scenarios so I'd 
> like for you to add that if it is not available yet. Another thing I like to 
> do is to combine line plots with histograms and/or kernel density plots on 
> the X and Y axis (to get a quick idea of the distribution of the values, 
> particularly in time series). It would be neat to support for that too.

Good point. Contour plots are something I simply didn't think about.

I've never actually thought about how those are implemented before. I guess 
it's just a 2 dimensional KDE, right?

Since I will be implementing `geom_density`, for which I need KDEs anyways, I 
might as well implement N dimensional KDEs. That makes performance an even 
bigger issue though.
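
For reference, this is the textbook form of a two-dimensional kernel density estimate with a product Gaussian kernel. It is only a sketch of the general technique: the kernel choice and the bandwidths $h_x$, $h_y$ are assumptions for illustration, not a description of what ggplotnim will end up implementing.

```latex
\hat{f}(x, y) = \frac{1}{n\, h_x h_y} \sum_{i=1}^{n}
    K\!\left(\frac{x - x_i}{h_x}\right)
    K\!\left(\frac{y - y_i}{h_y}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2 / 2}
```

Evaluating this naively on a grid of $G$ points for $n$ samples costs on the order of $n \cdot G$ kernel evaluations, which is why performance matters more here than for a one-dimensional density.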

This is a case where implementing it in arraymancer would definitely be 
helpful: the required data would then just be passed from a DF to arraymancer. 
Maybe in a few months' time we can just write:
    
    
    import ggplotnim, arraymancer
    let df: DataFrame = someDataFrame()
    let dfKde = df.kde("x", "y", "z")
    

where `kde` would be an arraymancer proc; or something along those lines…

> Finally, in signal processing work you are often working with complex 
> samples. In that context it is often handy to plot the I and Q components vs 
> time, placing 2 subplots on top of each other, and linking the X (time) 
> zoom/pan of the two subplots. It would be really nice if that were supported.

When you say "on top of each other", do you mean essentially a plot with a 
secondary axis? Those are already supported, but are somewhat limited right now.

There's no recipe for these at the moment though. An example:

[https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225761](https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225761)

The downside of secondary axes at the moment is that they are just fake axes: 
ticks and labels on the RHS of the plot, while the underlying data is still 
drawn into the coordinate system defined by the main axis.

I know that `ggplot2` explicitly does not allow completely independent axes 
(only those which can be calculated from one another, e.g. unit conversions), 
because [Hadley Wickham thinks other cases are easily 
misleading](https://stackoverflow.com/a/3101876). And to an extent I agree. 
However, I **do** think there is a place for them, so I will provide better 
support for them in the future.

**real subplots**

Or do you mean a normal subplot consisting of several (in principle not 
connected) plots in a single graphic? Those are also supported, but using them 
is not very convenient yet: one has to make use of `ginger` functionality 
directly.

An example inspired by: 
[https://staff.fnwi.uva.nl/r.vandenboomgaard/SP20162017/SystemsSignals/plottingsignals.html](https://staff.fnwi.uva.nl/r.vandenboomgaard/SP20162017/SystemsSignals/plottingsignals.html)
    
    
    import ggplotnim, seqmath, math, sequtils, complex, ginger
    let t = linspace(-0.02, 0.05, 1000)
    let y1 = t.mapIt(exp(im(2'f64) * Pi * 50 * it).re)
    let y2 = t.mapIt(exp(im(2'f64) * Pi * 50 * it).im)
    let df = seqsToDf({ "t" : t,
                        "Re x(t)" : y1,
                        "Im x(t)" : y2 })
    let plt1 = ggcreate(
      ggplot(df, aes("t", "Re x(t)")) +
        geom_line() +
        xlim(-0.02, 0.05) +
        ggtitle("Real part of x(t)=e^{j 100 π t}"),
      width = 800, height = 300
    )
    let plt2 = ggcreate(
      ggplot(df, aes("t", "Im x(t)")) +
        geom_line() +
        xlim(-0.02, 0.05) +
        ggtitle("Imaginary part of x(t)=e^{j 100 π t}"),
      width = 800, height = 300
    )
    # combine both into a single viewport to draw as one image
    var plt = initViewport(wImg = 800, hImg = 600)
    plt.layout(1, rows = 2)
    # embed the finished plots into the new viewport
    plt.embedAt(0, plt1.view)
    plt.embedAt(1, plt2.view)
    plt.draw("real_imag_subplot.pdf")
    

Which produces the following plot: 
[https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225762](https://gist.github.com/Vindaar/5292d4d9b8fb667e3eb27061627dbbfe#gistcomment-3225762)

Another example can be found here: 
[https://gist.github.com/Vindaar/fc158afbc75627260aed90264398e473](https://gist.github.com/Vindaar/fc158afbc75627260aed90264398e473)

If you have something else in mind, let me know!
