I am new to the Clojure world. After years of developing finance
applications in R, I am trying to convert a relatively big R/Finance
project into Clojure/Incanter. Some things are going very smoothly. I can
see how the number of LOC is drastically reduced and the code is clean and
concise.
However, in the core area, dealing with financial time series, I am having
difficulties. Here are some thoughts:
1) In R one works with matrices and data frames, analogous to Incanter's
matrices and datasets. In R you can do calculations with both types; in
Incanter only with matrices, not with datasets. Data frames and datasets
both allow heterogeneous data; matrices, in both systems, do not.
2) In R a matrix can have both column and row names; in Incanter it can
have neither.
3) From 1) and 2) it seems to me that in Incanter you lose the naming of
your data every time you want to do calculations. This makes me uneasy,
because I have to keep track of the ordering of the rows (which is usually
not a problem) and of the columns (which is a big problem).
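One way to avoid depending on column order is to carry the names along with the data. The sketch below uses plain Clojure vectors instead of Incanter matrices; `named-matrix` and `select-cols` are hypothetical helpers, not part of Incanter. Columns are picked by name, so the result stays labeled.

```clojure
;; Hypothetical helpers (not Incanter API): a "named matrix" is a plain
;; map that keeps row-major data together with its row and column names.
(defn named-matrix
  "Builds {:rows row-names, :cols col-names, :data rows}, checking that
  there is one name per row and one name per column."
  [row-names col-names rows]
  (assert (= (count row-names) (count rows)) "one name per row")
  (assert (every? #(= (count col-names) (count %)) rows)
          "one name per column")
  {:rows (vec row-names) :cols (vec col-names) :data (mapv vec rows)})

(defn select-cols
  "Returns a new named matrix with only the columns named in ks, in that
  order, so downstream code never relies on column positions."
  [{:keys [rows cols data]} ks]
  (let [pos (zipmap cols (range))
        js  (mapv #(or (pos %) (throw (ex-info "unknown column" {:col %})))
                  ks)]
    (named-matrix rows ks (mapv (fn [row] (mapv row js)) data))))

;; Usage with toy numbers:
(def m (named-matrix ["20130415" "20130416"] ["a" "b" "c"]
                     [[1 2 3] [4 5 6]]))
(select-cols m ["c" "a"])
;; => {:rows ["20130415" "20130416"], :cols ["c" "a"], :data [[3 1] [6 4]]}
```

An unknown name raises an error instead of silently selecting the wrong column.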
4) Finance people work primarily with time series. They tend to work with
data frames or matrices in which each column is a time series. The R data
frame structure fits nicely with this, since a data frame is (loosely
speaking) a list of its columns. Although I don't know what is going on in
the innards of Incanter, it seems to be focused on rows. I wonder whether
that carries a performance penalty when one works with columns. One could
represent time series as rows in datasets, but that would lose the naming,
as datasets don't have row names. Or one could work with columns in
datasets and rows in matrices, but that would require systematically
transposing the data, which is expensive.
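For comparison, a column-oriented layout is easy to build in plain Clojure: a map from series name to column vector, which is close in spirit to R's list-of-columns data frame. The function names below are hypothetical. Converting between the two layouts is a single transposition that touches every element once, so it is cheap as a one-off but worth avoiding inside loops.

```clojure
(defn rows->columns
  "Converts row-major data (a non-empty vector of rows) into a map from
  column name to column vector."
  [col-names rows]
  (zipmap col-names (apply mapv vector rows)))

(defn columns->rows
  "Inverse of rows->columns: rebuilds the rows, taking the columns in the
  order given by col-names."
  [col-names cols]
  (apply mapv vector (map cols col-names)))

(rows->columns ["a" "b" "c"] [[1 2 3] [4 5 6]])
;; => {"a" [1 4], "b" [2 5], "c" [3 6]}
```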
5) I understand that in Clojure one cannot write code like
    A[i, j] = something
where A is a matrix. Here i and j may be numbers or vectors. Actually, in R
a new matrix A is created when one executes a command like this, so
performance is not the issue; convenience is. Would it be possible to have
a function in Clojure/Incanter that, when called as
    (def B (foo A i j z))
would create a matrix B with the same dimensions and entries as A, except
that the subset of rows and columns defined by i and j would be replaced by
z? (Here i and j could be vectors like [3 5 12] or just ints like 4.)
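A minimal sketch of such a `foo`, assuming the matrix is a plain Clojure vector of row vectors rather than an Incanter matrix. `assoc-sub` is a hypothetical name, indices are 0-based (R's are 1-based), and z is either a single number or a block of shape (count i) x (count j). Because Clojure vectors are persistent, A itself is left unchanged and B shares most of its structure.

```clojure
;; Hypothetical stand-in for foo; A is a vector of row vectors,
;; indices are 0-based.
(defn- as-indices
  "Turns a single int or a collection of ints into a vector of ints."
  [i]
  (if (integer? i) [i] (vec i)))

(defn assoc-sub
  "Returns a new matrix equal to A except that the entries in rows i and
  columns j are replaced by z. z is either a number, written into every
  selected cell, or a vector of row vectors of shape (count i) x (count j)."
  [A i j z]
  (let [is (as-indices i)
        js (as-indices j)]
    (reduce (fn [M [r ri]]
              (reduce (fn [M [c cj]]
                        (assoc-in M [ri cj]
                                  (if (number? z) z (get-in z [r c]))))
                      M
                      (map-indexed vector js)))
            A
            (map-indexed vector is))))

(def A [[1 4 7] [2 5 8] [3 6 9]])
(assoc-sub A [0 2] 1 0)           ;; => [[1 0 7] [2 5 8] [3 0 9]]
(assoc-sub A 1 [0 2] [[10 20]])   ;; => [[1 4 7] [10 5 20] [3 6 9]]
```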
6) To complement the functionality in 5), one would like to be able to
apply a function to the columns or rows of a matrix, *with the parameters
varying with the column*. Sometimes in R this functionality is already
built into the function. For example, take pmin (parallel minimum). If I
have the matrix
    > mat
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
I can call pmin and get the following:
    > pmin(mat, c(3, 4, 5))
         [,1] [,2] [,3]
    [1,]    1    3    3
    [2,]    2    4    4
    [3,]    3    5    5
(This uses R's "recycling": the vector is reused down each column, so row
r is compared with its r-th entry. This is not how Incanter deals with
vectors of different lengths.)
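The pmin case above can be reproduced in plain Clojure on a vector of row vectors. `pmin-rows` is a hypothetical name, and it covers only the case where the vector has one entry per row; it does not implement R's general recycling rules.

```clojure
(defn pmin-rows
  "Element-wise minimum of A (a vector of row vectors) and v, where v has
  one entry per row and row r is compared with (nth v r)."
  [A v]
  (assert (= (count A) (count v)) "need one value per row")
  (mapv (fn [row x] (mapv #(min % x) row)) A v))

(pmin-rows [[1 4 7] [2 5 8] [3 6 9]] [3 4 5])
;; => [[1 3 3] [2 4 4] [3 5 5]]
```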
If a function does not have that functionality, one can write
    > t(apply(mat, 1, function(x, z) {x + z}, c(0, 1, 2)))
         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]    2    6   10
    [3,]    3    7   11
In any case, just the ability to apply a function to a given column or row,
or to all the columns or rows, of a matrix would be a big help.
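As a sketch of the general pattern, again on plain vectors of row vectors and with hypothetical names: `map-cols-with` calls a function on each column together with that column's parameter and rebuilds a matrix of the same dimensions, which mirrors the R `apply` example above; `map-rows-with` is the row counterpart.

```clojure
(defn transpose
  "Transposes a non-empty vector of row vectors."
  [A]
  (apply mapv vector A))

(defn map-cols-with
  "Calls (f column p) for every column of A, with p the matching entry of
  ps (one per column). f must keep the column length, so the result has
  the same dimensions as A."
  [f A ps]
  (transpose (mapv f (transpose A) ps)))

(defn map-rows-with
  "Same idea for rows: (f row p) with one parameter per row."
  [f A ps]
  (mapv f A ps))

(def mat [[1 4 7] [2 5 8] [3 6 9]])
(map-cols-with (fn [col z] (mapv #(+ % z) col)) mat [0 1 2])
;; => [[1 5 9] [2 6 10] [3 7 11]]
```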
I wrote the functions
    (require '[incanter.core :refer [$ matrix matrix-map trans]])

    (defn matrix-map-col
      "Applies foo to each element of column j of matrix A."
      [A foo j]
      (matrix-map foo ($ :all j A)))

    (defn matrix-maps-cols
      "Applies each function in foos to the column whose index is the
      matching entry of js. The result has one column per index in js, so
      it has the same dimensions as A when js lists every column of A."
      [A foos js]
      (trans (matrix (map #(matrix-map-col A %1 %2) foos js))))
which would be part of the solution. One would still have to add the
ability to vary the parameters and return a matrix. Given my limited
experience with Clojure, I wonder whether they could be made faster or
better, and what would be the best way to implement the remaining
functionality.
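One possible shape for the remaining piece, written in plain Clojure rather than with the Incanter functions above: `update-cols` (a hypothetical name) transforms only the listed columns, each with its own parameter, and leaves the other columns in place, so the result always has the dimensions of A. For a single column, pass one-element vectors for js and ps.

```clojure
(defn update-cols
  "Returns a copy of A (a vector of row vectors) in which each column index
  j in js is replaced by (f column p), with p the matching entry of ps.
  Columns not listed in js are left as they are, so the result keeps the
  dimensions of A (provided f keeps column lengths)."
  [A f js ps]
  (let [cols (apply mapv vector A)
        cols (reduce (fn [cs [j p]] (assoc cs j (f (cs j) p)))
                     cols
                     (map vector js ps))]
    (apply mapv vector cols)))

(update-cols [[1 4 7] [2 5 8] [3 6 9]]
             (fn [col p] (mapv #(* % p) col))
             [0 2] [10 100])
;; => [[10 4 700] [20 5 800] [30 6 900]]
```

Because f receives the whole column, it can also compute things that depend on the entire series (returns, cumulative sums), not just element-wise operations.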
7) I wrote in R a class that extends R's matrices. It associates a Date
object with each row and provides many other capabilities. When I extract
data from an object I can see the date range and the series I am dealing
with, which is crucial for checking calculations. I understand Clojure is
not OO. How could I get similar capabilities? What kind of construct in
Clojure could I use? I thought about a map with the following keys:
:date - a vector of Java Date objects, or just a vector of strings of the
form "20130415"
:data - one of the following two alternatives:
a) a vector of maps, each of which would have the column name (a string)
as key and a time series as value
b) an Incanter dataset
Each alternative has advantages and disadvantages. Has anyone thought about
these issues? Any comments would be very welcome.
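A minimal sketch of the map-based approach, assuming dates are "yyyyMMdd" strings (these sort lexicographically in date order) and using a variant of alternative a): one map from series name to value vector instead of a vector of one-entry maps. The function names are hypothetical; if a named type is preferred, `defrecord` could hold the same two fields.

```clojure
(defn make-tseries
  "Builds a time-series map. dates is a vector of yyyyMMdd strings in
  ascending order; data maps each series name to one value per date."
  [dates data]
  (assert (every? #(= (count dates) (count %)) (vals data))
          "every series needs one value per date")
  {:date (vec dates) :data data})

(defn date-range
  "First and last date, for a quick sanity check of extracted data."
  [{:keys [date]}]
  [(first date) (peek date)])

(defn slice-dates
  "Keeps only the rows whose date lies in [from, to], inclusive."
  [{:keys [date data]} from to]
  (let [idx (keep-indexed (fn [i d]
                            (when (and (<= (compare from d) 0)
                                       (<= (compare d to) 0))
                              i))
                          date)]
    (make-tseries (mapv date idx)
                  (into {} (for [[k v] data] [k (mapv v idx)])))))

(def ts (make-tseries ["20130412" "20130415" "20130416"]
                      {"s1" [1.0 2.0 3.0] "s2" [4.0 5.0 6.0]}))
(date-range ts)                          ;; => ["20130412" "20130416"]
(slice-dates ts "20130413" "20130416")
;; => {:date ["20130415" "20130416"], :data {"s1" [2.0 3.0], "s2" [5.0 6.0]}}
```

Every slice goes back through `make-tseries`, so the dates and the series names travel with the numbers and the length check is repeated on each derived object.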
These are my thoughts for now. I will also post this on the Incanter group
(I hope that is not a problem).
FS
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en