Dan, 

That's a huge number of stats packages available for use, assuming we 
achieve interop with Renjin's dataframes. I'll look into it as well. My 
priorities are to first get something working for $DAYJOB, and then to 
build a more generally useful package, and finally add extras such as 
interop.

- Arthur

On Wednesday, March 9, 2016 at 7:04:17 PM UTC-5, Daniel Slutsky wrote:
>
> Thank you for raising this question.
>
> By the way, one desired feature for a Clojure dataframe abstraction would 
> be good interop with Renjin's dataframes.
> Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number 
> of JVM-based statistical libraries. Most of them rely on the dataframe 
> abstraction for their data. R is also very Lisp-like in its data 
> representation, so wrapping all this with Clojure would be a delight.
>
>
>
> On Thursday, March 10, 2016 at 1:47:44 AM UTC+2, Christopher Small wrote:
>>
>>
>> If you're going to do any work in this area, I would highly encourage you 
>> to do it as part of the core.matrix library. That is what Incanter is or 
>> will be using for its dataset implementation. It's also nice for those 
>> abstractions and implementations to be separate from Incanter itself, since 
>> Incanter is a rather large dependency.
>>
>> Core.matrix is certainly (in my eyes) becoming the de facto matrix 
>> computation library in the Clojure ecosystem, and I think that in the level 
>> of interop between different implementations, and in the extent of use by 
>> the Clojure community, we rival the Python offerings. However, while 
>> core.matrix has some dataset protocols, api functions and basic 
>> implementations, there's still some work to get the full expressiveness of 
>> the data.frame pattern as seen in R and Pandas. Specifically, there is no 
>> support for setting rownames (or arbitrary "name" assignments beyond that 
>> of a single dimension (columns...)). This is something I started working on 
>> a while back, but wasn't able to finish. I could potentially push what I 
>> came up with to a fork, but unfortunately, I don't have any more time to 
>> work on the problem at the moment.
>>
>> Mike Anderson is a great project maintainer, and will probably be happy 
>> to help guide you in stitching together a solution.
>>
>> Best
>>
>> Chris
>>
>>
>>
>>
>>
>> On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, arthur.ma...@gmail.com 
>> wrote:
>>>
>>> Is there any desire or need for a Clojure DataFrame?
>>>
>>>
>>> By DataFrame, I mean a structure similar to R's data.frame, and Python's 
>>> pandas.DataFrame.
>>>
>>> Incanter's DataSet may already be fulfilling this purpose, and if so, 
>>> I'd like to know if and how people are using it.
>>>
>>> From quickly researching, I see that some prior work has been done in 
>>> this space, such as:
>>>
>>> * https://github.com/cardillo/joinery
>>> * https://github.com/mattrepl/data-frame
>>> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>>>
>>> Rather than going off and creating a competing implementation 
>>> (https://xkcd.com/927/), I'd like to know if anyone here is actively 
>>> working on, or would like to work on a DataFrame and related utilities for 
>>> Clojure (and by extension Java)? Is it something that's sorely needed, or 
>>> is everybody happy with using Incanter or some other library that I'm not 
>>> aware of? If there's already a de facto standard out there, would anyone 
>>> care to please point it out?
>>>
>>> As background information:
>>>
>>> My specific use-case is in NLP and ML, where I often explore and 
>>> prototype in Python, but I'm then left to deal with a smattering of 
>>> libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, 
>>> etc.), each with their own ad-hoc implementations of algorithms, matrices, 
>>> and utilities for reading data. It would be great to have a unified way to 
>>> explore my data in the Clojure REPL, and then serve the same code and 
>>> models in production.
>>>
>>> I would love for Clojure to have a broadly compatible ecosystem similar 
>>> to Python's NumPy/Pandas/Scikit-*/SciPy/matplotlib/GenSim, etc. Core.matrix 
>>> and Incanter appear to fulfill a large chunk of those roles, but I am not 
>>> aware if they've yet become the de facto standards in the community.
>>>
>>> Any feedback is greatly appreciated.
>>>
>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
