On Wed, Sep 7, 2011 at 11:33 AM, Zheng, Xin (NIH) [C]
<[email protected]> wrote:
> I come from R and I've gotten tired of its performance, so I am looking
> for another language. How does J perform when analyzing big data (GB or
> even TB)? Could anyone give a rough idea?

The answer depends on your machine, your operating system, and the
computations you are running.

For a very rough first approximation, assume a factor of 5 memory
overhead on the data structure being manipulated, since you typically
need space for intermediate results; a 2 GB array, for example, may
need around 10 GB of RAM mid-computation.  And assume that if your
calculation spills into swap you will pay a factor of roughly 1000 in
extra time (though how slow it gets depends on how swap is implemented
on your machine).
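
If you want to measure the overhead for your own computation rather
than guess, the space foreign 7!:2 reports the number of bytes needed
to execute a sentence.  A minimal sketch (the array and the expression
are made up for illustration):

   data =: ? 1e6 $ 0            NB. 1e6 random floats, about 8e6 bytes
   7!:2 'data + data * data'    NB. bytes needed to run the sentence;
                                NB. expect a small multiple of 8e6,
                                NB. reflecting the intermediate results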

For large calculations I usually like breaking things up into blocks
that fit easily into memory (blocks of 1e6 data elements often work
fine).
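
For instance, here is a minimal sketch of that blocking pattern (the
verb name and block size are just illustrative): a sum computed one
block at a time, so intermediate results stay block-sized.

   blksum =: 3 : 0
    blk =. 1e6
    t =. 0
    for_i. i. >. blk %~ # y do.
      n =. blk <. (# y) - blk * i     NB. this block's size (last may be short)
      t =. t + +/ y {~ (blk * i) + i. n
    end.
    t
   )
   blksum ? 1e7 $ 0   NB. e.g. sum 1e7 random floats block by block

In real use the blocks would come off disk (for example via fread from
the files script) rather than from one array already in memory, but
the shape of the loop is the same.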

You will probably want to use memory mapped files for large data structures.
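
In J that usually means the jmf script shipped with the system.  A
rough sketch (the file name and size are hypothetical, and the exact
interface may differ between J versions):

   require 'jmf'
   createjmf_jmf_ 'big.jmf';1e9   NB. create a 1e9-byte backing file
   map_jmf_ 'dat';'big.jmf'       NB. noun dat now refers to the file
   NB. ... read and write dat; only the pages actually touched
   NB. need to be resident in RAM ...
   unmap_jmf_ 'dat'               NB. release the mapping when done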

I have never tried TB-scale files in J.  If you routinely work with
data of that size, you may want to consider kx.com's interpreter
instead of J; its user community regularly works with very large data
sets.  Expect to pay a lot of money, though, if you go that route.

-- 
Raul
