Hi Mattia,

Can we swap jobs?!? :) Wow, I graduated with a major in Economics
decades ago and it seems like you got my dream job. :P

Regarding your concerns (in addition to what Bill and Oleg already
provided):

*** Data Management
Merging datasets is not a problem regardless of source. I had some
projects that required pulling data from text files, MS-SQL database
records, and Excel files (even some old dBASE III+ .DBF files) and
generating SQL INSERT commands to update a database table. You see, as
soon as the data is inside your J workspace ... it's all the same data
regardless of source. You just need to know (or find out) which
interface to use. From my experience with J, it is better to do all the
merging in the J workspace instead of in SQL.

*** Performance
Hmmm. About performance ... here are the specs of my development
machine:
Lenovo T61
Intel Core Duo 2.2GHz
1GB RAM
Vista Ultimate SP1

And one of my production machines:
Compaq ML-570
2 x Intel Xeon 3.0GHz
4GB RAM
Windows 2000 Advanced Server

(Obviously) I develop on my laptop and deploy my J scripts on the
production machine (32-bit). Since the system is web-based, the J
server has to complete its tasks within one minute, and that minute
includes database queries and processing.

I'm telling you this so that you realize that performance is relative.
One of my systems is web-based and potentially has a lot of concurrent
operations ... so I designed it to execute J on the smallest/atomic
piece of data with the most stand-alone process it needs. Some scripts
run for less than 15 seconds, but looking at the big picture ... I
sometimes need to make 5 atomic calls to complete 1 business task. I
could have executed the whole task in one go, but that would give the
end user the impression that the system is slow. You get my drift? ;)

So because of my "design", I have never needed memory mapping or a
64-bit machine (I want a 64-bit machine, but it's final ... I'm getting
a new motorcycle ... hehehehe). Still, if you really need to process
large data ... I would suggest getting a 64-bit machine to make life
easier.

*** Computation
Hmmm. I know that there are already pre-built libraries in J, but I
normally build my own for the very simple (but sometimes stupid and
wasteful) reason that I want to know how to make one. :P I implemented
my own traveling-salesman solver, some normalization rules for global
project planning, and some really weird stuff that I don't even want to
show anybody out of embarrassment. Heehehehehe.

Seriously, I write my own code so that it's easier to understand (and
sometimes fix) when the end user starts feeding it the wrong data 6
months after you deployed it. It happens ... believe me.
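
Since you asked about solvers: a home-grown one can be tiny in J. Here
is a toy bisection zero-finder I just sketched for this email (not
production code; it assumes f changes sign on the interval you give
it):

```j
NB. example function: f(y) = 2 - y^2, so the positive zero is sqrt 2
f =: 3 : '2 - *: y'

NB. x is the interval (lo, hi); halve it until it is tiny
bisect =: 4 : 0
'lo hi' =. x
while. 1e_10 < hi - lo do.
  mid =. -: lo + hi
  NB. keep the half where the sign change remains
  if. (* f lo) = * f mid do. lo =. mid else. hi =. mid end.
end.
-: lo + hi
)

1 2 bisect ''   NB. converges to about 1.41421
```

Ten lines, and when it misbehaves 6 months from now, you know exactly
where to look.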

*** Learning
Ok. I know a lot of veteran J users will not agree, but I would suggest
you start coding J using the primitives library. Then later on, when
you've overcome n00bness ... you can start programming with the actual
symbols. Why? Ok, let's look at the following code:

For example, you have the following values for the variable data:       
   data
1   10 20
2    3  7
1   45 12
1 3214  3
3    3  1
2  123  5

The first column is the ID; the second and third columns are values
like quantity and order quantity or something. So you want to find out
the following:
1. The unique list of IDs
2. The tally for each unique ID
3. The minimum for each unique ID
4. The maximum for each unique ID
5. The total for each unique ID

If you don't use the primitives, you may code it this way
   (~.{."1 data),.({."1 data)(#,<./,>./,+/)/.1{"1 data
1 3 10 3214 3269
2 2  3  123  126
3 1  3    3    3
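
(Reading that right to left can be rough at first. Here is the same
computation split into named steps, with comments added by me, so you
can see what each piece does:

```j
NB. the sample table from above
data =: 6 3 $ 1 10 20 2 3 7 1 45 12 1 3214 3 3 3 1 2 123 5
ids  =: {."1 data      NB. first column of each row: the IDs
vals =: 1{"1 data      NB. second column: the values
NB. key adverb /. : for each unique id, apply (count, min, max, sum)
stats =: ids (# , <./ , >./ , +/)/. vals
(~. ids) ,. stats      NB. prefix each stats row with its unique id
```

Same result, just spelled out.)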

Using primitives, you can code it this way
NB. Load the primitive library
   load 'primitives'
NB. Define new words to make it more English like
   unique=: ~.
   across=: /
   sum=: +
   (unique take rank 1 data) stitch (take rank 1 data)(tally, min across, max across, sum across) key 1 from rank 1 data
1 3 10 3214 3269
2 2  3  123  126
3 1  3    3    3

I know it's longer with more words to type ... but it worked for all
the junior programmers who worked under me. It's up to you to decide.
Then, when you're more familiar with how J works ... you can start
coding with the symbols ... ;)

Oh, btw, I was introduced to J around 1998 but only started using it at
work around the year 2000. Up to now, I still don't know how to
properly use some of the symbols. :P

Good luck with your project.

r/Alex


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mattia Landoni
Sent: Monday, March 31, 2008 10:56 AM
To: [email protected]
Subject: [Jgeneral] A few general questions from a wannabe J-er

Hi all,

I am an economist and I discovered J a few days ago. I haven't been so
excited since I was 13 and Santa brought me an 8-bit Nintendo
Entertainment System. Yet before taking a week off from work to study J
(just kidding) I would like to be sure it does everything I need. Here
are questions on four main topics: data management, performance, actual
computation, and learning. Every answer to any question is very
welcome. Answers to questions marked with a (*) are particularly
important to me. Thank you in advance!

Data Management
- I import data from several sources. They are not always in
straightforward formats. Are there libraries or built-in functions to
import text (e.g. .csv, .tab, fixed format) and non-text (e.g. Excel,
1-2-3) data?
- (*) I often merge datasets (a sort of SQL join). The other day I saw
that it is possible to embed a database (SQLite) through a library. Are
there interfaces to other databases? I usually use MySQL (last time I
checked, SQLite did not implement enough SQL for my purposes - that was
probably 2 years ago). Are there built-in functions to perform similar
operations? (Although I'd be very happy to do all the merging in SQL.)

Performance
- (***) How does J deal with very large datasets? Currently I am
dealing with a 65 GB dataset. So far the only software I can use is
SAS. Performing an SQL query [SELECT, GROUP BY] in SAS on a dedicated
server takes me six hours, of which a large part is network I/O (I
guess SAS's computing time would be an hour, perhaps two). The data is
divided into 7 chunks of 7 to 13 GB each. Having the same amount of
data on a good computer, would I be able to perform the same operations
with J? Assume plentiful RAM and a speedy processor: what's the order
of magnitude of the time it would take?
- I read something about memory mapping in past posts and I intuitively
understand what it means, but I have never done it. What are the limits
of memory mapping? In general, what are the techniques to deal with
large datasets?

Computation
- Is there a numerical optimizer/solver? (E.g., given a certain
function, find local maxima and minima; given an equation, find the
zeros.) I could program this one, but is there one already?
- Is there a sufficiently painless interface to Maxima (the symbolic
calculus toolbox)?

Learning
- What's the fastest way to learn the basics for a greedy person who
learns the average C-like programming language in a week? Normally what
I do is learn "what can be done" and then start programming right away
with a reference at hand. Here it does not seem so simple... right?

Thank you again and, again, any answer to even just one question is
welcome.

Mattia

-- 
Mattia Landoni
1201 S Eads St Apt 417
Arlington, VA 22202-2837
USA
Greenwich -5 hours

Office: +1 202 62 35922
Cell: +1 202 492 3404
Home: +1 360 968 1684

Govern a great country as you would fry a small fish: do not poke at it
too
much.
-- Lao Tzu
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------