I agree with Skip's idea but I would like to suggest including boxed arrays or
boxed strings in the test data set.
I work exclusively with heterogeneous boxed arrays coming in from SQL Server; I
don't actually process purely numeric information. One sample computation is
matching a list of orders by size against a list of consumption by size. In
this instance, I had to match the two tables by Style (string) and Size
(string), then multiply the Order Qty (number) by the Consumption (number)
while keeping the Material information (e.g. Material ID, Material
Description, Merchandising Material Description, Material Color, UOM, Shipping
Instructions, etc.), which are all strings. The result of the operation is a
2-dimensional matrix that represents a normal database table, which I then
process again to convert it into SQL DML commands for either INSERT or UPDATE.
This processing normally involves enclosing all string columns in single
quotes, like so:
   [data=. 4 5 $ ;:'aa bb cc dd'
+--+--+--+--+--+
|aa|bb|cc|dd|aa|
+--+--+--+--+--+
|bb|cc|dd|aa|bb|
+--+--+--+--+--+
|cc|dd|aa|bb|cc|
+--+--+--+--+--+
|dd|aa|bb|cc|dd|
+--+--+--+--+--+
   '''',~ each '''', each data
+----+----+----+----+----+
|'aa'|'bb'|'cc'|'dd'|'aa'|
+----+----+----+----+----+
|'bb'|'cc'|'dd'|'aa'|'bb'|
+----+----+----+----+----+
|'cc'|'dd'|'aa'|'bb'|'cc'|
+----+----+----+----+----+
|'dd'|'aa'|'bb'|'cc'|'dd'|
+----+----+----+----+----+
Then I comma-separate this data, eventually getting this result:
INSERT INTO test_table (f1,f2,f3,f4,f5) VALUES ('aa','bb','cc','dd','aa')
INSERT INTO test_table (f1,f2,f3,f4,f5) VALUES ('bb','cc','dd','aa','bb')
INSERT INTO test_table (f1,f2,f3,f4,f5) VALUES ('cc','dd','aa','bb','cc')
INSERT INTO test_table (f1,f2,f3,f4,f5) VALUES ('dd','aa','bb','cc','dd')
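For readers who do not read J, the same quote-then-comma-join step can be
sketched in Python. This is only an illustration: the helper name
build_inserts is hypothetical, test_table and f1..f5 are taken from the
example above, and the sketch assumes that no cell contains a single quote.

```python
def build_inserts(table, rows, columns):
    """Return one INSERT statement per row, wrapping every cell in single quotes.

    Assumption: every cell is a string that contains no single quote.
    """
    column_list = ",".join(columns)
    statements = []
    for row in rows:
        values = ",".join("'" + cell + "'" for cell in row)
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values})")
    return statements


# Rebuild the 4 by 5 table from the J example by cycling through four strings.
items = ["aa", "bb", "cc", "dd"]
data = [[items[(5 * r + c) % 4] for c in range(5)] for r in range(4)]
columns = ["f1", "f2", "f3", "f4", "f5"]

for statement in build_inserts("test_table", data, columns):
    print(statement)
```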
Complications arise when a string already contains a single quote, e.g.
"alex's", which has to be converted to 'alex''s' to be handled properly by J.
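The doubling rule itself is small. A Python sketch of it, with a hypothetical
function name (sql_quote), might look like this:

```python
def sql_quote(text):
    """Wrap text in single quotes, doubling any single quote inside it."""
    return "'" + text.replace("'", "''") + "'"


print(sql_quote("alex's"))  # prints 'alex''s'
print(sql_quote("aa"))      # prints 'aa'
```

The cost is one pass over every string cell, which is part of why this step
shows up in timings on large tables.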
For me, string manipulation is one of the slowest things that I use J for.
Even so, string manipulation in J is still faster than looping through the
data with C# and VB.NET.
So again, I am suggesting that the test data include strings/characters along
with numbers.
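As a concrete shape for such a mixed test case, here is a rough Python sketch
of the order/consumption matching described at the top of this message. The
rows are made-up sample values, the column layout and the name match_orders
are assumptions for illustration, and the sketch assumes each (style, size)
pair appears at most once in the order list.

```python
# Hypothetical sample rows; real data would come from SQL Server.
orders = [
    # (style, size, order_qty)
    ("S100", "M", 10),
    ("S100", "L", 5),
    ("S200", "M", 8),
]
consumption = [
    # (style, size, material_id, uom, consumption_per_unit)
    ("S100", "M", "MAT-1", "YD", 1.5),
    ("S100", "L", "MAT-1", "YD", 2.0),
    ("S100", "M", "MAT-2", "PC", 4.0),
]


def match_orders(orders, consumption):
    """Join the two tables on (style, size) and compute the required quantity."""
    # Assumption: each (style, size) key occurs at most once in orders.
    qty_by_key = {(style, size): qty for style, size, qty in orders}
    result = []
    for style, size, material_id, uom, per_unit in consumption:
        key = (style, size)
        if key in qty_by_key:
            required = qty_by_key[key] * per_unit
            result.append((style, size, material_id, uom, required))
    return result


for row in match_orders(orders, consumption):
    print(row)
```

Each output row mixes string columns with one computed number, which is the
kind of heterogeneous data I would like the benchmark to cover.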
________________________________________
From: [email protected] [[email protected]] On Behalf Of Skip
Cave [[email protected]]
Sent: Tuesday, February 16, 2010 4:11 PM
To: Chat forum
Subject: Re: [Jchat] Multiple cores
It seems to me that it should be possible to define a simple numerical
process that would appear, at least at first glance, to benefit from
parallel operations. We could use that process to examine the actual
effects of various parallel implementations, and compare execution times
to a pure single-process J implementation.
The process itself may or may not have any practical value, but it would
at least be a benchmark to examine how such parallel mechanisms could be
implemented, as well as allowing timing comparisons between single and
multi-threaded implementations.
An obvious characteristic of a process that would appear to benefit from
parallel processing is that its algorithm minimizes movement of data
between processor cores. A set of in-place operations on a fixed set of
data would seem to have a good chance of showing significant
efficiencies using parallel techniques.
So I will make a proposal for such a problem: Create a large vector of
random integers. Perform a large set of sequential operations on those
individual integers: square them, then take the natural log, then the
inverse, then the ceiling, etc. Keep going for a while, to give the
processing units a workout. To simplify result verification, the final
output of all the manipulations should be the same as the original vector.
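A single-process version of this proposal might be sketched in Python as
follows. Note the assumption: the chain listed above (square, log, inverse,
ceiling) does not by itself return the original values, so this sketch pairs
each operation with its inverse and rounds at the end. The names STEPS and
round_trip, the value range, and the repeat count are all illustrative.

```python
import math
import random

# Each pair is (operation, inverse), so the chain as a whole undoes itself.
STEPS = [
    (lambda x: x * x, math.sqrt),
    (math.log, math.exp),
    (lambda x: 1.0 / x, lambda x: 1.0 / x),
]


def round_trip(values, repeats=10):
    """Apply every step, then every inverse in reverse order, `repeats` times."""
    out = []
    for v in values:
        x = float(v)
        for _ in range(repeats):
            for forward, _inverse in STEPS:
                x = forward(x)
            for _forward, inverse in reversed(STEPS):
                x = inverse(x)
        out.append(round(x))  # rounding absorbs floating-point error
    return out


# Values start at 2 because log(1 * 1) is 0, which would break the 1/x step.
rng = random.Random(0)
vector = [rng.randint(2, 1000) for _ in range(10_000)]
assert round_trip(vector) == vector
```

The final comparison against the input vector is the verification step the
proposal asks for.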
To parallelize this numerical process, the random numbers would be
divided up equally across the multiple processor cores initially, and no
more data movement between cores would take place during the processing.
Subsequent operations on the data items are passed to all of the cores
sequentially, and the cores each operate on their portions of the data
using the operations specified. When all the operations are completed,
the data is re-assembled back to a vector the same size as the original
vector.
In this process, no data movement is made between processor cores. Only
process commands are passed to each processor, so it can perform the
next numerical operation on all of its data items. With totally
independent processors and separate memory, it would seem that at some
point, parallel processing would beat out single processes.
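The scatter, operate, and reassemble structure can be sketched in Python as
below. Assumptions to be aware of: the names split, run_commands, and
parallel_apply are hypothetical; each worker receives the whole command list
at once instead of one command at a time; and a thread pool is used only to
show the structure. CPython threads share one interpreter lock, so a real
timing comparison would need separate processes (for example
ProcessPoolExecutor with module-level functions in place of the lambda).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Divide values into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def run_commands(chunk, commands):
    """Apply each command in order to every item of one worker's chunk."""
    for command in commands:
        chunk = [command(x) for x in chunk]
    return chunk


def parallel_apply(values, commands, workers=4):
    """Scatter the data once, run all commands per chunk, then reassemble."""
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_commands, chunks, [commands] * workers))
    return [x for chunk in results for x in chunk]


commands = [float, math.sqrt, lambda x: x * x, round]
vector = list(range(1, 1001))
assert parallel_apply(vector, commands) == vector
```

No item moves between chunks after the initial split, which matches the
"no data movement between cores" condition described above.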
In reality, whether this process will benefit from parallelization
depends on the size of the vector, how many processors, how much memory
is shared between processors, and what the memory bandwidth is for all
the processors, among other things. My (often faulty) intuition tells me
that at some vector size and number of in-place operations, parallel
execution of this process should still start to become more efficient
than a single-threaded implementation. However, the complexities of
cache memory, shared memory bandwidth, and many other factors may prove
my intuition wrong.
In any case, this example would provide a base benchmark for evaluating
parallel processes on arrays of data. We could try this algorithm on
various multi-core, multi-threaded, multi-processor schemes with
various memory architectures, to see what configurations (if any)
benefit from parallelization. Just my two cents....
Skip Cave
Raul Miller wrote:
> On Mon, Feb 15, 2010 at 1:11 PM, Don Guinn <[email protected]> wrote:
>
>> I don't think it would take much to actually move problems once we have
> >> other J sessions available. The socket interface is there. 3!:1 and 3!:2
>> provide the tool to transfer the data in a generalized way. Sockets provide
>> easy notification to coordinate between instances of J. Big problems I see
>> are security and being able to start J remotely. I don't know where to start
>> with them.
>>
>
> To start J remotely you need to create a "J server" which can
> spawn J sessions on that machine.
>
> To do this securely you need to have some sort of secure connection
> mechanism. The primary risk you need to defend against is
> unauthorized clients. The ideal mechanism here involves a
> collection of machines which is not connected to the internet.
>
> A slightly weaker version allows some machines to be connected
> to the internet but black listing them, such that they are not
> allowed to be clients to the "J server" (they can be servers
> for other J sessions).
>
> Other variations are also possible (for example, giving clients
> secret keys and using them to generate hashes against
> recent timestamps and executable sentences). In other
> words:
>
> client and server:
> hash=: md5sum secret,timestamp,sentence
>
> (hash sent from client must match hash generated on server).
> (server also rejects unreasonable timestamps).
>
> Except, I think we have better options than md5sum.
>
> FYI,
>
>
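The keyed-hash check quoted above can be sketched in Python. This is an
illustration under stated assumptions, not Raul's implementation: it swaps
md5sum for HMAC-SHA256 from the Python standard library, and the names sign,
verify, and MAX_SKEW_SECONDS, the 30-second window, and the sample secret are
all hypothetical.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 30  # hypothetical tolerance for "reasonable" timestamps


def sign(secret, timestamp, sentence):
    """Return a hex HMAC-SHA256 tag over the timestamp and the sentence."""
    message = f"{timestamp}\n{sentence}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, timestamp, sentence, tag, now=None):
    """Server-side check: timestamp must be recent and the tag must match."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(secret, timestamp, sentence)
    return hmac.compare_digest(expected, tag)


secret = b"shared-secret"  # hypothetical key known to client and server
stamp = int(time.time())
tag = sign(secret, stamp, "+/ i. 10")
print(verify(secret, stamp, "+/ i. 10", tag))  # True
print(verify(secret, stamp, "2!:55 ''", tag))  # False
```

The client sends the timestamp, the sentence, and the tag; the server
recomputes the tag with its own copy of the secret, so the secret itself
never crosses the wire.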
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm