Hello,
I experimented with distributed data persistence. These are my notes:
0) The experiment was to use SQLite as a relational store and q as a column
store and see how the data can be managed across these two stores.
a) jdb came in afterwards but I have not yet ported the experiment to jdb.
1) I used Chris Burke's J to q connector code [1]. I liked the way the
communication protocol was set up.
2) When working with sockets and multiple instances in a J server - J client
pattern, the following needed attention:
a) managing the ports and logs on the ports
b) setting up the synchronous or asynchronous operations
c) an external monitor of the number of instances and their roles in the
computation. This is another process that sets up an independent connection
to each instance. It could also be the main instance launcher process.
3) There are interesting patterns that can be explored in the
request-response:
a) synchronous blocking
b) asynchronous
c) asynchronous with callback
4) The protocol can be extended to query for values and to create verbs
from strings in the target J server.
5) As Alex has noted with databases, SQL statements eventually incur the
overhead of string manipulation.
6) Debugging was quite tricky. However, with the interpreter I was able to
make significant progress, as I could trace the calls on the client and
follow them through on the server. During the debug runs I ran into
issues with setting breakpoints, though I do not recollect exactly what they were.
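
The three request-response patterns in item 3 can be sketched roughly as
follows. This is only an illustration in Python, not the J/q connector code:
handle_request is a hypothetical stand-in for the remote J server, and a
thread pool takes the place of the socket layer.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload):
    # Hypothetical stand-in for a J server evaluating a request;
    # here it simply sums a list of numbers.
    return sum(payload)


def request_sync(payload):
    # a) synchronous blocking: the caller waits for the reply.
    return handle_request(payload)


def request_async(executor, payload):
    # b) asynchronous: the caller gets a future and collects the reply later.
    return executor.submit(handle_request, payload)


def request_async_callback(executor, payload, on_reply):
    # c) asynchronous with callback: on_reply is invoked once the reply is ready.
    future = executor.submit(handle_request, payload)
    future.add_done_callback(lambda done: on_reply(done.result()))
    return future


def demo():
    replies = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        sync_result = request_sync([1, 2, 3])
        future = request_async(pool, [4, 5, 6])
        request_async_callback(pool, [7, 8, 9], replies.append)
        async_result = future.result()
    # Leaving the with-block waits for the workers, so the callback has run.
    return sync_result, async_result, replies
```

In a real setup the callback would run in whichever thread receives the
reply, so it should stay short and avoid touching shared state without care.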

The communication protocol has to be clear on the following layers:
a) Communication Layer: This cares only about the communication of
information and reports error information at this level.
b) Setting up the request-response patterns: Reports errors at this level
c) Logical/Computational layer: Errors of this layer have different
characteristics. Probably this can be further split into
c.1) The data for the computation and its validation
c.2) The computation that is invoked on this data and its validation
c.3) The actual computation on the data.
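
One possible way to keep the errors of these layers apart is to give each
layer its own error type, so a reply always says which layer failed. The
sketch below is in Python with made-up names (ProtocolError, handle, reply,
and the message fields mode, op and data are all assumptions for
illustration); it is not how the J to q connector does it.

```python
class ProtocolError(Exception):
    layer = "protocol"


class CommunicationError(ProtocolError):
    layer = "communication"              # a)


class RequestResponseError(ProtocolError):
    layer = "request-response"           # b)


class DataValidationError(ProtocolError):
    layer = "computation/data"           # c.1)


class InvocationValidationError(ProtocolError):
    layer = "computation/invocation"     # c.2)


class ExecutionError(ProtocolError):
    layer = "computation/execution"      # c.3)


# Hypothetical set of computations the server is willing to run.
OPERATIONS = {"sum": sum, "mean": lambda xs: sum(xs) / len(xs)}


def handle(message):
    # a) communication layer: only checks that a well-formed message arrived.
    if not isinstance(message, dict):
        raise CommunicationError("malformed message")
    # b) request-response layer: the pattern must be one we support.
    if message.get("mode") not in ("sync", "async", "callback"):
        raise RequestResponseError("unknown request mode")
    # c.1) validate the data for the computation.
    data = message.get("data")
    if not isinstance(data, list) or not all(
        isinstance(x, (int, float)) for x in data
    ):
        raise DataValidationError("data must be a list of numbers")
    # c.2) validate the computation that is invoked on the data.
    operation = OPERATIONS.get(message.get("op"))
    if operation is None:
        raise InvocationValidationError("unknown operation")
    # c.3) run the actual computation.
    try:
        return operation(data)
    except Exception as exc:
        raise ExecutionError(str(exc)) from exc


def reply(message):
    # Wrap the outcome so the client can tell which layer reported the error.
    try:
        return {"ok": True, "value": handle(message)}
    except ProtocolError as err:
        return {"ok": False, "layer": err.layer, "error": str(err)}
```

For example, reply({"mode": "sync", "op": "mean", "data": []}) passes both
validation steps and fails only in the actual computation, so the client
sees an error tagged with the c.3) layer.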

This is how far I got.

~Yuva


[1] See http://www.jsoftware.com/jwiki/Interfaces/Kdb

On Tue, Feb 16, 2010 at 6:51 AM, Skip Cave <[email protected]> wrote:

>
>
> Alex Rufon wrote:
> > I agree with Skip's idea but I would like to suggest including boxed
> > arrays or boxed strings in the test data set.
> >
> > I work exclusively with heterogeneous boxed arrays coming in from SQL
> > Server. I actually don't process pure numeric information. One sample
> > computation is matching a list of order by size against a list of
> > consumption by size.
> Skip replies:
>
> I think that if Alex's problem is split across multiple processors, his
> matching process will entail moving some data between the processors
> during execution. I was trying to avoid that issue in my pure numerical
> example. My thought was, if we test a process that doesn't require any
> data movement other than the initial distribution and final assembly,
> and then find that parallel execution of that process doesn't provide
> all that much efficiency gain, then it is unlikely that processes that
> do require data movement between processors would be executed more
> efficiently in parallel.
>
> Alex's problem has the advantage of a practical usage, so it could be
> added in the test suite as a second test example of parallel processing.
> However, I would expect his problem to be less amenable to distributing
> the processor load than the pure in-place computational problem, due to
> the requirement to move data between processors during execution.
>
> Skip Cave
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
