Howdy,
> RDF is to represent your API, runtime program, and virtual machine and
> SPARQL/Update is the protocol for manipulating the network (e.g.
> INSERT, DELETE, SELECT, ASK, etc.).
> 1. We are dealing with a 10 billion triple (edge) semantic network.
> Some analysis can be done by harvesting subsets of the network and
> performing our network analysis algorithms in main memory. However,
> there are some algorithms that are just not possible with a 'main
> memory' approach.
Sandia has a multithreaded graph library project intended for very large
semantic networks, called the Eldorado MTGL. It is an `in core' or `main
memory' approach. They are targeting XT3-based systems where some of
the Torrenza-friendly CPUs are Cray's massively multithreaded XMT
processors. Like any big computer, it will have terabytes of RAM, but
unlike a typical cluster-type system, the XMT processors will be able to
see all of it. The supercomputer interconnect has variable latency
depending on the distance to memory (# of routed hops, or page-in -- the
latter it won't have, by design), but the fact that there are thousands
of threads traversing the network at once is analogous to a grocery
store with dozens of lanes instead of just one. In contrast, I
understood a big bottleneck was just loading the data from their
clients' relational database.
While `load' and `store' CPU instructions act against a non-persistent
contiguous array of virtual memory, that is an easy constraint to get
around. One way is to use some of the most significant bits of the
address space for type information and the lower bits for object
offsets or numeric values (or in this case maybe an encoding like
type+context+{noun,verb,noun}, where `noun' and `verb' are indexes into
a `context' object that boxes the UUIDs). For example, Opterons have a
48-bit virtual address space, so within a 64-bit word a split of 16 bits
for type info and 48 bits for data would make sense, giving 2^48
(roughly 281 trillion) addressable objects. With POSIX, the mmap system
call can set up these regions efficiently, and/or make them persistent
to a file. Named objects or UUIDs can be swizzled to container data
structures in persistent virtual memory (application-specific swap
files, basically). Some persistent object stores like [1] literally do
this. At the end of the day, there's some data you want, and whether
you have a triple store server that reads the data from various sources
and sends it to you over a stream, or you rely on the MMU of the machine
to page it in from a local disk sector or an external pager, the issue
of `main memory' or not seems like a red herring.
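To make the tag-bits idea concrete, here is a minimal Python sketch of the 16/48 split (the tag values and names are invented for illustration, not taken from any particular object store):

```python
# A 64-bit word: top 16 bits hold type info, low 48 bits hold the
# payload (an object offset or immediate value).
TAG_BITS = 16
DATA_BITS = 48
DATA_MASK = (1 << DATA_BITS) - 1

TYPE_INT = 0x0001     # hypothetical tag assignments
TYPE_TRIPLE = 0x0002

def box(tag, payload):
    """Pack a type tag and a 48-bit payload into one 64-bit word."""
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= payload <= DATA_MASK
    return (tag << DATA_BITS) | payload

def unbox(word):
    """Recover (tag, payload) from a packed word."""
    return word >> DATA_BITS, word & DATA_MASK

w = box(TYPE_INT, 12345)
print(hex(w))                         # 0x1000000003039
assert unbox(w) == (TYPE_INT, 12345)  # round trip
```

In a C implementation against mmap'd memory the mask-and-shift is the same; the payload is then an offset into the mapped region rather than a Python integer.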
> 2. When the virtual machine is represented in RDF, then the virtual
> machine can be queried like any other aspect of the semantic network.
> For instance, it is possible to write code that is interpreted by a
> virtual machine that causes the virtual machine to re-write itself at
> run-time. To my knowledge, no such mechanism exists in other
> programming languages. For instance, the virtual machine of the Java
> programming language is hidden to the programmer. One can not
> manipulate main memory directory nor manipulate the JVM program at
> runtime (the JVM is written in another language). This is not the case
> in my proposed model of computing.
Well, new objects can be described and byte-coded using libraries like:
http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
For example, the Kawa language framework implements Scheme on it.
http://www.gnu.org/software/kawa/index.html
> The virtual machine is an object much like any other object and can be
> manipulated as such. There is much to be said of how this could be
> used in terms of evolutionary algorithms.
Any language that has an `eval' or dynamic binding can do this (Python,
Perl, Ruby, Lisp, JavaScript, Objective-C). For example, in Common
Lisp, to brutally change all addition into multiplication (yes, it can
also be done with lexical or dynamic scope):
(setf (symbol-function '+) #'*)
(+ 3 4)
=> 12
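The same trick sketched in Python, with a toy dispatch table standing in for the symbol-function cell (`env' and `eval_call' are made-up names, not a real interpreter):

```python
import operator

# Toy environment mapping names to actions.
env = {"+": operator.add}

def eval_call(op, *args):
    """Look the operator up by name at call time, then apply it."""
    return env[op](*args)

print(eval_call("+", 3, 4))   # 7
env["+"] = operator.mul       # rebind the name, like the Lisp setf
print(eval_call("+", 3, 4))   # 12
```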
As far as I can tell, manipulating the `virtual machine' is mostly just
a question of what bindings exist between names and actions.
System-level features (like whether physical threads or I/O facilities
are available) you either have or you don't.
Java is a bit confusing here because it has infix syntax without
operator overloading, and because it isn't a dynamically typed language.
> 4. There already exists a host of technologies to support the Semantic
> Web: multi-billion triple triple-store servers, ontology development
> environments, reasoners, etc. The model I proposed requires very
> little investment by the community to use. The Neno language that LANL
> is currently developing looks very similar to Java.
Likewise, inverse field referencing or method invocation would still be
concise in C# with a LINQ interface to SPARQL. Dynamic bindings can be
implemented using a dictionary class that can be loaded and modified
using reflection, and that has a method for callout. Field
cardinality -- a fixed-size array of objects, where some can be null, no?
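That fixed-size-array reading of field cardinality, sketched in Python (the class and field names are invented):

```python
class Friend:
    """Hypothetical placeholder for a referenced object."""
    pass

class Person:
    FRIEND_SLOTS = 4   # the field's fixed cardinality

    def __init__(self):
        # A cardinality-4 field: exactly four slots, unfilled ones null.
        self.friends = [None] * self.FRIEND_SLOTS

    def add_friend(self, f):
        i = self.friends.index(None)   # ValueError once all slots are full
        self.friends[i] = f

p = Person()
p.add_friend(Friend())
print(sum(f is not None for f in p.friends))   # 1
```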
> 5. The programming language proposed by LANL has constructs that are
> not possible (realistically) with other programming language. This is
> due to the fact that the programming language has an underlying
> network representation. There are some neat constructs that emerge
> when the data structure is a network (e.g. inverse method invocation,
> inverse field referencing, field cardinality, general object query
> (automatic JavaSpaces)). Please refer to section 3 of the arXiv
> pre-print.
I guess it depends how concise you want these to be. If you really
want it concise, then I wonder why use Java and not something like
Prolog? I mentioned LINQ because it is an evolutionary addition to an
established, popular environment (C# and .NET). Inverse method
invocations and field references are just matters of collecting a set
of things and then applying a function to them. And going back to the
load/store analogy of a CPU, the notion of a distributed space is
really not much more than adding more bits to the address, or a `who'
argument:
load(who,where) or who.load(where)
store(who,where,what) or who.store(where,what)
(And implementing that is just a matter of RMI, SOAP, MPI, or pick your
favorite.)
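A quick sketch of the `who' argument, plus the collect-then-apply reading of inverse references; `Node' and its methods are invented for illustration:

```python
class Node:
    """A participant in the distributed space -- the `who' above."""
    def __init__(self, name):
        self.name = name
        self.mem = {}          # this node's local store

    def store(self, where, what):
        self.mem[where] = what

    def load(self, where):
        return self.mem[where]

# load(who, where) becomes who.load(where):
a, b = Node("a"), Node("b")
a.store("x", 42)
b.store("x", 99)
print(a.load("x"), b.load("x"))   # 42 99

# Inverse reference: collect the set of nodes holding a key,
# then apply whatever function you like to that set.
def inverse_refs(nodes, where):
    return [n for n in nodes if where in n.mem]

print([n.name for n in inverse_refs([a, b], "x")])   # ['a', 'b']
```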
cheers,
Marcus
[1] http://www.rscheme.org/rs/a/2005/persistence
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org