Howdy,
> RDF is to represent your API, runtime program, and virtual machine and
> SPARQL/Update is the protocol for manipulating the network (e.g.
> INSERT, DELETE, SELECT, ASK, etc.).
> 1. We are dealing with a 10 billion triple (edge) semantic network.
> Some analysis can be done by harvesting subsets of the network and
> performing our network analysis algorithms in main memory. However,
> there are some algorithms that are just not possible with a 'main
> memory' approach.
Sandia has a multithreaded graph library project intended for very large
semantic networks, called the Eldorado MTGL. It is an `in core' or `main
memory' approach. They are targeting XT3-based systems where some of
the Torrenza-friendly CPUs are Cray's massively multithreaded XMT
processors. Like any big computer, it will have terabytes of RAM, but
unlike a typical cluster-type system, the XMT processors will be able to
see all of it. The supercomputer interconnect has variable latency
depending on the distance to memory (# of routed hops, or page-in -- the
latter it won't have, by design), but the fact that there are thousands
of threads traversing the network at once is analogous to a grocery
store with dozens of lanes instead of just one. In contrast, I
understood a big bottleneck was just loading the data from their
clients' relational database.
While `load' and `store' CPU instructions act against a non-persistent
contiguous array of virtual memory, that is an easy constraint to get
around. One way is to use some of the most significant bits of the
address space for type information and the lower bits for object
offsets or numeric values (or in this case maybe an encoding like
type+context+{noun,verb,noun}, where `noun' and `verb' are indexes into
a `context' object that boxes the UUIDs). For example, Opterons have a
48-bit virtual address space, so within a 64-bit word a split of 16 bits
for type info and 48 bits for data would make sense, giving 2^48
(roughly 281 trillion) addressable objects. With POSIX, the mmap system
call can set up these regions efficiently, and/or make them persistent
to a file. Named objects or UUIDs can be swizzled to container data
structures in persistent virtual memory (application-specific swap
files, basically). Some persistent object stores like [1] literally do
this. At the end of the day, there's some data you want, and whether
you have a triple store server that reads the data from various sources
and sends it to you over a stream, or you rely on the MMU of the machine
to page it in from a local disk sector or an external pager, the issue
of `main memory' or not seems like a red herring.
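To make the tag-bits idea concrete, here is a minimal Python sketch of the 16/48 split (the tag values and names are invented for illustration, not taken from any particular object store):

```python
# A 64-bit word: top 16 bits hold type info, low 48 bits hold the
# payload (an object offset or immediate value).
TAG_BITS = 16
DATA_BITS = 48
DATA_MASK = (1 << DATA_BITS) - 1

TYPE_INT = 0x0001     # hypothetical tag assignments
TYPE_TRIPLE = 0x0002

def box(tag, payload):
    """Pack a type tag and a 48-bit payload into one 64-bit word."""
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= payload <= DATA_MASK
    return (tag << DATA_BITS) | payload

def unbox(word):
    """Recover (tag, payload) from a packed word."""
    return word >> DATA_BITS, word & DATA_MASK

w = box(TYPE_INT, 12345)
print(hex(w))                         # 0x1000000003039
assert unbox(w) == (TYPE_INT, 12345)  # round trip
```

In a C implementation against mmap'd memory the mask-and-shift is the same; the payload is then an offset into the mapped region rather than a Python integer.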
> 2. When the virtual machine is represented in RDF, then the virtual
> machine can be queried like any other aspect of the semantic network.
> For instance, it is possible to write code that is interpreted by a
> virtual machine that causes the virtual machine to re-write itself at
> run-time. To my knowledge, no such mechanism exists in other
> programming languages. For instance, the virtual machine of the Java
> programming language is hidden to the programmer. One can not
> manipulate main memory directory nor manipulate the JVM program at
> runtime (the JVM is written in another language). This is not the case
> in my proposed model of computing.
Well, new objects can be described and byte-coded using libraries like:
http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
For example, the Kawa language framework implements Scheme on it.
http://www.gnu.org/software/kawa/index.html
> The virtual machine is an object much like any other object and can be
> manipulated as such. There is much to be said of how this could be
> used in terms of evolutionary algorithms.
Any language that has an `eval' or dynamic binding can do this (Python,
Perl, Ruby, Lisp, JavaScript, Objective-C). For example, in Common
Lisp, to brutally change all addition into multiplication (yes, it can
also be done with lexical or dynamic scope):
(setf (symbol-function '+) #'*)
(+ 3 4)
=> 12
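The same trick sketched in Python, with a toy dispatch table standing in for the symbol-function cell (`env' and `eval_call' are made-up names, not a real interpreter):

```python
import operator

# Toy environment mapping names to actions.
env = {"+": operator.add}

def eval_call(op, *args):
    """Look the operator up by name at call time, then apply it."""
    return env[op](*args)

print(eval_call("+", 3, 4))   # 7
env["+"] = operator.mul       # rebind the name, like the Lisp setf
print(eval_call("+", 3, 4))   # 12
```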
As far as I can tell, manipulating the `virtual machine' is mostly just
a question of what bindings exist between names and actions.
System-level features (like whether physical threads or I/O facilities
are available) you either have or you don't.
Java is a bit confusing here because it has infix syntax without
operator overloading, and because it isn't a dynamically typed language.
> 4. There already exists a host of technologies to support the Semantic
> Web: multi-billion triple triple-store servers, ontology development
> environments, reasoners, etc. The model I proposed requires very
> little investment by the community to use. The Neno language that LANL
> is currently developing looks very similar to Java.
Likewise, inverse field referencing or method invocation would still be
concise in C# with a LINQ interface to SPARQL. Dynamic bindings can be
implemented using a dictionary class that can be loaded and modified
using reflection, and that has a method for callout. Field
cardinality -- a fixed-size array of objects, where some can be null, no?
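That fixed-size-array reading of field cardinality, sketched in Python (the class and field names are invented):

```python
class Friend:
    """Hypothetical placeholder for a referenced object."""
    pass

class Person:
    FRIEND_SLOTS = 4   # the field's fixed cardinality

    def __init__(self):
        # A cardinality-4 field: exactly four slots, unfilled ones null.
        self.friends = [None] * self.FRIEND_SLOTS

    def add_friend(self, f):
        i = self.friends.index(None)   # ValueError once all slots are full
        self.friends[i] = f

p = Person()
p.add_friend(Friend())
print(sum(f is not None for f in p.friends))   # 1
```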
> 5. The programming language proposed by LANL has constructs that are
> not possible (realistically) with other programming language. This is
> due to the fact that the programming language has an underlying
> network representation. There are some neat constructs that emerge
> when the data structure is a network (e.g. inverse method invocation,
> inverse field referencing, field cardinality, general object query
> (automatic JavaSpaces)). Please refer to section 3 of the arXiv
> pre-print.
I guess it depends how concise you want these to be. If you really
want it concise, then I wonder why use Java and not something like
Prolog? I mentioned LINQ because it is an evolutionary addition to an
established, popular environment (C# and .NET). Inverse method
invocations and field references are just matters of collecting a set
of things and then applying a function to them. And going back to the
load/store analogy of a CPU, the notion of a distributed space is
really not much more than adding more bits to the address, or a `who'
argument:
load(who,where) or who.load(where)
store(who,where,what) or who.store(where,what)
(And implementing that is just a matter of RMI, SOAP, MPI, or pick your
favorite.)
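A quick sketch of the `who' argument, plus the collect-then-apply reading of inverse references; `Node' and its methods are invented for illustration:

```python
class Node:
    """A participant in the distributed space -- the `who' above."""
    def __init__(self, name):
        self.name = name
        self.mem = {}          # this node's local store

    def store(self, where, what):
        self.mem[where] = what

    def load(self, where):
        return self.mem[where]

# load(who, where) becomes who.load(where):
a, b = Node("a"), Node("b")
a.store("x", 42)
b.store("x", 99)
print(a.load("x"), b.load("x"))   # 42 99

# Inverse reference: collect the set of nodes holding a key,
# then apply whatever function you like to that set.
def inverse_refs(nodes, where):
    return [n for n in nodes if where in n.mem]

print([n.name for n in inverse_refs([a, b], "x")])   # ['a', 'b']
```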
cheers,
Marcus
[1] http://www.rscheme.org/rs/a/2005/persistence
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org