Thanks Bryan, very interesting. Regarding high availability and self-healing, would you recommend River support using ZooKeeper?
If so, how are you using it?

Regards,

Peter.

----- Original message -----
> We've built a scalable distributed graph database using River
> (www.blazegraph.com, formerly www.bigdata.com). There are actually two
> versions that use River.
>
> 1. High availability and self-healing based on low write replication.
>    This uses ZooKeeper for the leader election. It would be nice if River
>    supported these semantics in the base distribution.
>
> 2. Horizontally scaled using dynamic sharding. This implementation uses
>    River to expose the shards on the data services, to distribute the
>    evaluation of high-level queries over the shards and services, etc.
>
> River has made it relatively easy to have services export RMI interfaces.
>
> Some possible issues to consider:
>
> a. better communications about the project (i.e., more marketing). I
>    think that most of the technical documentation for the River ecosystem
>    (other than javadoc) is pretty old. It might still be valid, but there
>    is a perception of stagnation. Maybe a weekly blog post, project of the
>    month, etc.
> b. bindings for other languages (yes, this is not that easy).
> c. leader election semantics and similar interesting patterns in the base
>    platform.
> d. design patterns for building scalable applications on River (it is
>    really not that easy to get started if I remember back to when I first
>    used the platform in the mid 2000s).
> e. good tutorials for non-multicast environments (EC2).
> f. good tutorials for security, including living with firewalls and
>    constraining ports.
> g. maybe a JSON interface to reggie so applications can easily see what's
>    in the various registrars?
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@systap.com
> http://blazegraph.com
> http://blog.bigdata.com <http://bigdata.com>
> http://mapgraph.io
>
> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
> technology to use GPUs to accelerate data-parallel graph analytics.
>
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are
> for the sole use of the intended recipient(s) and are confidential or
> proprietary to SYSTAP. Any unauthorized review, use, disclosure,
> dissemination or copying of this email or its contents or attachments is
> prohibited. If you have received this communication in error, please
> notify the sender by reply email and permanently delete all copies of
> the email and its contents and attachments.
>
> On Mon, Apr 6, 2015 at 4:40 AM, Greg Trasuk <tras...@stratuscom.com>
> wrote:
>
> > Well, here’s what I’ve used Jini for:
> >
> > Desktop Java applications communicating with a back-end implemented as
> > a set of Jini services in “LAN” scope. In this scenario, Jini has the
> > following valuable features:
> > - Zero-configuration on the client side. The client is able to use
> >   multicast discovery to find the lookup service(s) and then lookup the
> >   service it wants to use. There is no need for the client app to have a
> >   configured url to reach the back-end.
> > - If the backend needs to move to a different host, the clients don’t
> >   need to be reconfigured.
> > - The client can export a service endpoint, even without publishing it
> >   to the service registrar. It can use that endpoint to receive event
> >   notifications. So the user interface can be completely asynchronous.
> >   This is not easy to do using a request/response protocol like http
> >   (i.e.
> >   SOAP or RESTful services).
> >
> > Shared Data Store on a LAN. For example, one service scanned a
> > programmable logic controller for process data. Other clients would
> > display selected portions of that data. For instance, a daemon would
> > display the aforementioned process data on overhead displays. Another
> > daemon took the process data and logged it to a database every eight
> > hours or so. Apart from the zero-configuration aspects mentioned
> > above, Jini lets us avoid setting up a central data store. You want
> > to find a tag called “/cell-006/smt1/placement-defects”? Simply query
> > the lookup service for services implementing the “SharedDataStore”
> > interface, then select the one that lists a service attribute that
> > matches the prefix of the tag you want, then subscribe to change
> > events for it. The entire idea of a distributed key-value store,
> > along with the Paxos-based consistency algorithm and leader elections
> > that go along with it, is simply not necessary! You just get the
> > data from the source.
> >
> > Distributed computing framework, roughly based on data flow
> > architecture, using JavaSpaces in a leader/follower pattern. Like any
> > tuple-space system, the architecture has flexible scaling of resources
> > (just add more followers as required). As well, you get automatically
> > optimal dynamic load balancing (since faster processors finish a
> > packet of work faster, they simply pull work out of the JavaSpace in
> > proportion to their speed). Being JavaSpaces, the strong typing and
> > dynamic class loading turn out to be useful in a number of ways,
> > mainly that you can distribute code to the followers without any extra
> > effort.
> >
> > All of the above had aspects of what has recently come to be known as
> > “micro-services architecture” (I’ve been doing this kind of
> > architecture for 30 years, but that’s a different story, and the kids
> > today won’t believe it anyway).
> > In all cases, the ability of Jini
> > services to easily discover and coordinate with other services, with
> > very little static configuration, makes the system very flexible.
> >
> > Which leads to my thoughts for the future of River - service
> > integration in the cloud/data centre. I look at the convolutions that
> > people are going through to get service discovery working in a Docker
> > environment (e.g.
> > https://www.digitalocean.com/community/tutorials/the-docker-ecosystem-service-discovery-and-distributed-configuration-stores),
> > and I think that Jini has solved this problem already. The dynamic
> > discovery and zero-configuration nature of Jini, not to mention the
> > inherent fault-tolerance that goes along with leasing, etc., makes Jini
> > perfectly suited to a dynamically-scalable environment. We just
> > haven’t made it easy to get started. Also, in the past, people were
> > often left with the impression that Jini was too complex. I think
> > that people have come around to the idea that the problem-space for
> > distributed computing is complex, so the solution-space is necessarily
> > complex as well.
> >
> > Cheers,
> >
> > Greg Trasuk.
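
For anyone following the thread who hasn't used leasing before, here is a rough sketch of the idea Greg mentions: a registration only stays visible while its owner keeps renewing it, so a crashed service drops out of lookup without any explicit cleanup. This is plain Java and deliberately does not use the River/Jini API. `LeaseRegistry`, its methods, and the injected clock are hypothetical names made up for illustration, and a real registrar does a good deal more than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy lease-based registry. Hypothetical names; NOT the River/Jini API. */
final class LeaseRegistry {
    // The clock is injected (an assumption of this sketch) so time can be controlled.
    private final LongSupplier clock;
    private final Map<String, Long> expiryByService = new HashMap<>();

    LeaseRegistry(LongSupplier clock) {
        this.clock = clock;
    }

    /** Registers (or re-registers) a service; the entry lives for leaseMillis. */
    void register(String serviceName, long leaseMillis) {
        expiryByService.put(serviceName, clock.getAsLong() + leaseMillis);
    }

    /** Renews a live lease. Returns false if it already lapsed or never existed. */
    boolean renew(String serviceName, long leaseMillis) {
        purgeExpired();
        if (!expiryByService.containsKey(serviceName)) {
            return false;
        }
        expiryByService.put(serviceName, clock.getAsLong() + leaseMillis);
        return true;
    }

    /** Returns the names of services whose leases are still live, sorted. */
    List<String> lookup() {
        purgeExpired();
        List<String> live = new ArrayList<>(expiryByService.keySet());
        Collections.sort(live);
        return live;
    }

    /** Drops every entry whose expiry time is at or before the current time. */
    private void purgeExpired() {
        long now = clock.getAsLong();
        expiryByService.values().removeIf(expiry -> expiry <= now);
    }

    public static void main(String[] args) {
        long[] now = {0};                        // fake clock for the demo
        LeaseRegistry registry = new LeaseRegistry(() -> now[0]);
        registry.register("data-store", 1000);
        registry.register("logger", 1000);
        now[0] = 900;
        registry.renew("data-store", 1000);      // "logger" has crashed and never renews
        now[0] = 1500;
        System.out.println(registry.lookup());   // prints [data-store]
    }
}
```

The point of the sketch is only that failure handling falls out of the lease: nobody has to notice that "logger" died, its entry simply ages out.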