Re: River future direction

Peter Wed, 08 Apr 2015 01:16:11 -0700

Thanks Bryan, very interesting.

Regarding high availability and self-healing, would you recommend that River
support using ZooKeeper?


If so, how are you using it?

Regards,

Peter.


----- Original message -----
> We've built a scalable distributed graph database using River
> (www.blazegraph.com, formerly www.bigdata.com). There are actually two
> versions that use River.
> 
> 1. High availability and self-healing based on low write replication.
> This uses ZooKeeper for the leader election. It would be nice if River
> supported these semantics in the base distribution.
> 
> 2. Horizontally scaled using dynamic sharding. This implementation uses
> River to expose the shards on the data services, to distribute the
> evaluation of high-level queries over the shards and services, etc.
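
For readers unfamiliar with the ZooKeeper pattern mentioned in item 1, the usual recipe is that each candidate creates an ephemeral sequential node and whichever candidate holds the lowest surviving sequence number is the leader. The sketch below simulates only that rule in plain Java and assumes nothing about how Blazegraph actually implements it: it uses no ZooKeeper client API, it omits the per-candidate watches, and the names LeaderElectionSketch, join and sessionExpired are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// JDK-only simulation of the "lowest ephemeral sequential node wins" recipe.
// Hypothetical names; no ZooKeeper API is used.
public class LeaderElectionSketch {
    // Stands in for the children of an election node: sequence number -> service id.
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSequence = 0;

    // Like creating an ephemeral sequential node: each joiner gets a strictly increasing number.
    public synchronized long join(String serviceId) {
        long seq = nextSequence++;
        candidates.put(seq, serviceId);
        return seq;
    }

    // Like a session expiring: the candidate's node disappears.
    public synchronized void sessionExpired(long seq) {
        candidates.remove(seq);
    }

    // The candidate holding the lowest surviving sequence number is the leader.
    public synchronized Optional<String> leader() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }

    public static void main(String[] args) {
        LeaderElectionSketch election = new LeaderElectionSketch();
        long a = election.join("service-A");
        election.join("service-B");
        election.join("service-C");
        System.out.println("leader: " + election.leader().orElse("none"));
        election.sessionExpired(a); // the leader dies; the next-lowest candidate takes over
        System.out.println("leader: " + election.leader().orElse("none"));
    }
}
```

Running main prints service-A and then service-B. In a real ZooKeeper deployment each candidate typically watches only its immediate predecessor, so a leader failure wakes one candidate rather than all of them.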
> 
> River has made it relatively easy to have services export RMI interfaces.
> 
> Some possible issues to consider:
> 
> a. better communications about the project (i.e., more marketing). I
> think that most of the technical documentation for the River ecosystem
> (other than javadoc) is pretty old. It might still be valid, but there
> is a perception of stagnation. Maybe a weekly blog post, a project of the
> month, etc.
> b. bindings for other languages (yes, this is not that easy).
> c. leader election semantics and similar interesting patterns in the base
> platform.
> d. design patterns for building scalable applications on River (it is
> really not that easy to get started, if I remember back to when I first
> used the platform in the mid-2000s).
> e. good tutorials for non-multicast environments (EC2).
> f. good tutorials for security, including living with firewalls and
> constraining ports.
> g. maybe a JSON interface to reggie so applications can easily see what's
> in the various registrars?
> 
> Thanks,
> Bryan
> 
> 
> 
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@systap.com
> http://blazegraph.com
> http://blog.bigdata.com
> http://mapgraph.io
> 
> On Mon, Apr 6, 2015 at 4:40 AM, Greg Trasuk <tras...@stratuscom.com>
> wrote:
> 
> > 
> > Well, here’s what I’ve used Jini for:
> > 
> > Desktop Java applications communicating with a back-end implemented as
> > a set of Jini services in “LAN” scope.   In this scenario, Jini has the
> > following valuable features:
> > - Zero-configuration on the client side. The client is able to use
> > multicast discovery to find the lookup service(s) and then look up the
> > service it wants to use. There is no need for the client app to have a
> > configured URL to reach the back-end.
> > - If the backend needs to move to a different host, the clients don’t
> > need to be reconfigured.
> > - The client can export a service endpoint, even without publishing it
> > to the service registrar. It can use that endpoint to receive event
> > notifications, so the user interface can be completely asynchronous.
> > This is not easy to do using a request/response protocol like HTTP
> > (e.g., SOAP or RESTful services).
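
A minimal in-process sketch of the two client-side properties described above: the client finds its back-end by interface type rather than by a configured URL, and it receives pushed events through a callback it hands to the service. TypeLookupSketch, OrderBackend and InMemoryBackend are hypothetical names, not River's ServiceRegistrar API. The callback here is a local Consumer invoked synchronously, whereas in Jini it would be a remotely exported listener (for example a net.jini.core.event.RemoteEventListener).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-process stand-in for a lookup service; hypothetical, not River's API.
public class TypeLookupSketch {
    // Services are indexed by the interface they implement, not by a host or URL.
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    <T> void register(Class<T> type, T impl) {
        services.put(type, impl);
    }

    <T> Optional<T> lookup(Class<T> type) {
        return Optional.ofNullable(type.cast(services.get(type)));
    }

    // The only thing the client is compiled against.
    interface OrderBackend {
        void addListener(Consumer<String> listener); // callback supplied by the client
        void submit(String order);
    }

    // A back-end that pushes events to registered listeners instead of being polled.
    static final class InMemoryBackend implements OrderBackend {
        private final String host;
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        InMemoryBackend(String host) { this.host = host; }

        public void addListener(Consumer<String> listener) { listeners.add(listener); }

        public void submit(String order) {
            for (Consumer<String> l : listeners) l.accept(host + " accepted " + order);
        }
    }

    public static void main(String[] args) {
        TypeLookupSketch lookup = new TypeLookupSketch();
        lookup.register(OrderBackend.class, new InMemoryBackend("host-1"));

        // The client knows only the interface; no URL is configured anywhere.
        OrderBackend backend = lookup.lookup(OrderBackend.class).orElseThrow();
        backend.addListener(event -> System.out.println("event: " + event));
        backend.submit("order-42");

        // The back-end "moves": it re-registers under the same type, and a fresh
        // lookup finds it without any client reconfiguration.
        lookup.register(OrderBackend.class, new InMemoryBackend("host-2"));
        OrderBackend moved = lookup.lookup(OrderBackend.class).orElseThrow();
        moved.addListener(event -> System.out.println("event: " + event));
        moved.submit("order-43");
    }
}
```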
> > 
> > Shared Data Store on a LAN.   For example, one service scanned a
> > programmable logic controller for process data.   Other clients would
> > display selected portions of that data.   For instance, a daemon would
> > display the aforementioned process data on overhead displays.   Another
> > daemon took the process data and logged it to a database every eight
> > hours or so.   Apart from the zero-configuration aspects mentioned
> > above, Jini lets us avoid setting up a central data store.   You want
> > to find a tag called “/cell-006/smt1/placement-defects”?   Simply query
> > the lookup service for services implementing the “SharedDataStore”
> > interface, then select the one that lists a service attribute that
> > matches the prefix of the tag you want, then subscribe to change
> > events for it. The entire idea of a distributed key-value store,
> > along with the Paxos-based consistency algorithm and leader elections
> > that go along with it, is simply not necessary. You just get the
> > data from the source.
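
The tag lookup described above reduces to a prefix match over the attributes advertised by the registered SharedDataStore services. A minimal sketch, assuming each service advertises a single tag prefix and that the longest matching prefix wins when several match (the original only says "matches the prefix"); StoreRegistration and route are hypothetical names, and the final step of subscribing to change events is left out.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins: in Jini the candidates would come from a lookup on the
// SharedDataStore type, and the prefix would be carried in a service attribute.
public class TagRoutingSketch {
    static final class StoreRegistration {
        final String serviceName;
        final String tagPrefix; // the advertised attribute, e.g. "/cell-006/"

        StoreRegistration(String serviceName, String tagPrefix) {
            this.serviceName = serviceName;
            this.tagPrefix = tagPrefix;
        }
    }

    // Pick the registration whose advertised prefix matches the tag; if several
    // match, prefer the longest (most specific) prefix. The tie-break is an assumption.
    static Optional<StoreRegistration> route(List<StoreRegistration> stores, String tag) {
        return stores.stream()
                .filter(s -> tag.startsWith(s.tagPrefix))
                .max(Comparator.comparingInt((StoreRegistration s) -> s.tagPrefix.length()));
    }

    public static void main(String[] args) {
        List<StoreRegistration> stores = List.of(
                new StoreRegistration("plc-scanner-cell-005", "/cell-005/"),
                new StoreRegistration("plc-scanner-cell-006", "/cell-006/"),
                new StoreRegistration("smt1-monitor", "/cell-006/smt1/"));
        String tag = "/cell-006/smt1/placement-defects";
        System.out.println(route(stores, tag).map(s -> s.serviceName).orElse("no owner"));
    }
}
```

With the sample registrations, main prints smt1-monitor; a tag such as "/cell-007/x" matches nothing and yields an empty Optional.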
> > 
> > Distributed computing framework, roughly based on data flow
> > architecture, using JavaSpaces in a leader/follower pattern.   Like any
> > tuple-space system, the architecture has flexible scaling of resources
> > (just add more followers as required). As well, you automatically get
> > optimal dynamic load balancing (since faster processors finish a
> > packet of work faster, they simply pull work out of the JavaSpace in
> > proportion to their speed). Because this is JavaSpaces, strong typing and
> > dynamic class loading turn out to be useful in a number of ways,
> > mainly in that you can distribute code to the followers without any extra
> > effort.
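
The load-balancing claim can be seen with a plain queue standing in for the JavaSpace: the leader writes every work packet up front, and each follower takes the next one as soon as it is free, so no scheduler decides who gets what. This is a sketch, assuming a LinkedBlockingQueue as a non-transactional substitute for JavaSpace write/take (a real follower would call take with a template, typically under a transaction); PullBalancingSketch and the simulated per-packet delays are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// A LinkedBlockingQueue stands in for the JavaSpace; this is not the JavaSpace API.
public class PullBalancingSketch {
    // Returns how many packets each follower completed.
    static Map<String, Integer> run(int tasks, Map<String, Long> followerDelayMillis) {
        BlockingQueue<Integer> space = new LinkedBlockingQueue<>();
        for (int i = 0; i < tasks; i++) space.add(i); // the leader writes all work packets

        Map<String, Integer> completed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(followerDelayMillis.size());
        for (Map.Entry<String, Long> f : followerDelayMillis.entrySet()) {
            pool.submit(() -> {
                // Each follower pulls the next packet whenever it is idle; nobody assigns work.
                while (space.poll() != null) {
                    try {
                        Thread.sleep(f.getValue()); // simulated processing time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.merge(f.getKey(), 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        Map<String, Long> followers = Map.of("fast", 2L, "slow", 10L);
        System.out.println(run(60, followers));
    }
}
```

The follower with the shorter simulated processing time usually ends up with most of the packets, but the exact split varies from run to run; only the total is fixed.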
> > 
> > All of the above had aspects of what has recently come to be known as
> > “micro-services architecture” (I’ve been doing this kind of
> > architecture for 30 years, but that’s a different story, and the kids
> > today won’t believe it anyway). In all cases, the ability of Jini
> > services to easily discover and coordinate with other services, with
> > very little static configuration, makes the system very flexible.
> > 
> > Which leads to my thoughts for the future of River: service
> > integration in the cloud/data centre.   I look at the convolutions that
> > people are going through to get service discovery working in a Docker
> > environment (e.g.
> > https://www.digitalocean.com/community/tutorials/the-docker-ecosystem-service-discovery-and-distributed-configuration-stores),
> > and I think that Jini has solved this problem already.   The dynamic
> > discovery and zero-configuration nature of Jini, not to mention the
> > inherent fault tolerance that goes along with leasing, etc., makes Jini
> > perfectly suited to a dynamically-scalable environment.   We just
> > haven’t made it easy to get started.   Also, in the past, people were
> > often left with the impression that Jini was too complex.   I think
> > that people have come around to the idea that the problem-space for
> > distributed computing is complex, so the solution-space is necessarily
> > complex as well.
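
Leasing is one source of the fault tolerance mentioned above: a registration lives only as long as its holder keeps renewing it, so a crashed service drops out without anyone deregistering it. A minimal sketch of that rule, assuming an explicit clock value is passed in rather than reading System.currentTimeMillis(), so the behaviour is deterministic; LeaseTableSketch and its methods are hypothetical and do not reproduce River's net.jini.core.lease.Lease interface or the registrar's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lease table illustrating expiry-based cleanup; not River's API.
public class LeaseTableSketch {
    private final Map<String, Long> expiryByService = new HashMap<>();

    // Grant or renew a lease: the registration survives only until now + duration.
    void grantOrRenew(String serviceId, long now, long durationMillis) {
        expiryByService.put(serviceId, now + durationMillis);
    }

    // Lookups see only services whose leases are still valid; a service that
    // stops renewing (for example because it crashed) simply disappears.
    Set<String> live(long now) {
        expiryByService.values().removeIf(expiry -> expiry <= now);
        return new TreeSet<>(expiryByService.keySet());
    }

    public static void main(String[] args) {
        LeaseTableSketch table = new LeaseTableSketch();
        table.grantOrRenew("worker-1", 0, 30_000);
        table.grantOrRenew("worker-2", 0, 30_000);
        table.grantOrRenew("worker-1", 20_000, 30_000); // worker-1 renews; worker-2 has crashed
        System.out.println(table.live(25_000)); // [worker-1, worker-2]
        System.out.println(table.live(40_000)); // [worker-1]
    }
}
```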
> > 
> > Cheers,
> > 
> > Greg Trasuk.
