Re: River future direction

Patricia Shanahan Mon, 06 Apr 2015 08:45:24 -0700

Good idea. I've copied the user list.

On 4/6/2015 5:53 AM, Bryan Thompson wrote:

Sure. Maybe we could gather 10-20 success stories and publish them
one per week for a few months?  We could certainly point our user base at
the success story posts for river. It would be nice to use this as an
opportunity to convince users that jini/river was a good idea as a
distributed systems platform. This is purely a marketing issue, so having a
flurry of good press would be nice.


Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
[email protected]
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
APIs.  MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
technology to use GPUs to accelerate data-parallel graph analytics.

CONFIDENTIALITY NOTICE:  This email and its contents and attachments are
for the sole use of the intended recipient(s) and are confidential or
proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments is
prohibited. If you have received this communication in error, please notify
the sender by reply email and permanently delete all copies of the email
and its contents and attachments.

On Mon, Apr 6, 2015 at 8:48 AM, Patricia Shanahan <[email protected]> wrote:

On 4/6/2015 4:55 AM, Bryan Thompson wrote:

We've built a scalable distributed graph database using river
(www.blazegraph.com, formerly www.bigdata.com).  There are actually two
versions that use river.

1. High availability and self-healing based on low write replication. This
uses zookeeper for the leader election. It would be nice if river
supported these semantics in the base distribution.

2. Horizontally scaled using dynamic sharding. This implementation uses
river to expose the shards on the data services, to distribute the
evaluation of high level queries over the shards and services, etc.

River has made it relatively easy to have services export RMI interfaces.


Perhaps a "success" story we could post on the web site?

Some possible issues to consider:

a. better communications about the project (i.e., more marketing). I think
that most of the technical documentation for the river ecosystem (other
than javadoc) is pretty old.  It might still be valid, but there is a
perception of stagnation.  Maybe a weekly blog post, project of the month,
etc.


I don't know whether I am right about this, and it is one of the topics on
which I am really looking for feedback, but I feel we have to fix our
"Getting started" experience first. Once that is done, we have a better
chance of converting marketing leads into active users.

b. bindings for other languages (yes, this is not that easy).
c. leader election semantics and similar interesting patterns in the base
platform.
d. design patterns for building scalable applications on river (it is
really not that easy to get started if I remember back to when I first
used the platform in the mid-2000s).
e. good tutorials for non-multicast environments (EC2).
f. good tutorials for security, including living with firewalls and
constraining ports.
g. maybe a JSON interface to reggie so applications can easily see what's
in the various registrars?

Thanks,
Bryan




On Mon, Apr 6, 2015 at 4:40 AM, Greg Trasuk <[email protected]> wrote:

Well, here’s what I’ve used Jini for:

Desktop Java applications communicating with a back-end implemented as a
set of Jini services in “LAN” scope.  In this scenario, Jini has the
following valuable features:
- Zero-configuration on the client side.  The client is able to use
multicast discovery to find the lookup service(s) and then look up the
service it wants to use.  There is no need for the client app to have a
configured URL to reach the back-end.
- If the back-end needs to move to a different host, the clients don’t
need to be reconfigured.
- The client can export a service endpoint, even without publishing it to
the service registrar.  It can use that endpoint to receive event
notifications.  So the user interface can be completely asynchronous.
This is not easy to do using a request/response protocol like HTTP (i.e.
SOAP or RESTful services).

Shared Data Store on a LAN.  For example, one service scanned a
programmable logic controller for process data.  Other clients would
display selected portions of that data.  For instance, a daemon would
display the aforementioned process data on overhead displays.  Another
daemon took the process data and logged it to a database every eight
hours or so.  Apart from the zero-configuration aspects mentioned above,
Jini lets us avoid setting up a central data store.  You want to find a
tag called “/cell-006/smt1/placement-defects”?  Simply query the lookup
service for services implementing the “SharedDataStore” interface, then
select the one that lists a service attribute that matches the prefix of
the tag you want, then subscribe to change events for it.  The entire idea
of a distributed key-value store, along with the Paxos-based consistency
algorithm and leader elections that go along with it, is simply not
necessary!  You just get the data from the source.
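
A minimal sketch of that select-by-prefix-then-subscribe flow, using plain
Java collections as a stand-in for the lookup service.  The Store class,
its tagPrefix field, and the longest-prefix-wins rule are assumptions made
for illustration; they are not River APIs.  In River itself the query would
presumably go through ServiceRegistrar.lookup with a ServiceTemplate, and
the change events would arrive as remote events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiConsumer;

class TagLookupSketch {

    /** Hypothetical stand-in for a service proxy returned by the lookup service. */
    static final class Store {
        final String tagPrefix; // plays the role of the advertised service attribute
        private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

        Store(String tagPrefix) {
            this.tagPrefix = tagPrefix;
        }

        /** Register a callback for change events (tag, new value). */
        void subscribe(BiConsumer<String, String> listener) {
            listeners.add(listener);
        }

        /** Simulate the data source reporting a new value for a tag. */
        void publish(String tag, String value) {
            for (BiConsumer<String, String> l : listeners) {
                l.accept(tag, value);
            }
        }
    }

    /** Select the store whose prefix matches the tag; longest prefix wins (an assumption). */
    static Optional<Store> select(List<Store> stores, String tag) {
        Store best = null;
        for (Store s : stores) {
            boolean matches = tag.startsWith(s.tagPrefix);
            if (matches && (best == null || s.tagPrefix.length() > best.tagPrefix.length())) {
                best = s;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Store> stores = List.of(new Store("/cell-005/"), new Store("/cell-006/"));
        String tag = "/cell-006/smt1/placement-defects";
        Store source = select(stores, tag)
                .orElseThrow(() -> new IllegalStateException("no store for " + tag));
        // The client reads straight from the source; there is no central store.
        source.subscribe((t, v) -> System.out.println(t + " = " + v));
        source.publish(tag, "3");
    }
}
```

The point of the sketch is that the client only needs the tag name: the
store that owns the data is found by its advertised attribute, and updates
flow from that store directly to the subscriber.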

Distributed computing framework, roughly based on data flow architecture,
using JavaSpaces in a leader/follower pattern.  Like any tuple-space
system, the architecture has flexible scaling of resources (just add more
followers as required).  As well, you get automatically optimal dynamic
load balancing (since faster processors finish a packet of work faster,
they simply pull work out of the JavaSpace in proportion to their speed).
Being JavaSpaces, the strong typing and dynamic class loading turn out to
be useful in a number of ways, mainly that you can distribute code to the
followers without any extra effort.
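
The pull-based load balancing can be sketched with a
java.util.concurrent.BlockingQueue standing in for the JavaSpace.  This is
an illustration under stated assumptions, not River code: the class name,
the follower names, and the per-packet sleep that simulates processor speed
are all hypothetical, and a real follower would call JavaSpace.take with an
entry template rather than poll a queue.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class WorkPullSketch {

    /**
     * The leader writes all packets, then each follower pulls work until the
     * queue is empty.  Returns the number of packets each follower completed.
     */
    static Map<String, Integer> run(int packets, Map<String, Long> millisPerPacket) {
        BlockingQueue<Integer> space = new LinkedBlockingQueue<>();
        for (int i = 0; i < packets; i++) {
            space.add(i); // leader side: write work into the shared space
        }

        Map<String, Integer> done = new ConcurrentHashMap<>();
        Thread[] followers = millisPerPacket.entrySet().stream().map(e -> new Thread(() -> {
            try {
                // A timed poll stands in for a take with a timeout.
                while (space.poll(50, TimeUnit.MILLISECONDS) != null) {
                    Thread.sleep(e.getValue()); // simulated processing time
                    done.merge(e.getKey(), 1, Integer::sum);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        })).toArray(Thread[]::new);

        for (Thread t : followers) {
            t.start();
        }
        try {
            for (Thread t : followers) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for followers", ex);
        }
        return done;
    }

    public static void main(String[] args) {
        // The faster follower is expected to finish more packets; exact
        // counts depend on thread scheduling.
        System.out.println(run(60, Map.of("fast", 2L, "slow", 10L)));
    }
}
```

Nothing in the sketch assigns work to a follower.  Each one simply asks for
the next packet when it is free, which is why the split tracks relative
speed without any scheduler.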

All of the above had aspects of what has recently come to be known as
“micro-services architecture” (I’ve been doing this kind of architecture
for 30 years, but that’s a different story, and the kids today won’t
believe it anyway).  In all cases, the ability of Jini services to easily
discover and coordinate with other services, with very little static
configuration, makes the system very flexible.

Which leads to my thoughts for the future of River - service integration
in the cloud/data centre.  I look at the convolutions that people are
going through to get service discovery working in a Docker environment
(e.g.
https://www.digitalocean.com/community/tutorials/the-docker-ecosystem-service-discovery-and-distributed-configuration-stores),
and I think that Jini has solved this problem already.  The dynamic
discovery and zero-configuration nature of Jini, not to mention the
inherent fault-tolerance that goes along with leasing, etc., makes Jini
perfectly suited to a dynamically-scalable environment.  We just haven’t
made it easy to get started.  Also, in the past, people were often left
with the impression that Jini was too complex.  I think that people have
come around to the idea that the problem-space for distributed computing
is complex, so the solution-space is necessarily complex as well.

Cheers,

Greg Trasuk.
