On October 18, 2016 at 9:19:50 AM, Benjamin Young (byo...@bigbluehat.com) wrote:
Sorry I’ve not written here sooner. I’d reached out to the Incubator list while
at the W3C’s TPAC event about keeping Apache Streams in the incubator in hopes
of also seeing it support the nearly finalized ActivityStreams 2.0.
Since then, I’ve noticed Steve’s efforts to make Streams much simpler for new
users, which is fabulous! I (sadly) don’t code in Java…since college, but I do
have a desire to run code that aggregates my social streams into a standard
format, store it in a database I prefer (in my case Apache CouchDB), and do
cool stuff with it for my own reasons. ;) That desire is what drew me into the
Streams talk at ApacheCon.
A lot of businesses, techies, and non-techies are interested in producing and
consuming content outside of standard single-channel generic web and mobile
apps - but there seems to be a dearth of quality low-cost commercial offerings
to do so.
While digging around the project documents, I’ve found two overview
descriptions of the project.
This one’s from the web site:
“Apache Streams (incubating) unifies a diverse world of digital profiles and
online activities into common formats and vocabularies, and makes these
datasets accessible across a variety of databases, devices, and platforms for
streaming, browsing, search, sharing, and analytics use-cases.”
This is our primary focus right now - expanding interoperability to more
sources, and enabling interesting use cases that grow the community.
And this one from the repo’s readme file:
“Apache Streams is a lightweight (yet scalable) server for ActivityStreams. The
role of Apache Streams is to provide a central point of aggregation, filtering
and querying for Activities that have been submitted by disparate systems.
Apache Streams also intends to include a mechanism for intelligent filtering
and recommendation to reduce the noise to end users.”
This copy is older (the project moved from SVN to Git in 2013). It’s still an
interesting goal, but data interoperability is a more pressing problem in need
of a robust open-source solution, IMO. There are plenty of mature databases,
data science tools, and data vis libraries around - I think if it were dead
simple for anyone to collect and normalize social streams we’d see
experimentation and adjacent tooling flourish.
In either case, the story that I get—and the thing I want—is minimal setup to
get my Twitter, etc., piped into a database +/- an API +/- a UI.
I think we are closing in on this, minus official API and UI. The group of
active contributors will need to grow and diversify to tackle those but there’s
nothing impeding their development (integration and deployment will require
making some choices).
Am I on the right track here? Or is Streams really meant for Java-developers to
mix into their projects?
We’re looking into distribution with Docker, which will be a good way for
power-users with zero interest in Java or Apache technologies to run Streams.
The core project libraries, connectors, and converters may be Java, but there’s
plenty of room to innovate and improve the project outside that world. We have
a ton of work ahead answering questions about what normalized data types to
support, which systems to prioritize, how we want the normalized data to look,
and how to map in data from upstream systems. Design and product work, not
Once I know that, I’ll know best how to help. :)
If I can make a suggestion for how to get started, try to run any/all of our
providers and examples while refusing to look at any source code. Let us know if
that’s not working out so we can change things up until it does. Also let us
know how well the existing providers and examples meet your needs as a social
data power-user and what opportunities for improvement you see, to help us
build out the JIRA backlog.