Hello ComDev,

The Streams podling has been brainstorming ways to increase awareness of the 
project and its capabilities.  We’ve also been working to make it easier to 
get started as a user, without having to begin the journey by downloading the 
JDK, Maven, and friends.  Using the software to benefit the Foundation itself 
seems like a good thing to try.

One use case for Streams is to build personal or organizational datasets of 
social media profiles and content for internal development and analysis, using 
the technologies and tools you and your organization prefer, rather than those 
provided by the upstream system.

I took the liberty of creating a few Zeppelin notebooks that collect Apache 
project profiles and posts, normalize them to Activity Streams format, and 
explore them using Spark DataFrames.

The notebooks are currently hosted in my ZeppelinHub account, and anyone with 
the links below can access them.

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC9lNzQzZjRkZGVkMGY0YjA3YTkzZTQ2NWFkYjU2ZTQxOS9ub3RlLmpzb24

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC8zZmQ3M2Y1OWEzOGE0YmM2YjFkMGM4MzBkNTczZDU0Mi9ub3RlLmpzb24

If this group sees potential benefit, I’d be happy to set them up for use by 
anyone at Apache in a dedicated Zeppelin deployment and to take the lead on 
maintaining them going forward.

In any case, we’d appreciate any feedback on what would make this prototype 
more valuable.

Background on Streams:

Apache Streams (incubating) unifies a diverse world of digital profiles and 
online activities into common formats and vocabularies, and makes these 
datasets accessible across a variety of databases, devices, and platforms for 
streaming, browsing, search, sharing, and analytics use cases.

Streams contains libraries and patterns for specifying, publishing, and 
interlinking schemas, and assists with converting activities (posts, shares, 
likes, follows, etc.) and objects (profiles, pages, photos, videos, etc.) 
between the representations, formats, and encodings preferred by supported 
data providers (Twitter, Instagram, etc.) and storage services (Cassandra, 
Elasticsearch, HBase, HDFS, Neo4j, etc.).
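As a rough illustration of what converting between representations can mean, the Python sketch below takes one nested, Activity Streams-style activity and re-encodes it two ways: as a flat row with dotted column names (the shape a tabular store or data frame tends to want) and as a single compact line of JSON (the shape a document store tends to ingest). The `flatten` and `to_json_line` helpers and the sample record are hypothetical; this is not the Streams API, which is Java-based.

```python
"""Sketch: re-encode one normalized activity for two hypothetical storage
targets. Illustrative only; not the Apache Streams API."""
import json


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted column names (e.g. actor.name),
    as a tabular store or data frame might expect."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            row.update(flatten(value, prefix=name + "."))
        else:
            row[name] = value
    return row


def to_json_line(doc):
    """Encode the same document as one compact line of JSON, with keys
    sorted so identical documents always produce identical lines."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


# Hypothetical sample activity: someone likes a post.
activity = {
    "type": "Like",
    "actor": {"type": "Person", "name": "Jane Doe"},
    "object": {"type": "Note", "id": "example:post:7"},
}

row = flatten(activity)
# row == {"type": "Like", "actor.type": "Person", "actor.name": "Jane Doe",
#         "object.type": "Note", "object.id": "example:post:7"}
line = to_json_line(activity)
```

The point is only that one normalized document can feed very different back ends; which encoding a given store needs is a per-connector decision.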

In theory, almost any JSON or XML API that uses a “look-up by ID and type” 
model can be coerced into collections of Activity Streams-normalized profiles 
and posts.  Systems such as GitHub, JIRA, and Meetup could be added to the 
roadmap, with notebooks created for them once those providers are built.
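A minimal sketch of the look-up-by-ID-and-type idea, in Python, assuming a hypothetical provider whose records are fetched by `(type, id)`: a post is fetched, its author is resolved with a second look-up, and both are mapped onto the W3C Activity Streams 2.0 vocabulary (`Create`, `Person`, `Note`). The `FAKE_API` table, `lookup`, and all field names are invented for illustration, and Streams’ own schemas may differ in detail.

```python
"""Sketch: coerce a hypothetical 'look-up by ID and type' API into
Activity Streams 2.0-style profiles and posts. Not the Apache Streams API."""
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Hypothetical upstream records keyed by (type, id), standing in for a
# provider's JSON API; a real provider would make HTTP calls instead.
FAKE_API = {
    ("user", "42"): {"id": "42", "login": "jdoe", "name": "Jane Doe"},
    ("post", "7"): {"id": "7", "author_id": "42", "body": "Hello, Apache!",
                    "created_at": "2016-05-01T12:00:00Z"},
}


def lookup(kind, item_id):
    """Hypothetical look-up by ID and type."""
    return FAKE_API[(kind, item_id)]


def to_person(user, provider):
    """Normalize an upstream user record into an AS2 Person object."""
    return {
        "type": "Person",
        "id": f"{provider}:user:{user['id']}",
        "preferredUsername": user["login"],
        "name": user["name"],
    }


def to_activity(post, provider):
    """Normalize an upstream post into an AS2 Create activity wrapping a
    Note; the author is resolved with a second look-up by ID and type."""
    author = to_person(lookup("user", post["author_id"]), provider)
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": author,
        "published": post["created_at"],
        "object": {
            "type": "Note",
            "id": f"{provider}:post:{post['id']}",
            "content": post["body"],
            "attributedTo": author["id"],
        },
    }


if __name__ == "__main__":
    activity = to_activity(lookup("post", "7"), provider="example")
    print(json.dumps(activity, indent=2))
```

Adding a new provider in this style would mostly mean writing its `lookup` and the per-type mapping functions; the normalized output shape stays the same.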
