On 12/02/2017 10:41 PM, Steve Blackmon wrote:
>  Sorry about that!  Here’s a link to the notebook that doesn’t require
> registration.
> 
> https://www.zepl.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24
> 
> In this notebook we used the %spark interpreter to collect the data, but
> most of the work is done as Scala in the driver process.  The Streams code
> base is Java and does not depend on Spark or any other framework external
> to the jar file.
> 
> The easiest integration I can think of, given the Python/Java language gap,
> would use Docker: Streams could prepare a Docker container packaged with
> all the necessary code, and Kibble installations could use it to run ad-hoc
> or scheduled collection jobs.  The data collected could be written as
> newline-delimited JSON on container-mounted volumes, or directly to an
> Elasticsearch index.
> 
> Docker’s not really necessary though; if the system where Kibble’s running
> has a JRE configured and a local Streams distribution, that could work too.

Right, but it's probably the easiest entry point for people just "wanting
to get things done" :). I could also imagine us setting up a remote service
that could handle this via an HTTP API as an alternate solution, akin to
how you would use the GitHub API - that is to say, we'd have a VM that you
could query, and it'd have all the Java in place for speedy access to this
sort of thing. Either or both would work for me, and if Streams is willing
to sort out the actual data gathering, we could have this put into ES
quickly and get started on using the data gathered.
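
To illustrate, querying such a service from Kibble might look roughly like
the sketch below - the host, endpoint and parameters are entirely made up,
since nothing of the sort exists yet:

    # Rough sketch only: the service, endpoint and parameters below are
    # hypothetical placeholders.
    import requests

    BASE = "https://streams.example.org/api"   # placeholder host

    resp = requests.get(
        BASE + "/twitter/stats",               # hypothetical endpoint
        params={"account": "TheASF"},          # hypothetical parameter
        timeout=30,
    )
    resp.raise_for_status()
    documents = resp.json()                    # JSON, ready to push into ES

The appeal of that model is that Kibble itself stays pure Python and never
needs a JRE or a local Streams install.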

I'll have to ponder how we're going to present this, and which charts
would be most informative. There is a lot of potential here.

If Streams can provide us with a "run this" sort of container that can
spit out JSON, that would be awesome. While writing to ES directly might be
easier, there's the use case where ES is not local to the system (Kibble
is intended to support both local ES and remote-via-JSON-API setups), so
JSON output might be best for now.

With regards,
Daniel.

> 
> Steve
> 
> On Dec 2, 2017 at 2:10 PM, Daniel Gruno <[email protected]> wrote:
> 
> 
> On 12/02/2017 09:07 PM, Steve Blackmon wrote:
> 
> Hi Kibble Team,
> 
> I've been checking out the code and the demo site this weekend.
> 
> I'm interested in joining the team and integrating some of the data
> sources maintained in http://streams.apache.org
> 
> Specifically, activity streams from the social media presences of
> projects and contributors (who opt in) as well as statistics derived
> from them could make a nice addition to Kibble.
> 
> Here's an example: an analysis of the Twitter accounts of Apache projects,
> using Streams and Zeppelin:
> https://www.zepl.com/UvGWgAZb7/spaces/Sb9ElZuDD/8b49bf71b1a54e16b9d04219b33e243a
> 
> Cheers,
> 
> Steve Blackmon
> [email protected]
> 
> 
> Hi Steve,
> I like the idea, but I am unable to see the link you shared; it shows a
> 404 for me :(. Having said that, looking into the social media space is
> definitely something worth doing!
> 
> With regards,
> Daniel.
> 
