TL;DR I’ve found a way to dramatically reduce barriers to using streams as a 
beginner.

With the streams 0.3 release, it’s quite a headache for a novice to use 
streams. We have a tutorial on the website, but it’s quite a journey: you have 
to check out all three repos and install each of them in order before you get a 
jar file you can use to get data. Only then can you run a few pre-canned 
streams, and those are intermediate level, not beginner level.

In an ideal world, anyone would be able to yum or apt-get (or docker pull) 
individual providers or processors and run them on their own without building 
from source or composing them into multi-step streams.  

We'd have to increase our build and compliance complexity significantly to 
publish official binaries. So what can we do to flatten the learning curve 
without doing that?

Providers are really simple to run. The hard part is getting all of the right 
classes and configuration properties into a JVM. Inspired by how Zeppelin’s 
%dep interpreter reduces the friction in composing and running a Scala 
notebook, I wanted to find a way to get the same ability from a Linux shell.

The commands below go from nothing but a Java installation to flat files of 
Twitter data in just a few minutes.

I think until we have binary distributions, this is how our tutorials should 
tell the world to get started with streams.  

Thoughts?  

-----  

# install sbtx

curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt > /usr/bin/sbtx && chmod 0755 /usr/bin/sbtx

# create a workspace

mkdir twitter-test; cd twitter-test;

# supply a config file with credentials

cat > application.conf << EOF
twitter {
  oauth {
    consumerKey = ""
    consumerSecret = ""
    accessToken = ""
    accessTokenSecret = ""
  }
  retrySleepMs = 5000
  retryMax = 250
  info = [
    18055613
  ]
}
EOF
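
The config above ships with empty OAuth values, and it is easy to forget to fill one in. A minimal sketch of a pre-flight check follows; `empty_credentials` is a hypothetical helper of my own, not something streams provides, and it assumes the simple `key = ""` layout shown above.

```shell
# Hypothetical helper (not part of streams): list keys whose value is still "".
empty_credentials() {
  conf="${1:-application.conf}"
  # Match lines such as:   consumerKey = ""
  grep -E '^[[:space:]]*[A-Za-z]+[[:space:]]*=[[:space:]]*""[[:space:]]*$' "$conf" |
    sed -E 's/^[[:space:]]*([A-Za-z]+).*/\1/'
}

# Usage: warn while any value in application.conf is still blank.
if [ -f application.conf ] && [ -n "$(empty_credentials application.conf)" ]; then
  echo "Fill in these keys before running a provider:" >&2
  empty_credentials application.conf >&2
fi
```

Run against the file exactly as written above, this should name the four OAuth keys; once they are filled in it prints nothing.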

sbtx -210 -sbt-create

set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"

set libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

set fork := true

run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt

run-main org.apache.streams.twitter.provider.TwitterTimelineProvider application.conf statuses.txt

set javaOptions += "-Dtwitter.endpoint=friends"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf friends.txt

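set javaOptions += "-Dtwitter.endpoint=followers"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf followers.txt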

exit
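
The three `set` lines typed at the sbt prompt could also be kept in a file so the session is repeatable. A minimal sketch, assuming sbt 0.13-style `build.sbt` syntax; the blank lines between settings are deliberate, since older sbt releases require them. The `javaOptions` settings are left out here because they change between runs.

```shell
# Write the resolver, dependency and fork settings from the interactive
# session into build.sbt in the current workspace.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > build.sbt << 'EOF'
resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"

libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

fork := true
EOF
```

With that file in place, I would expect a provider to run as a single quoted batch command, for example `sbtx -210 "run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt"`, though I have only run the interactive form shown above.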

ls -l

Steves-MacBook-Pro-3:twitter sblackmon$ ls -l
-rw-r--r--@ 1 sblackmon staff 356 Oct 6 11:54 application.conf
-rw-r--r-- 1 sblackmon staff 293780 Oct 6 13:42 followers.txt
-rw-r--r-- 1 sblackmon staff 6260 Oct 6 13:43 friends.txt
drwxr-xr-x 3 sblackmon staff 102 Oct 6 10:17 project
-rw-r--r-- 1 sblackmon staff 3339460 Oct 6 13:43 statuses.txt
drwxr-xr-x 6 sblackmon staff 204 Oct 6 10:19 target
-rw-r--r-- 1 sblackmon staff 3321 Oct 6 13:43 users.txt
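
For a quick sanity check of the results beyond file sizes, line counts are handy. This is a rough sketch: `count_lines` is a hypothetical helper, and reading a line count as a record count assumes each provider writes one record per line, which I have not verified here.

```shell
# Hypothetical helper: print "<file> <line count>" for each file that exists.
count_lines() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      # tr strips the padding some wc implementations put before the number.
      printf '%s %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
    fi
  done
  return 0
}

# Usage: summarize the four output files produced by the session.
count_lines users.txt statuses.txt friends.txt followers.txt
```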

