Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread David Russell
Hi Ben,

> My company uses Lambda to do simple data moving and processing using Python
> scripts. I can see that using Spark instead for the data processing would
> make it into a real production-level platform.

That may be true. Spark has first-class support for Python, which
should make your life easier if you do go this route. Once you've
fleshed out your ideas I'm sure folks on this mailing list can provide
helpful guidance based on their real-world experience with Spark.
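
For context, the kind of "simple data moving" Lambda handler you describe
might look like the minimal sketch below. The event shape, bucket names, and
the injectable `s3` parameter are my assumptions for illustration, not
anything prescribed by Lambda:

```python
def handler(event, context, s3=None):
    """Copy one object between S3 buckets -- a minimal 'data moving' task.

    The event shape ({"src", "dst", "key"}) is an assumed convention;
    `s3` is injectable so the function can be exercised without AWS.
    """
    if s3 is None:
        import boto3  # real AWS SDK; only needed when running inside Lambda
        s3 = boto3.client("s3")
    src, dst, key = event["src"], event["dst"], event["key"]
    # copy_object performs a server-side copy, so no data passes through Lambda
    s3.copy_object(Bucket=dst, Key=key,
                   CopySource={"Bucket": src, "Key": key})
    return {"copied": "s3://%s/%s" % (dst, key)}
```

Moving that per-object logic into Spark is where a package like SAMBA could
come in.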

> Does this pave the way toward replacing
> the need for a pre-instantiated cluster in AWS or bought hardware in a
> datacenter?

In a word, no. SAMBA is designed to extend, not replace, the traditional
Spark computation and deployment model. At its most basic, the
traditional Spark computation model distributes data and computations
across worker nodes in the cluster.

SAMBA simply allows some of those computations to be performed by AWS
Lambda rather than locally on your worker nodes. In some circumstances
there are, I believe, a number of potential benefits to using SAMBA:

1. It can help reduce the workload on your Spark cluster by moving
some of that workload onto AWS Lambda, an on-demand compute service.

2. It allows Spark applications written in Java or Scala to make use
of libraries and features offered by Python and JavaScript (Node.js)
today, and potentially, more libraries and features offered by
additional languages in the future as AWS Lambda language support
evolves.

3. It provides a simple, clean API for integration with REST APIs,
which may benefit Spark applications that form part of a broader
data pipeline or solution.
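
To make the first point concrete, here is a hedged sketch, in Python using
the standard boto3 `invoke` call rather than SAMBA's own JVM API, of what
delegating per-record work to Lambda looks like; `my-transform` is a
placeholder function name:

```python
import json

def delegate_to_lambda(client, function_name, records):
    """Invoke a Lambda function once per record and collect the results.

    `client` is a boto3 Lambda client (or any stub exposing the same
    `invoke` signature); `function_name` must already be deployed.
    """
    results = []
    for record in records:
        response = client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",  # synchronous call
            Payload=json.dumps(record).encode("utf-8"),
        )
        # boto3 returns the response body as a streaming object
        results.append(json.load(response["Payload"]))
    return results
```

Inside a Spark job you would typically drive something like this from
`mapPartitions`, creating the client once per partition rather than once per
record.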

> If so, then this would be a great efficiency gain and provide an easier
> entry point for Spark usage. I hope the vision is to get rid of all cluster
> management when using Spark.

A hosted Spark platform such as Databricks or Amazon EMR, which handles
cluster management for you, might be a good place to start. At least in
my experience, they got me up and running without difficulty.

David

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread Benjamin Kim
Hi David,

My company uses Lambda to do simple data moving and processing using Python
scripts. I can see that using Spark instead for the data processing would make
it into a real production-level platform. Does this pave the way toward
replacing the need for a pre-instantiated cluster in AWS or bought hardware in
a datacenter? If so, then this would be a great efficiency gain and provide an
easier entry point for Spark usage. I hope the vision is to get rid of all
cluster management when using Spark.

Thanks,
Ben


> On Feb 1, 2016, at 4:23 AM, David Russell  wrote:
> 
> Hi all,
> 
> Just sharing news of the release of a newly available Spark package, SAMBA.
> 
> https://github.com/onetapbeyond/lambda-spark-executor
> 
> SAMBA is an Apache Spark package offering seamless integration with the AWS
> Lambda compute service for Spark batch and streaming applications on the JVM.
> 
> Within traditional Spark deployments RDD tasks are executed using fixed 
> compute resources on worker nodes within the Spark cluster. With SAMBA, 
> application developers can delegate selected RDD tasks to execute using 
> on-demand AWS Lambda compute infrastructure in the cloud.
> 
> Not unlike the recently released ROSE package that extends the capabilities
> of traditional Spark applications with support for CRAN R analytics, SAMBA
> provides another (hopefully) useful extension for Spark application
> developers on the JVM.
> 
> SAMBA Spark Package: https://github.com/onetapbeyond/lambda-spark-executor 
> 
> ROSE Spark Package: https://github.com/onetapbeyond/opencpu-spark-executor 
> 
> 
> Questions, suggestions, feedback welcome.
> 
> David
> 
> -- 
> "All that is gold does not glitter, Not all those who wander are lost."



[ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-01 Thread David Russell
Hi all,

Just sharing news of the release of a newly available Spark package, SAMBA.

https://github.com/onetapbeyond/lambda-spark-executor

SAMBA is an Apache Spark package offering seamless integration with the AWS
Lambda compute service for Spark batch and streaming applications on the JVM.

Within traditional Spark deployments RDD tasks are executed using fixed
compute resources on worker nodes within the Spark cluster. With SAMBA,
application developers can delegate selected RDD tasks to execute using
on-demand AWS Lambda compute infrastructure in the cloud.
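
As a rough illustration of that delegation pattern (not SAMBA's actual API;
see the repo for that), batching a whole partition into a single Lambda
payload keeps the invocation count proportional to partitions rather than
records:

```python
import json

def lambda_map_partition(partition, invoke):
    """Delegate one partition's records to Lambda in a single call.

    `invoke` stands in for a boto3 `client.invoke` round-trip: it takes
    a JSON payload (bytes) and returns the JSON-encoded result bytes.
    """
    payload = json.dumps(list(partition)).encode("utf-8")
    result = invoke(payload)            # one Lambda call per partition
    return iter(json.loads(result))     # Spark expects an iterator back
```

With PySpark this would slot into
`rdd.mapPartitions(lambda p: lambda_map_partition(p, invoke))`; with SAMBA
the equivalent wiring happens on the JVM side.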

Not unlike the recently released ROSE package that extends the capabilities
of traditional Spark applications with support for CRAN R analytics, SAMBA
provides another (hopefully) useful extension for Spark application
developers on the JVM.

SAMBA Spark Package: https://github.com/onetapbeyond/lambda-spark-executor

ROSE Spark Package: https://github.com/onetapbeyond/opencpu-spark-executor


Questions, suggestions, feedback welcome.

David

-- 
"*All that is gold does not glitter,** Not all those who wander are lost."*