[akka-user] Re: Spark without hadoop

Dean Wampler Mon, 20 Apr 2015 07:24:13 -0700

Answers inline below.

On Sunday, April 19, 2015 at 4:04:15 AM UTC-5, tomerneeraj wrote:
>
> Hi, 
>
> We would like to use Spark without Hadoop. In a highly scalable, highly 
> available deployment, the YARN and HDFS APIs serve the purpose of resource 
> scheduling and shared storage. Our data is stored on separate (non-shared) 
> disks. A couple of questions regarding this: 
>
> 1. Can we replace YARN with Akka cluster for resource scheduling (work 
> distribution between master and worker nodes)? 
>


Akka cluster has neither the resource management capabilities nor the 
integration with Spark that are required. We at Typesafe are considering 
implementing this capability. For now, your best alternatives to YARN are 
Mesos, for which we are offering production support, and standalone mode, 
where you manually configure a cluster yourself. Mesos is best for 
general-purpose, multi-job and multi-use clustering, while standalone is 
fine if you have just a few jobs running, like a continuous streaming job 
with its own, dedicated hardware.
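
As a rough illustration of what choosing Mesos or standalone mode means in 
practice: the cluster manager is selected through the master URL handed to 
Spark. The sketch below only builds that URL. The helper name master_url and 
the host names are hypothetical; the spark:// and mesos:// schemes and the 
default ports (7077 for a standalone master, 5050 for a Mesos master) are the 
usual defaults, so check them against your own installation.

```python
from typing import Optional

# URL scheme and usual default port for each cluster manager.
SCHEMES = {"standalone": "spark", "mesos": "mesos"}
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: str, port: Optional[int] = None) -> str:
    """Build a Spark master URL for a standalone or Mesos cluster.

    `manager` is "standalone" or "mesos"; `host` is the master's host name
    (hypothetical here); `port` overrides the usual default port.
    """
    if manager not in SCHEMES:
        raise ValueError(f"unsupported cluster manager: {manager!r}")
    chosen_port = DEFAULT_PORTS[manager] if port is None else port
    return f"{SCHEMES[manager]}://{host}:{chosen_port}"


if __name__ == "__main__":
    print(master_url("standalone", "master-host"))
    print(master_url("mesos", "mesos-host"))
```

The resulting string is what you would pass to spark-submit with --master, or 
to SparkConf.setMaster in application code.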


> 2. Is a shared file system necessary for Spark Streaming? Can the master 
> and workers each use their own standalone disk for Spark Streaming and 
> resource scheduling, without sharing any disk between Spark nodes? 
>

It's necessary to have a shared filesystem. It could be NFS, but you'll have 
poor I/O performance. Fortunately, running HDFS without the rest of Hadoop 
is not difficult. It might be possible to use other distributed filesystems 
like Ceph, but I haven't tried that.


> 3. What algorithm does the master node use to distribute traffic to the 
> worker nodes, and how does Spark Streaming scale? Is there any way Akka 
> cluster could help with it? 
>

Spark does a good job partitioning data, even incoming streams, across the 
cluster. When reading from a distributed file system it knows about (e.g., 
HDFS and S3), it can read and process blocks in parallel. Akka messaging is 
used for some internal communications, but Spark isn't "deeply" dependent 
on Akka.
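
To give a feel for one common partitioning scheme: for keyed data, Spark's 
HashPartitioner assigns a record to a partition by taking the key's hash 
modulo the number of partitions. The sketch below imitates that idea in plain 
Python. It is an illustration, not Spark's code: the function names are 
hypothetical, and zlib.crc32 is an assumed stand-in for the key's hash, chosen 
because it gives the same result in every process.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    # crc32 returns a non-negative integer in Python 3, so the modulo
    # result is always a valid partition index.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    events = [("user-1", 10), ("user-2", 20), ("user-1", 30), ("user-3", 40)]
    for index, items in sorted(partition_records(events, 3).items()):
        print(index, items)
```

Records with the same key always land in the same partition, so each worker 
can process its partitions independently and in parallel.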

Akka would be an excellent foundation for a big data system. At Typesafe, 
we're thinking about how to make use of it for different use cases ;)
 

>
> Regards 
> Neeraj 
>

