Idan Zalzberg created SPARK-1526:
------------------------------------
Summary: Running spark driver program from my local machine
Key: SPARK-1526
URL: https://issues.apache.org/jira/browse/SPARK-1526
Project: Spark
Issue Type: Wish
Components: Spark Core
Reporter: Idan Zalzberg
Currently it seems that the design choice is that the driver program should be
close, network-wise, to the workers, and that connections may be created from
either side.
This makes using Spark somewhat harder, since when I develop locally I need to
package not only my program, but also all of its local dependencies.
Let's say I have a local DB with the names of files in HADOOP that I want to
process with Spark; now I need my local DB to be accessible from the cluster so
it can fetch the file names at runtime.
The driver program is an awesome thing, but it loses some of its strength if
you can't really run it anywhere.
It seems to me that the problem is the DAGScheduler, which needs to be close
to the workers; maybe it shouldn't be embedded in the driver, then?
--
This message was sent by Atlassian JIRA
(v6.2#6252)