Hi,
  I have written a few data structures as classes, organized as follows.

So, here is my code structure:

project/
    foo/foo.py, __init__.py
    bar/bar.py, __init__.py        (bar.py imports foo as: from foo.foo import *)
    execute/execute.py             (execute.py imports bar as: from bar.bar import *)
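
In other words, the imports look roughly like this (just a sketch of the
layout above; the actual class definitions are omitted):

    # bar/bar.py -- pulls in the data-structure classes defined in foo/foo.py
    from foo.foo import *

    # execute/execute.py -- pulls in everything from bar (and, transitively, foo)
    from bar.bar import *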

Ultimately I am executing execute.py as

pyspark execute.py

This works fine locally, but as soon as I submit it on a cluster I see a
"module not found" error.
I tried to ship each and every file using the --py-files flag (foo.py, bar.py,
and the other helper files).
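
Roughly, the submit command looked like this (a sketch of what I tried; the
exact paths, master, and other options are omitted):

    # ship the individual modules alongside the driver script
    spark-submit \
      --py-files foo/foo.py,bar/bar.py \
      execute/execute.py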

But even then it complains that the module is not found. So, the question
is: when one is building a library that is supposed to execute on top of
Spark, how should the imports and the library be structured so that it works
fine on Spark?
Also, when should one use pyspark and when spark-submit to execute Python
scripts/modules?
Bonus points if someone can point to an example library and show how to run it :)
Thanks
