Re: Use Shared Variable in PySpark Executors

2018-09-22 Thread Soheil Pourbafrani
Ok, I'll do that. Thanks

On Sat, Sep 22, 2018 at 7:09 PM Jörn Franke  wrote:

> Do you want to calculate it and share it once with all other executors?
> Then a broadcast variable may be interesting for you.
>
> > On 22. Sep 2018, at 16:33, Soheil Pourbafrani wrote:
> >
> > Hi, I want to do some processing with PySpark and save the results in a
> > variable of type tuple that should be shared among the executors for
> > further processing.
> > Specifically, it's a text mining task and I want to use the Vector
> > Space Model. So I want to calculate the vector of all words (which should
> > be reachable from all executors) and save it in a tuple. Is this possible
> > in Spark, or should I use external storage such as a database or files?
> >
> >
>


Re: Use Shared Variable in PySpark Executors

2018-09-22 Thread Jörn Franke
Do you want to calculate it and share it once with all other executors? Then a 
broadcast variable may be interesting for you.
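
For readers of the archive, the broadcast suggestion above could be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: the names `to_vector` and `run_with_spark` are hypothetical, the "vector" is a plain term-count vector over a sorted vocabulary tuple, and the documents are assumed to be whitespace-separated strings.

```python
from typing import List, Tuple


def to_vector(doc: str, vocabulary: Tuple[str, ...]) -> List[int]:
    """Count how often each vocabulary word occurs in doc (hypothetical helper)."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for word in doc.lower().split():
        if word in index:
            vector[index[word]] += 1
    return vector


def run_with_spark(docs: List[str]) -> List[List[int]]:
    """Build the vocabulary once, broadcast it, and vectorize on the executors."""
    # Imported here so the helper above can be tried without a Spark install.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(docs)

    # Step 1: compute the distinct words in a distributed way and collect
    # them on the driver as a sorted tuple.
    vocabulary = tuple(sorted(
        rdd.flatMap(lambda d: d.lower().split()).distinct().collect()
    ))

    # Step 2: ship the tuple to the executors as a read-only broadcast variable.
    vocab_bc = sc.broadcast(vocabulary)

    # Step 3: tasks running on the executors read the tuple through .value.
    return rdd.map(lambda d: to_vector(d, vocab_bc.value)).collect()
```

Calling `run_with_spark(["spark is fast", "python and spark"])` on a machine with PySpark installed would return one count vector per document. A broadcast variable is read-only, and the tuple has to fit in memory on the driver and on each executor; if the vocabulary is too large for that, external storage may still be the better choice.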

> On 22. Sep 2018, at 16:33, Soheil Pourbafrani  wrote:
> 
> Hi, I want to do some processing with PySpark and save the results in a 
> variable of type tuple that should be shared among the executors for further 
> processing.
> Specifically, it's a text mining task and I want to use the Vector Space 
> Model. So I want to calculate the vector of all words (which should be 
> reachable from all executors) and save it in a tuple. Is this possible in 
> Spark, or should I use external storage such as a database or files?
> 
> 




Use Shared Variable in PySpark Executors

2018-09-22 Thread Soheil Pourbafrani
Hi, I want to do some processing with PySpark and save the results in a
variable of type tuple that should be shared among the executors for
further processing.
Specifically, it's a text mining task and I want to use the Vector Space
Model. So I want to calculate the vector of all words (which should be
reachable from all executors) and save it in a tuple. Is this possible in
Spark, or should I use external storage such as a database or files?