Thanks Supun! I believe this would work.

I assume the same will work with Python scripts as well (without notebooks).

--Srinath



On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:

> Hi Srinath/Nirmal
>
> I managed to get the $subject working. Here I connected an IPython/Jupyter
> Notebook to PySpark, and PySpark submits the job to the remote Spark
> cluster (created by DAS). One advantage of using a notebook is that
> a user can load the data in DAS tables as a Spark DataFrame and
> work on it interactively.
>
> But it also has the following limitations:
>
>    - The client side needs a Spark distribution (to use PySpark).
>    - The cores allocated to the Spark app used by DAS (CarbonAnalytics)
>    have to be limited, so that the Spark app created by PySpark can run in
>    parallel.
>    - The Spark classpath has to be set on the client side with the jars
>    used by DAS, so that once the job is submitted, the Spark executors know
>    where to look for the classes.
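The client-side settings in the list above can be gathered in one place. The sketch below is a hypothetical helper (`build_client_spark_conf` is not part of PySpark or DAS) that returns standard Spark properties as a plain dict. The master URL and jar paths are placeholders, and it assumes the DAS jars sit at the same paths on every worker node. The cap on DAS's own CarbonAnalytics app is configured on the DAS side and is not shown here.

```python
import os


def build_client_spark_conf(master_url, das_jars, max_cores):
    """Spark properties for a PySpark client that shares the DAS Spark cluster.

    master_url: master URL of the Spark cluster created by DAS (placeholder)
    das_jars:   paths of the jars used by DAS, as they exist on the cluster nodes
    max_cores:  upper bound on the cores this client app may take
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    # Executors resolve these paths locally, so they must exist on every worker.
    # os.pathsep assumes the client and the workers run the same OS family.
    classpath = os.pathsep.join(das_jars)
    return {
        "spark.master": master_url,
        # Standard Spark property: stops this app from taking every free core
        # (the DAS CarbonAnalytics app needs its own cap for the same reason).
        "spark.cores.max": str(max_cores),
        # Lets executors load DAS classes once the job is submitted.
        "spark.executor.extraClassPath": classpath,
    }
```

On the client, the dict could be handed to PySpark, e.g. `SparkContext(conf=SparkConf().setAll(conf.items()))`. Spark's configuration docs advise that driver-side class path entries in client mode go through `--driver-class-path` or `spark-defaults.conf` rather than `SparkConf`, which is why only the executor class path is set here.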
>
>
> *Training Models:*
>
> As we discussed offline, for large datasets we can directly use the
> algorithms in Spark's mllib and ml packages. This is very straightforward,
> as the data we get from DAS is a Spark DataFrame, and hence we can train
> models on top of the DataFrame (or convert it to an RDD).
> For small and medium datasets, we can convert the Spark DataFrame to a
> pandas DataFrame using df.toPandas(), which loads all the data into memory,
> and then train sklearn algorithms on top of that.
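The size-based split described above can be written as a small dispatch function. This is a sketch only: `train_by_size`, its two callback parameters, and the default row threshold are hypothetical, and the callbacks stand in for whichever Spark ml/mllib estimator or scikit-learn model is actually used. It relies only on the DataFrame's `count()` and `toPandas()` methods.

```python
def train_by_size(df, train_distributed, train_local, max_local_rows=1000000):
    """Pick a training path for a Spark DataFrame based on its row count.

    Large data stays in Spark (e.g. an estimator from pyspark.ml); smaller data
    is collected with toPandas() and handed to a local learner (e.g. sklearn).
    max_local_rows is an illustrative threshold, not a recommended value.
    """
    n_rows = df.count()  # triggers a Spark job over the whole DataFrame
    if n_rows > max_local_rows:
        return train_distributed(df)
    # toPandas() pulls every row into driver memory, so only do it below the cap.
    return train_local(df.toPandas())
```

For example, `train_distributed` might call `pyspark.ml.classification.LogisticRegression().fit(df)`, while `train_local` might split the pandas frame into features and labels before calling scikit-learn's `LogisticRegression().fit(X, y)`. Row count is only a rough proxy for memory use; wide tables may need a lower threshold.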
>
> A sample Python notebook can be found at [1].
>
> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>
> --
> *Supun Sethunga*
> Senior Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
> Blog: http://supunsetunga.blogspot.com
>



-- 
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
