Thanks Supun! I believe this would work. I assume the same will work with Python scripts (without notebooks) as well.
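For a plain Python script, the client-side setup in Supun's message should carry over unchanged; only the launch step differs. The following is a minimal, untested sketch under stated assumptions. It collects the settings the limitations list calls for into a spark-submit command: the master URL, a core cap for the client's own app, and the DAS jars on the driver and executor classpath. The master URL, jar paths, and core count are hypothetical placeholders. `spark.cores.max` applies to Spark standalone mode. Capping the CarbonAnalytics app itself has to be done on the DAS side and is not shown here.

```python
# Hypothetical sketch: build a spark-submit command that runs a plain
# PySpark script against the DAS-managed Spark cluster.


def build_spark_conf(master_url, das_jars, max_cores):
    """Return Spark settings for a client app sharing the DAS cluster.

    das_jars: paths of the DAS jars as they exist on the cluster nodes
    (placeholder paths; adjust to the actual DAS installation).
    """
    # ':' is the classpath separator on Linux/Unix nodes (assumed here).
    classpath = ":".join(das_jars)
    return {
        "spark.master": master_url,
        # Standalone mode: caps the total cores this app may take, so it
        # can run alongside the CarbonAnalytics app that DAS keeps running.
        "spark.cores.max": str(max_cores),
        # Let the driver and the executors resolve the DAS classes.
        "spark.driver.extraClassPath": classpath,
        "spark.executor.extraClassPath": classpath,
    }


def spark_submit_args(script, conf):
    """Translate the settings into a spark-submit argument list."""
    args = ["spark-submit", "--master", conf["spark.master"]]
    for key in sorted(conf):
        if key != "spark.master":
            args += ["--conf", f"{key}={conf[key]}"]
    args.append(script)
    return args


if __name__ == "__main__":
    conf = build_spark_conf(
        "spark://das-host:7077",  # hypothetical DAS Spark master URL
        ["/opt/das/lib/analytics-core.jar"],  # hypothetical jar path
        max_cores=2,
    )
    print(" ".join(spark_submit_args("train_model.py", conf)))
```

Passing the resulting list to `subprocess.run(..., check=True)` would submit the script. Because `extraClassPath` entries are resolved on each worker's machine, the jar paths must exist at those locations on the cluster nodes, not just on the client.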
--Srinath

On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
> Hi Srinath/Nirmal,
>
> I managed to get the $subject working. Here I connected an iPython/Jupyter
> Notebook to pyspark, and pyspark submits the job to the remote Spark
> cluster (created by DAS). One of the advantages of using a Notebook is that
> a user can load the data in DAS tables as a Spark dataframe and work on it
> interactively.
>
> But it also has the following limitations:
>
> - The client side needs a Spark distribution (to use pyspark).
> - We have to limit the cores allocated to the Spark App used by DAS
>   (CarbonAnalytics), so that the Spark App created by pyspark can run in
>   parallel.
> - We have to set the Spark classpath on the client side with the jars
>   used by DAS, so that once the job is submitted, the Spark executors know
>   where to look for the classes.
>
>
> *Training Models:*
>
> As we discussed offline, for large datasets we can directly use the
> algorithms in Spark's mllib and ml. This is very straightforward, as the
> data we get from DAS is a Spark dataframe, and hence we can train models on
> top of the dataframe (or convert it to an RDD).
> For small and medium datasets, we can convert the Spark dataframe to a
> pandas dataframe using df.toPandas(), which loads all the data into memory,
> and then train sklearn algorithms on top of that.
>
> A sample Python notebook can be found at [1].
>
> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>
> --
> *Supun Sethunga*
> Senior Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
> Blog: http://supunsetunga.blogspot.com

--
============================
Srinath Perera, Ph.D.
   http://people.apache.org/~hemapani/
   http://srinathsview.blogspot.com/
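The training split Supun describes could be wired up roughly as follows: Spark ml for large datasets, `df.toPandas()` plus sklearn for small and medium ones. This is a minimal sketch, not the approach in the linked notebook. The cell-count threshold is a hypothetical heuristic rather than a measured limit, and logistic regression stands in for whatever algorithm is actually needed. The pyspark and scikit-learn imports are deferred inside `train()`, so the size check works on its own.

```python
# Hypothetical sketch: pick between Spark ml and scikit-learn depending on
# how much data the DAS table holds.


def choose_training_backend(row_count, num_columns, max_local_cells=10_000_000):
    """Return "sklearn" if the data plausibly fits in driver memory.

    max_local_cells is an illustrative threshold, not a measured limit.
    """
    if row_count * num_columns <= max_local_cells:
        return "sklearn"
    return "spark-ml"


def train(df, feature_cols, label_col):
    """Train a classifier on a pyspark DataFrame loaded from a DAS table.

    Assumes numeric feature columns and a numeric label column.
    """
    # count() runs a Spark job, so this check is not free on big tables.
    backend = choose_training_backend(df.count(), len(feature_cols) + 1)
    if backend == "sklearn":
        from sklearn.linear_model import LogisticRegression

        # toPandas() collects every selected row into the driver's memory.
        pdf = df.select(feature_cols + [label_col]).toPandas()
        model = LogisticRegression(max_iter=1000)
        model.fit(pdf[feature_cols], pdf[label_col])
        return backend, model

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # spark.ml estimators expect the features packed into one vector column.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    train_df = assembler.transform(df)
    model = LogisticRegression(featuresCol="features", labelCol=label_col).fit(train_df)
    return backend, model
```

Keeping the size decision in a separate pure function makes the threshold easy to tune, or to replace with an explicit user choice, without touching the training code.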
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
