This is great! Yes, @Supun let's try a python script as well.

On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:
> Thanks Supun! I believe this would work.
>
> I assume the same will work with Python scripts (without notebooks) as well.
>
> --Srinath
>
> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Srinath/Nirmal,
>>
>> I managed to get the $subject working. Here I connected the iPython/Jupyter
>> Notebook to pyspark, and pyspark submits the job to the remote Spark
>> cluster (created by DAS). One of the advantages of using a Notebook is that
>> a user can load the data in DAS tables as a Spark dataframe and work on it
>> interactively.
>>
>> But it also has the following limitations:
>>
>> - The client side needs a Spark distribution (to use pyspark).
>> - We have to limit the cores allocated to the Spark app used by DAS
>> (CarbonAnalytics), so that the Spark app created by pyspark can run in
>> parallel.
>> - We have to set the Spark classpath at the client side, with the jars
>> used by DAS, so that once the job is submitted, the Spark executor knows
>> where to look for the classes.
>>
>>
>> *Training Models:*
>>
>> As we discussed offline, for large datasets we can directly use the
>> algorithms in Spark's mllib and ml. This is very straightforward, as the
>> data we get from DAS is a Spark dataframe, and hence we can train models on
>> top of the dataframe (or convert it to an RDD).
>> For small and medium datasets, we can convert the Spark dataframe to a
>> pandas dataframe using df.toPandas(), which will load all the data into
>> memory, and then train sklearn algorithms on top of that.
>>
>> A sample Python script can be found at [1].
>>
>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>
>> --
>> *Supun Sethunga*
>> Senior Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>> Blog: http://supunsetunga.blogspot.com
>>
>
> --
> ============================
> Srinath Perera, Ph.D.
> http://people.apache.org/~hemapani/
> http://srinathsview.blogspot.com/

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
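
For the plain Python script, the client-side setup described in the quoted mail could look roughly like the sketch below. This is only a sketch under assumptions, not the code from the sample notebook: the master URL, the jar paths and all function names are hypothetical, and LogisticRegression is just a stand-in for whichever sklearn algorithm we pick. The core limit discussed in the thread applies to the DAS-side app (CarbonAnalytics) and is configured in DAS; capping the client app with spark.cores.max as well is an extra assumption here. Depending on the Spark version, the classpath properties may need to go into spark-defaults.conf or the pyspark command line rather than SparkConf.

```python
def build_spark_settings(master_url, max_cores, das_jars):
    """Collect the client-side Spark properties as a plain dict."""
    # Assumes Unix-style path separators on the client and the executors.
    classpath = ":".join(das_jars)
    return {
        "spark.master": master_url,
        # Assumption: cap this client app too, so it can run next to the DAS app.
        "spark.cores.max": str(max_cores),
        # Jars used by DAS, so the driver and executors can find the classes.
        "spark.driver.extraClassPath": classpath,
        "spark.executor.extraClassPath": classpath,
    }


def create_context(settings, app_name="python-script-client"):
    """Create a SparkContext; needs a Spark distribution on the client side."""
    from pyspark import SparkConf, SparkContext  # imported lazily on purpose

    conf = SparkConf().setAppName(app_name).setAll(list(settings.items()))
    return SparkContext(conf=conf)


def train_in_memory(spark_df, feature_cols, label_col):
    """Small/medium data path: pull the dataframe into pandas, train with sklearn."""
    from sklearn.linear_model import LogisticRegression  # stand-in algorithm

    pdf = spark_df.toPandas()  # loads all the data into client memory
    model = LogisticRegression()
    model.fit(pdf[feature_cols], pdf[label_col])
    return model


# Hypothetical values, for illustration only.
settings = build_spark_settings(
    "spark://das-host:7077",
    2,
    ["/opt/das/lib/a.jar", "/opt/das/lib/b.jar"],
)
```

Only build_spark_settings runs without Spark installed; create_context and train_in_memory import pyspark and sklearn lazily, so the script can at least be loaded and checked on a machine without a Spark distribution.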
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
