Yes, a Python script also works, without the notebook. In the Python script (.py) scenario, no SparkContext is created by default, so the script has to create one itself. The script can be run as follows:
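The run command itself is not shown in this message. As a minimal sketch of the script side only, a standalone script could create its own SparkContext along these lines. The master URL, app name, and core limit below are placeholder assumptions, not values from the DAS setup, and the helper names are hypothetical:

```python
def spark_settings(master_url, app_name, max_cores):
    """Build the (key, value) pairs to pass to SparkConf.

    All three arguments are illustrative placeholders; substitute the
    values of the actual cluster created by DAS.
    """
    return [
        ("spark.master", master_url),
        ("spark.app.name", app_name),
        # Cap the cores so this app can run alongside other Spark apps.
        ("spark.cores.max", str(max_cores)),
    ]


def create_context(settings):
    """Create a SparkContext explicitly, since a plain .py script gets none."""
    # Imported here so the settings helper works without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(settings)
    return SparkContext(conf=conf)


# Example usage (requires a Spark distribution on the client side):
#   sc = create_context(spark_settings("spark://localhost:7077", "PySparkScript", 2))
#   ... work with sc ...
#   sc.stop()
```

The classpath settings that point at the DAS jars are left out here, since the exact jars and paths are not given in this thread.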

On Thu, Sep 15, 2016 at 9:55 AM, Nirmal Fernando <[email protected]> wrote:

> This is great! Yes, @Supun let's try a python script as well.
>
> On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:
>
>> Thanks Supun! I believe this would work.
>>
>> I assume the same will work with python scripts (without notebooks) as well.
>>
>> --Srinath
>>
>> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Srinath/Nirmal
>>>
>>> I managed to get the $subject working. Here I connected iPython/Jupyter
>>> Notebook to pyspark, and pyspark submits the job to the remote spark
>>> cluster (created by DAS). One of the advantages of using Notebook is that
>>> a user can load the data in DAS tables as a spark dataframe, and can
>>> interactively work on it.
>>>
>>> But it also has the following limitations:
>>>
>>> - Client side needs a spark distribution. (To use pyspark)
>>> - Have to limit the cores allocated to the Spark App used by DAS
>>> (CarbonAnalytics), so that the Spark App created by pySpark can run in
>>> parallel.
>>> - Have to set the spark-classpath at the client side, with the jars
>>> used by DAS, so that once the job is submitted, the spark-executor knows
>>> where to look for the classes.
>>>
>>> *Training Models:*
>>>
>>> As we discussed offline, for large datasets, we can directly use
>>> algorithms in spark's mllib and ml. This is very straightforward, as the
>>> data we get from DAS is a spark-dataframe, and hence we can train models on
>>> top of the dataframe (or can convert it to rdd).
>>> And for small and medium datasets, we can convert the spark-dataframe to a
>>> pandas-dataframe using df.toPandas(), which will load all the data into
>>> memory, and then train sklearn algorithms on top of that.
>>>
>>> A sample python script can be found at [1].
>>>
>>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>>
>>> --
>>> *Supun Sethunga*
>>> Senior Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>> Blog: http://supunsetunga.blogspot.com
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>> http://people.apache.org/~hemapani/
>> http://srinathsview.blogspot.com/
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/

--
*Supun Sethunga*
Senior Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
Blog: http://supunsetunga.blogspot.com
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
