This is great! Yes, @Supun let's try a python script as well.

On Thu, Sep 15, 2016 at 7:46 AM, Srinath Perera <[email protected]> wrote:

> Thanks Supun! I believe this would work.
>
> I assume the same will work with Python scripts (without notebooks) as well.
>
> --Srinath
>
>
>
> On Wed, Sep 14, 2016 at 6:47 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Srinath/Nirmal
>>
>> I managed to get the $subject working. Here I connected IPython/Jupyter
>> Notebook to pyspark, and pyspark submits the job to the remote Spark
>> cluster (created by DAS). One of the advantages of using a notebook is that
>> a user can load the data in DAS tables as a Spark DataFrame and work on it
>> interactively.
>>
>> But it also has the following limitations:
>>
>>    - The client side needs a Spark distribution (to use pyspark).
>>    - We have to limit the cores allocated to the Spark app used by DAS
>>    (CarbonAnalytics), so that the Spark app created by pyspark can run in
>>    parallel.
>>    - We have to set the spark-classpath on the client side, with the jars
>>    used by DAS, so that once the job is submitted, the Spark executor knows
>>    where to look for the classes.
>>
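The client-side settings listed above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the app name, the jar directory and the default core count are hypothetical, and it uses the standard Spark properties spark.cores.max and spark.executor.extraClassPath / spark.driver.extraClassPath, which may not be the mechanism the original setup used for the classpath. The core limit for the DAS app (CarbonAnalytics) itself is configured on the DAS side and is not shown here.

```python
import os


def build_client_spark_properties(master_url, das_jar_dir, max_cores=2):
    """Return Spark properties for a pyspark client joining an existing cluster.

    master_url, das_jar_dir and max_cores are placeholders for illustration.
    """
    # Collect every jar in the (hypothetical) directory holding the DAS jars.
    jars = sorted(
        os.path.join(das_jar_dir, name)
        for name in os.listdir(das_jar_dir)
        if name.endswith(".jar")
    )
    # Assumes client and executors use the same path separator and that the
    # jars are available at the same paths on the executor nodes.
    classpath = os.pathsep.join(jars)
    return {
        "spark.master": master_url,
        "spark.app.name": "pyspark-notebook-client",
        # Cap this app's cores so it can run next to the DAS Spark app.
        "spark.cores.max": str(max_cores),
        "spark.executor.extraClassPath": classpath,
        "spark.driver.extraClassPath": classpath,
    }


def create_context(properties):
    """Create a SparkContext from the properties (needs a Spark distribution)."""
    # Imported lazily so the property builder works without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(list(properties.items()))
    return SparkContext(conf=conf)
```

In a notebook or script, create_context(build_client_spark_properties(...)) would then be called once with the master URL of the cluster created by DAS.
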
>>
>> *Training Models:*
>>
>> As we discussed offline, for large datasets, we can directly use the
>> algorithms in Spark's mllib and ml. This is very straightforward, as the
>> data we get from DAS is a Spark DataFrame, and hence we can train models
>> on top of the DataFrame (or convert it to an RDD).
>> For small and medium datasets, we can convert the Spark DataFrame to a
>> pandas DataFrame using df.toPandas(), which will load all the data into
>> memory, and then train sklearn algorithms on top of that.
>>
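The small-and-medium-dataset path described above might look like the sketch below. train_in_memory, max_rows and its default value are hypothetical, and LogisticRegression is only a placeholder estimator; count(), toPandas() and fit() are the existing Spark and sklearn calls it relies on.

```python
def train_in_memory(spark_df, feature_cols, label_col,
                    estimator=None, max_rows=1000000):
    """Pull a Spark DataFrame into pandas and fit an sklearn-style estimator.

    max_rows is an arbitrary guard, since toPandas() loads every row into
    the memory of the client process.
    """
    row_count = spark_df.count()
    if row_count > max_rows:
        raise ValueError(
            "DataFrame has %d rows; train on the cluster instead" % row_count
        )

    # Collect the whole dataset to the client as a pandas DataFrame.
    pdf = spark_df.toPandas()

    if estimator is None:
        # Placeholder model; imported lazily so callers can pass their own.
        from sklearn.linear_model import LogisticRegression
        estimator = LogisticRegression()

    # feature_cols is a list of column names, label_col a single name.
    estimator.fit(pdf[feature_cols], pdf[label_col])
    return estimator
```

For datasets above the guard, the model would instead be trained on the cluster with the mllib or ml algorithms, as mentioned earlier.
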
>> A sample Python script can be found at [1].
>>
>> [1] https://github.com/SupunS/play-ground/blob/master/pyspark/PySpark-Sample.ipynb
>>
>> --
>> *Supun Sethunga*
>> Senior Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>> Blog: http://supunsetunga.blogspot.com
>>
>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
>    http://people.apache.org/~hemapani/
>    http://srinathsview.blogspot.com/
>



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
