Re: [Dev] [GSoC-2015] Data Wrangler extension for WSO2 Machine Learner

Supun Sethunga Sat, 27 Jun 2015 20:56:04 -0700

Hi Danula,

Have we had a review of the work done so far? If not, shall we have one?
We can clear up any doubts and issues as well.


Thanks,
Supun

On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <[email protected]> wrote:

> Hi Danula,
>
> Thanks for the update, keep them coming.
>
> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes, this
> is costly, since it would load the whole dataset into memory. So, is this
> an operation which involves multiple rows?
>
> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <[email protected]>
> wrote:
>
>> Hi Supun,
>>
>> I modified the "Fill" operation to add what you mentioned.
>>
>> I used a workaround to implement certain parts of the operations, such
>> as filling with values from rows above and below.
>> I created a List using the toArray() method of JavaRDD and then
>> converted it back to a JavaRDD after the operation.
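
A rough sketch of the list-based workaround described above, in case it helps
with the review. It assumes each row is a String[] and that the list is
obtained with JavaRDD.collect() and turned back into an RDD with
JavaSparkContext.parallelize(). The class and method names are made up for
illustration and are not taken from the actual implementation. Only the
driver-side step is shown, so it runs without Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not from the ML code base.
final class FillFromAboveSketch {

    // Replaces null or empty cells in one column with the nearest non-empty
    // value from a preceding row. In Spark this would run on the driver,
    // between collecting the RDD into a list and parallelizing it again.
    static List<String[]> fillFromAbove(List<String[]> rows, int column) {
        List<String[]> filled = new ArrayList<>(rows.size());
        String lastSeen = null; // most recent non-empty value in the column
        for (String[] row : rows) {
            String[] copy = row.clone(); // leave the input rows untouched
            String cell = copy[column];
            if (cell == null || cell.isEmpty()) {
                if (lastSeen != null) {
                    copy[column] = lastSeen; // fill from the row above
                }
            } else {
                lastSeen = cell;
            }
            filled.add(copy);
        }
        return filled;
    }
}
```

Leading empty cells stay empty here because there is no row above to copy
from. The whole dataset still has to fit in driver memory, which is the cost
already noted in this thread.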
>>
>> This will be inefficient (in terms of both memory and time) when working
>> with very large data sets. But I think it's important to have these
>> features included. Otherwise a user would be left with a very limited set
>> of operations.
>>
>> Please let me know if you have a different opinion on this.
>>
>> Thanks,
>> Danula
>>
>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <[email protected]> wrote:
>>
>>>> Somehow there are issues in implementing certain wrangler functions due
>>>> to limitations in JavaRDD used in spark
>>>> e.g. -
>>>> Fill operation - when filling with values from rows above and below
>>>> Fold operation
>>>
>>>
>>> Agreed. Since Spark processes rows in parallel and in no particular
>>> order, inter-row operations are not very meaningful.
>>> But you can slightly modify the "Fill" operation to fill values based on
>>> an expression, a static value, the mean, etc. (not depending on other
>>> rows).
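
To make the row-independent variant concrete, here is a minimal sketch that
fills missing cells with the column mean. It assumes rows are String[] and
that missing cells are null or empty. All names are hypothetical. The mean is
computed once up front, and each row is then filled without looking at any
other row, so the second step should translate directly to a map() over a
JavaRDD. Plain Java collections stand in for the RDD so that it runs without
Spark.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative only.
final class FillWithMeanSketch {

    // Mean of the non-missing cells in the column; NaN if there are none.
    static double columnMean(List<String[]> rows, int column) {
        return rows.stream()
                .map(row -> row[column])
                .filter(cell -> cell != null && !cell.isEmpty())
                .mapToDouble(Double::parseDouble)
                .average()
                .orElse(Double.NaN);
    }

    // Fills one row using only that row and a precomputed replacement value.
    static String[] fillRow(String[] row, int column, String replacement) {
        String[] copy = row.clone();
        if (copy[column] == null || copy[column].isEmpty()) {
            copy[column] = replacement;
        }
        return copy;
    }

    // Pass 1 computes the mean, pass 2 fills each row independently.
    static List<String[]> fillWithMean(List<String[]> rows, int column) {
        String mean = String.valueOf(columnMean(rows, column));
        return rows.stream()
                .map(row -> fillRow(row, column, mean))
                .collect(Collectors.toList());
    }
}
```

Filling with a static value is the same fillRow() call with a constant
replacement, so no first pass is needed in that case.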
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Sorry for the late reply. Have you got the details you were looking for?
>>>>
>>>>> It would be great if I could get to know which wrangler operations are
>>>>> important for a user of the ML
>>>>
>>>>
>>>> Other than the ones you have mentioned in the proposal, I think it's
>>>> better to have a "Translate" operation as well (to create a new column
>>>> based on an existing column).
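
One possible reading of "Translate", sketched under assumptions: it takes
"create a new column based on an existing column" to mean applying a
per-value function to one column and appending the result as a new column.
That interpretation, the String[] row type, and all names below are
assumptions, not the wrangler's definition. Because each row is handled on
its own, this shape would fit a map() over a JavaRDD.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical helper; names are illustrative only.
final class TranslateSketch {

    // Returns a copy of the row with one extra trailing column whose value
    // is derived from the value in sourceColumn.
    static String[] translate(String[] row, int sourceColumn,
                              Function<String, String> derive) {
        String[] extended = Arrays.copyOf(row, row.length + 1);
        extended[row.length] = derive.apply(row[sourceColumn]);
        return extended;
    }
}
```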
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>>
>>>>
>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am currently working on generating spark transformations related to
>>>>> the operations available in the data wrangler.
>>>>>
>>>>> Data wrangler provides sufficient parameters to re-create these in
>>>>> Spark. I have successfully implemented the delete and split operations
>>>>> of wrangler in Spark.
>>>>>
>>>>> Once this phase is completed, I can either directly generate these
>>>>> scripts at wrangler or use the javascript output and convert it to spark
>>>>> depending on the implementation.
>>>>>
>>>>> However, there are issues in implementing certain wrangler functions
>>>>> due to limitations of the JavaRDD used in Spark
>>>>>
>>>>> e.g. -
>>>>> Fill operation - when filling with values from rows above and below
>>>>> Fold operation
>>>>>
>>>>> It would be great if I could get to know which wrangler operations are
>>>>> important for a user of the ML
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Danula,
>>>>>>
>>>>>> Please send an update of your work thus far.
>>>>>>
>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>> generating Spark transformations using Wrangler and come up with a
>>>>>>> summary?
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>> for GSoC 2015. I am really looking forward to working with you all
>>>>>>>> and contributing to WSO2.
>>>>>>>>
>>>>>>>> I have already completed my primary research on wrangler and would
>>>>>>>> like to meet you to get feedback on the proposed architecture. I am
>>>>>>>> planning to start working on the project before 25th of May.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Danula
>>>>>>>>
>>>>>>>> [1] -
>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
