Hi all,

Sorry for not keeping you in the loop.

After considering and experimenting with several options, I am using the
JavaScript code generated by wrangler to implement the operations in Spark.
I have used regular expressions to extract the operations, parameters, and
values, and mapped them to the Spark transformations I previously developed.
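
As a rough sketch of the extraction step (not the exact code; the class and
pattern names below are only illustrative), a flat statement such as (1)
below can be split into an operation name and its parameter/value pairs
with two regular expressions:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only - illustrative parser for wrangler-generated statements
public class WranglerScriptParser {

    // Operation name, e.g. "fill" in "dw.fill()"
    private static final Pattern OPERATION =
            Pattern.compile("dw\\.(\\w+)\\(\\)");
    // Parameter calls with a non-empty, non-nested argument,
    // e.g. .direction("down") or .table(0)
    private static final Pattern PARAMETER =
            Pattern.compile("\\.(\\w+)\\(([^()]+)\\)");

    public static Map<String, String> parse(String statement) {
        Map<String, String> result = new LinkedHashMap<String, String>();

        Matcher op = OPERATION.matcher(statement);
        if (op.find()) {
            result.put("operation", op.group(1));
        }

        Matcher param = PARAMETER.matcher(statement);
        while (param.find()) {
            result.put(param.group(1), param.group(2));
        }
        return result;
    }
}

For (1) this yields operation=fill, column=["split3"], table=0,
direction="down", method="copy", and so on, which can then be mapped onto
the corresponding Spark transformation.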

The code generated by wrangler is flat for some functions, as in (1) below,
while for other functions it has nested operations, as in (2).

(1)

/* Fill split3  with values from above */
w.add(dw.fill().column(["split3"])
.table(0)
.status("active")
.drop(false)
.direction("down")
.method("copy")
.row(undefined)
)

(2)

/* Delete  rows where split1 is null */
w.add(dw.filter().column([])
.table(0)
.status("active")
.drop(false)
.row(dw.row().column([])
.table(0)
.status("active")
.drop(false)
.conditions([dw.is_null().column([])
.table(0)
.status("active")
.drop(false)
.lcol("split1")
.value(undefined)
.op_str("is null")
])
)
)

I have succeeded in parsing operations similar to (1) above and am
currently working on extending the parser to handle nested operations
similar to (2).
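
One way to handle the nesting in (2) is to first pull out the
balanced-parenthesis argument of calls such as .row(...) and
.conditions([...]) and then run the same extraction on the inner
dw.row()/dw.is_null() expressions recursively. A minimal sketch of such a
helper (illustrative only; it could sit in the same sketch class as above):

    // Sketch only: returns the argument text between the '(' at
    // openParenIndex and its matching ')', so nested dw.*() expressions
    // can be parsed recursively.
    public static String argumentAt(String script, int openParenIndex) {
        int depth = 0;
        for (int i = openParenIndex; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return script.substring(openParenIndex + 1, i);
                }
            }
        }
        throw new IllegalArgumentException("Unbalanced parentheses in script");
    }

This ignores parentheses inside quoted string values, so it is only a
starting point.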

The next step would be automating the process of Spark transformation
generation.
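
For example, once the parser has recognised (2) as a filter with the
condition "split1 is null", the generated Spark code would amount to a
JavaRDD filter roughly like the following (a sketch only; the column index
would be resolved from the extracted lcol parameter, and rows are assumed
to be already tokenized into String arrays):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Sketch only - illustrative target transformation for (2)
public class DeleteNullRowsSketch {

    // Drop rows where the given column is null/empty, analogous to (2).
    public static JavaRDD<String[]> deleteRowsWhereNull(JavaRDD<String[]> rows,
                                                        final int columnIndex) {
        return rows.filter(new Function<String[], Boolean>() {
            @Override
            public Boolean call(String[] row) {
                // Keep only rows where the target column has a value
                return columnIndex < row.length
                        && row[columnIndex] != null
                        && !row[columnIndex].isEmpty();
            }
        });
    }
}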

Thanks,
Danula

On Wed, Jul 15, 2015 at 7:32 PM, Nirmal Fernando <[email protected]> wrote:

> Hi Danula,
>
> Please send an update at least every week.
>
> On Wed, Jul 15, 2015 at 5:51 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Danula,
>>
>> Any update on the progress? Did you manage to integrate the
>> transformations with the wrangler?
>>
>> Thanks,
>>
>> On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Update on the current progress of the project and future activities as
>>> we discussed at the recent meeting.
>>>
>>> *Current Progress*
>>>
>>> I have completed the phase of creating spark transformations relevant to
>>> operations available in wrangler.
>>>
>>> Operations implemented
>>> - Fill
>>> - Split
>>> - Drop
>>> - Delete
>>> - Extract
>>>
>>> *Future activities*
>>>
>>> - Modify the wrangler interface to suit the current implementation
>>> - Automate the process of generating Spark transformations
>>> - Integrate wrangler into the ML workflow
>>>
>>> Thanks,
>>> Danula
>>>
>>> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> No, we haven't done a review yet.
>>>> It would be great if we could have one so that I can discuss with you
>>>> all and clarify the next steps of the implementation as you mentioned.
>>>>
>>>> Thanks
>>>> Danula
>>>>
>>>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Danula,
>>>>>
>>>>> Did we have a review for the work done so far? If not, shall we have
>>>>> one? We can clear out any doubts and issues as well.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Danula,
>>>>>>
>>>>>> Thanks for the update, keep them coming.
>>>>>>
>>>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes,
>>>>>> this is costly, since it would load the whole dataset into memory.
>>>>>> So, is this an operation which involves multiple rows?
>>>>>>
>>>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <[email protected]
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>>>
>>>>>>> I used a workaround to implement certain parts of the operations,
>>>>>>> such as filling with values from rows above and below: I created a
>>>>>>> List using the toArray() method of JavaRDD and then converted it
>>>>>>> back to a JavaRDD after the operation.
>>>>>>>
>>>>>>> This will be inefficient (in terms of both memory and time) when
>>>>>>> working with very large data sets, but I think it's important to
>>>>>>> have these features included. Otherwise a user would be left with a
>>>>>>> very limited set of operations.
>>>>>>>
>>>>>>> Please let me know if you have a different opinion on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Danula
>>>>>>>
>>>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Somehow there are issues in implementing certain wrangler functions
>>>>>>>>> due to limitations in JavaRDD used in spark
>>>>>>>>> e.g. -
>>>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>>>> Fold operation
>>>>>>>>
>>>>>>>>
>>>>>>>> Agreed; since rows get processed in arbitrary order with Spark,
>>>>>>>> inter-row operations are not very meaningful.
>>>>>>>> But you can slightly modify the implementation of the "Fill"
>>>>>>>> operation, for example to fill values based on an
>>>>>>>> expression/static-value/mean etc. (not depending on other rows).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Danula,
>>>>>>>>>
>>>>>>>>> Sorry for the late reply. Have you got the details you were
>>>>>>>>> looking for?
>>>>>>>>>
>>>>>>>>> It would be great if I could get to know which wrangler operations
>>>>>>>>>> are important for a user of the ML
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Other than the ones you have mentioned in the proposal, I think
>>>>>>>>> it's better to have the "Translate" operation as well (to create a
>>>>>>>>> new column based on an existing column).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I am currently working on generating spark transformations
>>>>>>>>>> related to the operations available in the data wrangler.
>>>>>>>>>>
>>>>>>>>>> Data wrangler provides sufficient parameters to re-create these
>>>>>>>>>> in Spark. I have successfully implemented the delete and split
>>>>>>>>>> operations of wrangler in Spark.
>>>>>>>>>>
>>>>>>>>>> Once this phase is completed, I can either directly generate
>>>>>>>>>> these scripts on the wrangler side or use the JavaScript output
>>>>>>>>>> and convert it to Spark, depending on the implementation.
>>>>>>>>>>
>>>>>>>>>> Somehow there are issues in implementing certain wrangler
>>>>>>>>>> functions due to limitations in JavaRDD used in spark
>>>>>>>>>>
>>>>>>>>>> e.g. -
>>>>>>>>>> Fill operation - when filling with values from rows above and
>>>>>>>>>> below
>>>>>>>>>> Fold operation
>>>>>>>>>>
>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Danula
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>
>>>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>
>>>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>>>>>>> generating Spark transformations using Wrangler and come up
>>>>>>>>>>>> with a summary?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>>>>>>> for GSoC 2015. I am really looking forward to working with
>>>>>>>>>>>>> you all and contributing to WSO2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have already completed my primary research on wrangler and
>>>>>>>>>>>>> would like to meet you to get feedback on the proposed 
>>>>>>>>>>>>> architecture. I am
>>>>>>>>>>>>> planning to start working on the project before the 25th of May.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>
>>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>> Nirmal
>>>>>>>>>>>
>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Supun Sethunga*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> http://wso2.com/
>>>>>>>>> lean | enterprise | middleware
>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Supun Sethunga*
>>>>>>>> Software Engineer
>>>>>>>> WSO2, Inc.
>>>>>>>> http://wso2.com/
>>>>>>>> lean | enterprise | middleware
>>>>>>>> Mobile : +94 716546324
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>
