Thanks Danula.

On Thu, Jul 16, 2015 at 10:07 PM, Danula Eranjith <[email protected]>
wrote:

> Hi all,
>
> Sorry for not keeping you in the loop.
>
> After considering and experimenting with several options, I am now using
> the JavaScript code generated by Wrangler as the input for the Spark
> implementation. I use regular expressions to extract the operations,
> parameters and values, and map them to the Spark transformations I
> developed previously.
>
> The code Wrangler generates for certain functions has nested operations,
> as in (2) below.
>
> (1)
>
> /* Fill split3  with values from above */
> w.add(dw.fill().column(["split3"])
> .table(0)
> .status("active")
> .drop(false)
> .direction("down")
> .method("copy")
> .row(undefined)
> )
>
> (2)
>
> /* Delete  rows where split1 is null */
> w.add(dw.filter().column([])
> .table(0)
> .status("active")
> .drop(false)
> .row(dw.row().column([])
> .table(0)
> .status("active")
> .drop(false)
> .conditions([dw.is_null().column([])
> .table(0)
> .status("active")
> .drop(false)
> .lcol("split1")
> .value(undefined)
> .op_str("is null")
> ])
> )
> )
>
> I have succeeded in parsing operations similar to (1) above and am
> currently extending the parser to handle operations similar to (2).
>
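A minimal sketch of how scripts like (1) and (2) could be parsed, assuming the raw Wrangler output without the mail quote prefixes. The helper names `matchingParen` and `parseChain` are hypothetical and not taken from the project. A regular expression picks out the operation and each chained call, while a parenthesis counter finds where each argument ends, so a nested `dw.*` argument can be parsed recursively. It assumes string values contain no parentheses and no "dw." text, and it parses only the first entry of a `conditions` array.

```javascript
// Return the index of the ")" that closes the "(" at position `open`, or -1.
function matchingParen(text, open) {
  let depth = 0;
  for (let i = open; i < text.length; i++) {
    if (text[i] === "(") depth++;
    else if (text[i] === ")" && --depth === 0) return i;
  }
  return -1;
}

// Parse a chain such as dw.fill().column(["split3"]).table(0)...
// into { operation, params }. Arguments that themselves contain a
// dw.* chain are parsed recursively; all others are kept as raw text.
function parseChain(text) {
  const head = /dw\.(\w+)\(\)/.exec(text);
  if (!head) return null;
  const result = { operation: head[1], params: {} };
  let pos = head.index + head[0].length;
  const call = /\s*\.(\w+)\(/y; // sticky: must match exactly at `pos`
  while (true) {
    call.lastIndex = pos;
    const m = call.exec(text);
    if (!m) break;
    const open = pos + m[0].length - 1; // index of the "(" just matched
    const close = matchingParen(text, open);
    if (close < 0) break;
    const arg = text.slice(open + 1, close).trim();
    result.params[m[1]] = arg.includes("dw.") ? parseChain(arg) : arg;
    pos = close + 1;
  }
  return result;
}

// Usage on a shortened form of script (1).
const example = 'w.add(dw.fill().column(["split3"])\n.direction("down")\n.method("copy")\n)';
console.log(JSON.stringify(parseChain(example), null, 2));
```

The parameter values are left as raw text (for example `"down"` still carries its quotes), so mapping them to typed Spark transformation arguments would be a separate step.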
> The next step is to automate the generation of Spark transformations.
>
> Thanks,
> Danula
>
> On Wed, Jul 15, 2015 at 7:32 PM, Nirmal Fernando <[email protected]> wrote:
>
>> Hi Danula,
>>
>> Please send an update at least every week.
>>
>> On Wed, Jul 15, 2015 at 5:51 PM, Supun Sethunga <[email protected]> wrote:
>>
>>> Hi Danula,
>>>
>>> Any update on the progress? Did you manage to integrate the
>>> transformations with Wrangler?
>>>
>>> Thanks,
>>>
>>> On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Here is an update on the current progress of the project and the future
>>>> activities we discussed at the recent meeting.
>>>>
>>>> *Current Progress*
>>>>
>>>> I have completed the phase of creating spark transformations relevant
>>>> to operations available in wrangler.
>>>>
>>>> Operations implemented
>>>> - Fill
>>>> - Split
>>>> - Drop
>>>> - Delete
>>>> - Extract
>>>>
>>>> *Future activities*
>>>>
>>>> - Modify the wrangler interface to suit the current implementation
>>>> - Automate the process of generating Spark transformations
>>>> - Integrate Wrangler into the ML workflow
>>>>
>>>> Thanks,
>>>> Danula
>>>>
>>>> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> No, we haven't done a review yet.
>>>>> It would be great if we could have one so that I can discuss with you
>>>>> all and clarify the next steps of the implementation as you mentioned.
>>>>>
>>>>> Thanks
>>>>> Danula
>>>>>
>>>>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Danula,
>>>>>>
>>>>>> Did we have a review of the work done so far? If not, shall we have
>>>>>> one? We can clear up any doubts and issues as well.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Thanks for the update, keep them coming.
>>>>>>>
>>>>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes,
>>>>>>> this is costly, since it would load the whole dataset into memory. So,
>>>>>>> is this an operation which involves multiple rows?
>>>>>>>
>>>>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>>>>
>>>>>>>> I used a workaround to implement certain parts of the operations,
>>>>>>>> such as filling with values from the rows above and below.
>>>>>>>> I created a List using the toArray() method of JavaRDD and then
>>>>>>>> converted it back to a JavaRDD after the operation.
>>>>>>>>
>>>>>>>> This will be inefficient (in terms of both memory and time) when
>>>>>>>> working with very large data sets, but I think it's important to have
>>>>>>>> these features included. Otherwise a user would be left with a very
>>>>>>>> limited set of operations.
>>>>>>>>
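A minimal sketch of the in-memory step of this workaround, assuming the rows have already been collected into an ordered list of arrays. The function name `fillDown` and the row layout are hypothetical, and the Spark side (collecting the JavaRDD and turning the result back into one) is omitted. It shows why the operation depends on row order: each empty cell takes the last non-empty value seen above it.

```javascript
// Fill empty cells (null, undefined or "") in column `col` with the
// nearest non-empty value from the rows above. Returns new rows and
// leaves the input untouched. Leading empty cells stay empty.
function fillDown(rows, col) {
  let last = null; // most recent non-empty value seen so far
  return rows.map((row) => {
    const copy = row.slice();
    const value = row[col];
    const isEmpty = value === null || value === undefined || value === "";
    if (isEmpty) {
      if (last !== null) copy[col] = last;
    } else {
      last = value;
    }
    return copy;
  });
}

// Usage: fill the second column downwards.
const sampleRows = [["a", "x"], ["b", null], ["c", ""], ["d", "y"]];
console.log(fillDown(sampleRows, 1));
```

Filling from the rows below would be the same scan run over the list in reverse.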
>>>>>>>> Please let me know if you have a different opinion on this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Danula
>>>>>>>>
>>>>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>> Somehow there are issues in implementing certain wrangler
>>>>>>>>>> functions due to limitations in JavaRDD used in spark
>>>>>>>>>> e.g. -
>>>>>>>>>> Fill operation - when filling with values from rows above and
>>>>>>>>>> below
>>>>>>>>>> Fold operation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Agreed. Since Spark processes rows in parallel, with no guaranteed
>>>>>>>>> order between them, inter-row operations are not very meaningful.
>>>>>>>>> But you can slightly modify the implementation of the "Fill"
>>>>>>>>> operation, for example to fill values based on an expression, a
>>>>>>>>> static value, the mean, etc. (not depending on other rows).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Danula,
>>>>>>>>>>
>>>>>>>>>> Sorry for the late reply. Have you got the details you were
>>>>>>>>>> looking for?
>>>>>>>>>>
>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Other than the ones you have mentioned in the proposal, I think it's
>>>>>>>>>> better to have a "Translate" operation as well (to create a new
>>>>>>>>>> column based on an existing column).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I am currently working on generating spark transformations
>>>>>>>>>>> related to the operations available in the data wrangler.
>>>>>>>>>>>
>>>>>>>>>>> Data wrangler provides sufficient parameters to re-create these
>>>>>>>>>>> in Spark. I have successfully implemented the delete and split
>>>>>>>>>>> operations of wrangler in Spark.
>>>>>>>>>>>
>>>>>>>>>>> Once this phase is completed, I can either generate these scripts
>>>>>>>>>>> directly in wrangler or use the JavaScript output and convert it to
>>>>>>>>>>> Spark, depending on the implementation.
>>>>>>>>>>>
>>>>>>>>>>> Somehow there are issues in implementing certain wrangler
>>>>>>>>>>> functions due to limitations in JavaRDD used in spark
>>>>>>>>>>>
>>>>>>>>>>> e.g. -
>>>>>>>>>>> Fill operation - when filling with values from rows above and
>>>>>>>>>>> below
>>>>>>>>>>> Fold operation
>>>>>>>>>>>
>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Danula
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <[email protected]
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>
>>>>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>>>>>>>> generating Spark transformations using Wrangler and come up with a
>>>>>>>>>>>>> summary?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>>>>>>>> for GSoC 2015. I am really looking forward to working with you all
>>>>>>>>>>>>>> and contributing to WSO2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have already completed my primary research on wrangler and
>>>>>>>>>>>>>> would like to meet you to get feedback on the proposed
>>>>>>>>>>>>>> architecture. I am planning to start working on the project before
>>>>>>>>>>>>>> the 25th of May.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>
>>>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>> Software Engineer
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> http://wso2.com/
>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
