Hi Danula,

Have we had a review of the work done so far? If not, shall we have one? We can clear up any doubts and issues as well.

Thanks,
Supun

On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <[email protected]> wrote:

> Hi Danula,
>
> Thanks for the update, keep them coming.
>
> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes, this
> is costly, since it would load the whole dataset into memory. So, is this an
> operation which involves multiple rows?
>
> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <[email protected]>
> wrote:
>
>> Hi Supun,
>>
>> I modified the "Fill" operation to add what you mentioned.
>>
>> I used a workaround to implement certain parts of the operations, such
>> as filling with values from rows above and below.
>> I created a List implementation using the toArray() method in JavaRDD and
>> then converted it back to a JavaRDD after the operation.
>>
>> This will be inefficient (in terms of both memory and time) when working
>> with very large data sets. But I think it's important to have these features
>> included. Otherwise a user would be left with a very limited set of
>> operations.
>>
>> Please let me know if you have a different opinion on this.
>>
>> Thanks,
>> Danula
>>
>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <[email protected]> wrote:
>>
>>>> Somehow there are issues in implementing certain wrangler functions due
>>>> to limitations in JavaRDD used in Spark
>>>> e.g. -
>>>> Fill operation - when filling with values from rows above and below
>>>> Fold operation
>>>
>>> Agree, since rows will get executed randomly with Spark, inter-row
>>> operations are not very meaningful.
>>> But you can slightly modify the implementation of the "Fill" operation,
>>> such as, to fill values based on an expression/static-value/mean etc. (not
>>> depending on other rows).
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Sorry for the late reply. Have you got the details you were looking for?
>>>>
>>>>> It would be great if I could get to know which wrangler operations are
>>>>> important for a user of the ML
>>>>
>>>> Other than the ones you have mentioned in the proposal, I think it's
>>>> better to have a "Translate" operation as well (to create a new column
>>>> based on an existing column).
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am currently working on generating Spark transformations related to
>>>>> the operations available in the data wrangler.
>>>>>
>>>>> Data wrangler provides sufficient parameters to re-create these at
>>>>> Spark. I have successfully implemented delete and split operations of
>>>>> wrangler in Spark.
>>>>>
>>>>> Once this phase is completed, I can either directly generate these
>>>>> scripts at wrangler or use the javascript output and convert it to Spark
>>>>> depending on the implementation.
>>>>>
>>>>> Somehow there are issues in implementing certain wrangler functions
>>>>> due to limitations in JavaRDD used in Spark
>>>>>
>>>>> e.g. -
>>>>> Fill operation - when filling with values from rows above and below
>>>>> Fold operation
>>>>>
>>>>> It would be great if I could get to know which wrangler operations are
>>>>> important for a user of the ML
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Danula,
>>>>>>
>>>>>> Please send an update of your work thus far.
>>>>>>
>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>> generating Spark transformations using Wrangler and come up with a
>>>>>>> summary?
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>> for GSoC 2015. I am really looking forward to working with you all and
>>>>>>>> contributing to WSO2.
>>>>>>>>
>>>>>>>> I have already completed my primary research on wrangler and would
>>>>>>>> like to meet you to get feedback on the proposed architecture. I am
>>>>>>>> planning to start working on the project before 25th of May.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Danula
>>>>>>>>
>>>>>>>> [1] -
>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>> --
>>> *Supun Sethunga*
>>> Software Engineer
>>> WSO2, Inc.
>>> http://wso2.com/
>>> lean | enterprise | middleware
>>> Mobile : +94 716546324
>>>
>>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324

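The two flavours of "Fill" discussed in the thread can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the project's actual code: the class name FillSketch, the methods fillWith and fillFromAbove, and the choice of String[] as the row type are all hypothetical. fillWith depends on a single row only, so its body is the kind of logic that could be passed to JavaRDD.map (using Spark's own Function interface rather than java.util.function.Function). fillFromAbove needs the rows in order and in memory, which is why the workaround in the thread collects the data first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FillSketch {

    // Hypothetical convention: a cell is "missing" if it is null or empty.
    static boolean isMissing(String cell) {
        return cell == null || cell.isEmpty();
    }

    // Row-independent fill: replaces a missing cell with a fixed value.
    // Each row is handled on its own, so no ordering between rows is needed.
    static Function<String[], String[]> fillWith(int column, String replacement) {
        return row -> {
            String[] out = row.clone();
            if (isMissing(out[column])) {
                out[column] = replacement;
            }
            return out;
        };
    }

    // Inter-row fill: copies the nearest non-missing value from the rows above.
    // Requires an ordered, in-memory list (for example the result of collect()).
    static List<String[]> fillFromAbove(List<String[]> rows, int column) {
        List<String[]> result = new ArrayList<>(rows.size());
        String last = null;
        for (String[] row : rows) {
            String[] out = row.clone();
            if (isMissing(out[column])) {
                if (last != null) {
                    out[column] = last; // leading gaps stay missing
                }
            } else {
                last = out[column];
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"b", ""},
                new String[] {"c", null});
        for (String[] row : fillFromAbove(rows, 1)) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println(Arrays.toString(fillWith(1, "0").apply(rows.get(1))));
    }
}
```

The first variant scales with the data because every row is processed independently; the second is limited by the memory of the machine holding the list, which matches the cost concern raised about collect().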
_______________________________________________ Dev mailing list [email protected] http://wso2.org/cgi-bin/mailman/listinfo/dev
