Hi,

Here is the initial proposal. Please go through it; you can comment directly on the document. Regarding the discussion around the dimensional operators, there is a dedicated section covering them and future plans. Once the comments are addressed, I can start with one of the components, such as Flume, and document the steps involved. Others can then take up the remaining components and follow the same steps.
https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g

Thanks

On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:
> Thomas,
>
> IMHO we should also look at the cost to users of keeping code in a GitHub repository outside Malhar (even if it is under the ASF 2.0 license). There is value in deprecating code in Megh and moving it to Malhar. Volunteers in this effort could decide how much overlap means "mark as overlapping". My suggestion is to absorb overlapping operators into a directory in Malhar that marks them as such. A lot of these operators are being used in production, and it makes sense to absorb them into the Apache GitHub repository.
>
> Thanks
> Amol
>
> On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pra...@datatorrent.com> wrote:
>
> > It would be great to have Tim's help with dimension computation, but I think we can still debate whether the HDHT dependency needs to be removed before contribution or whether this can be done as a two-step process. We also have a place to put experimental code (contrib), and HDHT could go there until we can work out how to port it to managed state.
> >
> > My thought is that if it is going to be a significant porting effort, then we do it as a two-step process.
> >
> > Thanks
> >
> > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <tho...@datatorrent.com> wrote:
> > >
> > > Tim,
> > >
> > > The functionality of the dimension compute operator should be available in Malhar. My concern is moving things without regard to code duplication and long-term maintenance cost. Several pieces of the dimension compute operator are (or should be) reusable components in their own right. Live querying (queryable state) with schemas is one such example. It is a major feature and not limited to the dimension compute operator. It should ideally work with the new windowing support as well.
> > > But the main area that needs work is the state store: the dependency on HDHT needs to be removed and replaced with managed state. Also, I'm curious why the window operator should not scale for large time buckets. Are you referring to the current intermediate implementation or to the work in progress that will use incremental state saving? If so, please bring it up on APEXMALHAR-2130, as it is pretty important.
> > >
> > > Since you have written almost all of the dimension compute code, could you help with the changes needed to bring it over? It would also be good to see the user documentation in Malhar.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <timothyfar...@apache.org> wrote:
> > >
> > > > Hi Thomas,
> > > >
> > > > With respect to the dimension operator, I would like to learn more about the underlying framework you mentioned and the code duplication. If you are talking about the window operator framework, that framework is not suitable for the dimension computation use case because it doesn't scale for large time buckets. Furthermore, that framework has no support for querying, whereas the dimension operators support live queries of the aggregated data. Querying live data streams is a popular feature in other open source platforms, and I believe it is a worthwhile addition to Malhar.
> > > >
> > > > Given that the dimension framework has been used in many POCs, is even running in production, and has novel features like live querying, it more than meets the bar for a Malhar contribution. If a concrete argument cannot be provided to prevent this work from going into Malhar, then these efforts should not be blocked.
> > > >
> > > > Thanks,
> > > > Tim
> > > >
> > > > On 2016-09-09 17:18 (-0700), Thomas Weise <tho...@datatorrent.com> wrote:
> > > > > I see no reason to move the dimension operator, along with everything it duplicates, to Malhar. It is available for everyone to use as it is, and there should be an initiative to make it conform to the underlying framework so that it can become part of Malhar.
> > > > >
> > > > > Also, there is already an enrichment operator; there is even documentation for it.
> > > > >
> > > > > Hence, this needs to be analyzed properly.
> > > > >
> > > > > Thomas
> > > > >
> > > > > On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <pra...@datatorrent.com> wrote:
> > > > >
> > > > > > Yes, I do plan to put together a proposal with a list. The ones that come to mind are Flume, enrichment, the various dimensional operators, and any custom partitioners. The dimensional operators are mature and usable today; in the future they could also be ported onto the new windowing and managed state operator framework.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <tho...@datatorrent.com> wrote:
> > > > > >
> > > > > > > A cursory look suggests there is a lot of overlap. I'm looking forward to seeing a proposal that reflects a vision of how to evolve Malhar rather than just moving code around.
> > > > > > >
> > > > > > > Thomas
> > > > > > >
> > > > > > > On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <pra...@datatorrent.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > DataTorrent, the initial contributor to Apex and the company I work for, has recently opened up a library of operators called Megh to the public and has made the repository available under the Apache License.
The > > >> link to > > >>>>> the > > >>>>>> repository is below. These operators, for the most part, contain > > >>>>>> functionality that is complementary to what Malhar library > > >> provides and > > >>>>>> were developed to solve business use cases that arose over time. > > >> Also, > > >>>>> some > > >>>>>> operators in Malhar were inspired from early implementations in > the > > >>>> Megh > > >>>>>> library and were built upon knowledge gained in doing the original > > >>>>>> implementations. > > >>>>>> > > >>>>>> Our goal is to not have Megh as a separate library but rather > bring > > >>>> these > > >>>>>> operators into Malhar in a fashion that it is consistent with the > > >>>> Malhar > > >>>>>> project and repository. In the upcoming days, in a gradual > > >> fashion, we > > >>>>> will > > >>>>>> have more details on the individual operators that we would like > to > > >>>>>> contribute. Also, if you are interested in helping with this > effort > > >>>>> please > > >>>>>> raise your hand. > > >>>>>> > > >>>>>> https://github.com/DataTorrent/Megh/ > > >>>>>> > > >>>>>> Thanks > > >> > > >