It would be great to have Tim's help with the dimension computation, but I think we can still debate whether the HDHT dependency needs to be removed before contribution or whether this can be done as a two-step process. We also have a place to put experimental code, contrib, and HDHT could go in there until we can determine how to port it to use managed state.
My thought on this is that if it is going to be a significant porting effort then we do it as a two-step process.

Thanks

> On Sep 9, 2016, at 11:52 PM, Thomas Weise <[email protected]> wrote:
>
> Tim,
>
> The functionality of the dimension compute operator should be available in
> Malhar. My concern is moving things without regard to code duplication and
> long term maintenance cost. There are several pieces to the dimension
> compute operator that in fact are (or should be) reusable components by
> themselves. Live querying (queryable state) with schemas is one such
> example. It's a major feature and not limited to the dimension compute
> operator. It should ideally work with the new windowing support as well.
> But the main area that needs work is the state store - the dependency on
> HDHT needs to be removed and replaced with managed state. Also I'm curious
> why the window operator should not scale for large time buckets? Are you
> referring to the current intermediate implementation or the work in
> progress that will use incremental state saving? If so, please bring it up
> on APEXMALHAR-2130 as it is pretty important.
>
> Since you have written almost all of the dimension compute code, could you
> help with the changes needed to bring it over? It would also be good to see
> the user documentation in Malhar.
>
> Thanks,
> Thomas
>
> On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <[email protected]>
> wrote:
>
>> Hi Thomas,
>>
>> With respect to the dimension operator, I would like to learn more about
>> the underlying framework you mentioned and the code duplication. If you are
>> talking about the Window operator framework, that framework is not suitable
>> for the dimension computation use case because it doesn't scale for large
>> timebuckets. Furthermore that framework has no support for Querying. The
>> dimension operators support live queries of the aggregated data. Querying
>> of live data streams is a popular feature in other open source platforms,
>> and I believe it is a worthwhile addition to Malhar.
>>
>> Given the fact that the dimension framework has been used in many POCs and
>> is even running in production and has novel features like live querying, it
>> more than meets the bar for a Malhar contribution. If a concrete argument
>> cannot be provided to prevent this work from going into Malhar, then these
>> efforts should not be blocked.
>>
>> Thanks,
>> Tim
>>
>>> On 2016-09-09 17:18 (-0700), Thomas Weise <[email protected]> wrote:
>>> I see no reason to move the dimension operator along with everything it
>>> duplicates to Malhar. It's available to use for everyone as it is and there
>>> should be an initiative to make it conform to the underlying framework to
>>> be part of Malhar.
>>>
>>> Also there is already an enrichment operator, there is even documentation
>>> for it.
>>>
>>> Hence, this needs to be analyzed properly.
>>>
>>> Thomas
>>>
>>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <[email protected]>
>>> wrote:
>>>
>>>> Yes, I do plan to come up with a proposal with a list. The ones that come
>>>> to mind are flume, enrichment, various dimensional operators and any custom
>>>> partitioners. The dimensional operators are in a mature state and usable
>>>> today, in future they could also be ported onto the new windowing and
>>>> managed state operator framework.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <[email protected]>
>>>> wrote:
>>>>
>>>>> A cursory look suggests there is a lot of overlap. I'm looking forward to
>>>>> seeing a proposal that reflects a vision how to evolve Malhar rather than
>>>>> just moving around code.
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> DataTorrent, the initial contributor to Apex and the company I work for,
>>>>>> has opened up a library of operators called Megh recently to the public
>>>>>> and has made the repository available under the Apache License. The link
>>>>>> to the repository is below. These operators, for the most part, contain
>>>>>> functionality that is complementary to what Malhar library provides and
>>>>>> were developed to solve business use cases that arose over time. Also,
>>>>>> some operators in Malhar were inspired from early implementations in the
>>>>>> Megh library and were built upon knowledge gained in doing the original
>>>>>> implementations.
>>>>>>
>>>>>> Our goal is to not have Megh as a separate library but rather bring these
>>>>>> operators into Malhar in a fashion that it is consistent with the Malhar
>>>>>> project and repository. In the upcoming days, in a gradual fashion, we
>>>>>> will have more details on the individual operators that we would like to
>>>>>> contribute. Also, if you are interested in helping with this effort
>>>>>> please raise your hand.
>>>>>>
>>>>>> https://github.com/DataTorrent/Megh/
>>>>>>
>>>>>> Thanks
