Hi,

Here is the initial proposal. Please go through it; you can comment
directly on the document. Regarding the discussions around dimensional
operators, there is a specific section covering them and future plans. After
the comments are addressed, I can start with one of the components, such as
Flume, and document the steps involved. Others can then take up the other
components and follow the same steps.

https://docs.google.com/document/d/1BzWAwJDEUs0G42DWTuGYvM5sm0Uu5nTP7cUQOAlVs0g

Thanks

On Sat, Sep 10, 2016 at 10:29 AM, Amol Kekre <a...@datatorrent.com> wrote:

> Thomas,
> IMHO we should also look at the cost to users of keeping code in a GitHub
> repository (even if under the ASF 2.0 license) outside Malhar. There is
> value in deprecating code in Megh and moving it to Malhar. Volunteers in
> this effort could decide how much overlap means "mark as overlapping". My
> suggestion is to absorb overlapping operators into a directory in Malhar
> that marks them as such. A lot of these operators are being used in
> production and it makes sense to absorb them into Apache GitHub.
>
> Thanks
> Amol
>
> On Sat, Sep 10, 2016 at 7:20 AM, Pramod Immaneni <pra...@datatorrent.com>
> wrote:
>
> > It would be great to have Tim's help with dimension computation but I
> > think we can still debate whether HDHT dependency needs to be removed
> > before contribution or whether it can be done as a two step process
> > since we also have a place to put experimental code contrib and HDHT
> > could go in there till we can determine/port it to use managed state.
> >
> > My thought on this is that if it is going to be a significant porting
> > effort then we do it as a two step process.
> >
> > Thanks
> >
> > > On Sep 9, 2016, at 11:52 PM, Thomas Weise <tho...@datatorrent.com>
> > wrote:
> > >
> > > Tim,
> > >
> > > The functionality of the dimension compute operator should be available
> > in
> > > Malhar. My concern is moving things without regard to code duplication
> > and
> > > long term maintenance cost. There are several pieces to the dimension
> > > compute operator that in fact are (or should be) reusable components by
> > > themselves. Live querying (queryable state) with schemas is one such
> > > example. It's a major feature and not limited to the dimension compute
> > > operator. It should ideally work with the new windowing support as
> well.
> > > But the main area that needs work is the state store - the dependency
> on
> > > HDHT needs to be removed and replaced with managed state. Also I'm
> > curious
> > > why the window operator should not scale for large time buckets? Are
> you
> > > referring to the current intermediate implementation or the work in
> > > progress that will use incremental state saving? If so, please bring it
> > up
> > > on APEXMALHAR-2130 as it is pretty important.
> > >
> > > Since you have written almost all of the dimension compute code, could
> > you
> > > help with the changes needed to bring it over? It would also be good to
> > see
> > > the user documentation in Malhar.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > On Fri, Sep 9, 2016 at 10:52 PM, Timothy Farkas <
> > timothyfar...@apache.org>
> > > wrote:
> > >
> > >> Hi Thomas,
> > >>
> > >> With respect to the dimension operator, I would like to learn more
> about
> > >> the underlying framework you mentioned and the code duplication. If
> you
> > are
> > >> talking about the Window operator framework, that framework is not
> > suitable
> > >> for the dimension computation use case because it doesn't scale for
> > large
> > >> timebuckets. Furthermore that framework has no support for Querying.
> The
> > >> dimension operators support live queries of the aggregated data.
> > Querying
> > >> of live data streams is a popular feature in other open source
> > platforms,
> > >> and I believe it is a worthwhile addition to Malhar.
> > >>
> > >> Given the fact that the dimension framework has been used in many POCs
> > and
> > >> is even running in production and has novel features like live
> > querying, it
> > >> more than meets the bar for a Malhar contribution. If a concrete
> > argument
> > >> cannot be provided to prevent this work from going into Malhar, then
> > these
> > >> efforts should not be blocked.
> > >>
> > >> Thanks,
> > >> Tim
> > >>
> > >>> On 2016-09-09 17:18 (-0700), Thomas Weise <tho...@datatorrent.com>
> > wrote:
> > >>> I see no reason to move the dimension operator along with everything
> it
> > >>> duplicates to Malhar. It's available to use for everyone as it is and
> > >> there
> > >>> should be an initiative to make it conform to the underlying
> framework
> > to
> > >>> be part of Malhar.
> > >>>
> > >>> Also there is already an enrichment operator, there is even
> > documentation
> > >>> for it.
> > >>>
> > >>> Hence, this needs to be analyzed properly.
> > >>>
> > >>> Thomas
> > >>>
> > >>> On Fri, Sep 9, 2016 at 5:10 PM, Pramod Immaneni <
> > pra...@datatorrent.com>
> > >>> wrote:
> > >>>
> > >>>> Yes, I do plan to come up with a proposal with a list. The ones that
> > >> come
> > >>>> to mind are flume, enrichment, various dimensional operators and any
> > >> custom
> > >>>> partitioners. The dimensional operators are in a mature state and
> > >> usable
> > >>>> today; in the future they could also be ported onto the new windowing
> and
> > >>>> managed state operator framework.
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>> On Fri, Sep 9, 2016 at 4:29 PM, Thomas Weise <
> tho...@datatorrent.com>
> > >>>> wrote:
> > >>>>
> > >>>>> A cursory look suggests there is a lot of overlap. I'm looking
> > >> forward to
> > >>>>> seeing a proposal that reflects a vision of how to evolve Malhar rather
> > >> than
> > >>>> just
> > >>>>> moving around code.
> > >>>>>
> > >>>>> Thomas
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Sep 8, 2016 at 2:40 PM, Pramod Immaneni <
> > >> pra...@datatorrent.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> DataTorrent, the initial contributor to Apex and the company I
> work
> > >>>> for,
> > >>>>>> has opened up a library of operators called Megh recently to the
> > >> public
> > >>>>> and
> > >>>>>> has made the repository available under the Apache License. The
> > >> link to
> > >>>>> the
> > >>>>>> repository is below. These operators, for the most part, contain
> > >>>>>> functionality that is complementary to what Malhar library
> > >> provides and
> > >>>>>> were developed to solve business use cases that arose over time.
> > >> Also,
> > >>>>> some
> > >>>>>> operators in Malhar were inspired by early implementations in
> the
> > >>>> Megh
> > >>>>>> library and were built upon knowledge gained in doing the original
> > >>>>>> implementations.
> > >>>>>>
> > >>>>>> Our goal is to not have Megh as a separate library but rather
> bring
> > >>>> these
> > >>>>>> operators into Malhar in a fashion that is consistent with the
> > >>>> Malhar
> > >>>>>> project and repository. In the upcoming days, in a gradual
> > >> fashion, we
> > >>>>> will
> > >>>>>> have more details on the individual operators that we would like
> to
> > >>>>>> contribute. Also, if you are interested in helping with this
> effort
> > >>>>> please
> > >>>>>> raise your hand.
> > >>>>>>
> > >>>>>> https://github.com/DataTorrent/Megh/
> > >>>>>>
> > >>>>>> Thanks
> > >>
> >
>
