Re: ASF Board Report for Fluo - Initial Reminder for August 2017

Christopher Wed, 09 Aug 2017 10:34:56 -0700

On Wed, Aug 9, 2017 at 10:41 AM Keith Turner <[email protected]> wrote:


> On Tue, Aug 8, 2017 at 7:36 PM, Christopher <[email protected]> wrote:
> > On Tue, Aug 8, 2017 at 5:47 PM Keith Turner <[email protected]> wrote:
> >>
> >> For reference I used the following to collect stats
> >>
> >> https://github.com/issues?utf8=%E2%9C%93&q=is%3Aissue+repo%3Aapache%2Ffluo+repo%3Aapache%2Ffluo-recipes+repo%3Aapache%2Ffluo-website+closed%3A%3E%3D2017-05-01+
> >>
> >> https://github.com/issues?utf8=%E2%9C%93&q=is%3Aissue+repo%3Aapache%2Ffluo+repo%3Aapache%2Ffluo-recipes+repo%3Aapache%2Ffluo-website+created%3A%3E%3D2017-05-01+
> >>
> >> https://github.com/issues?utf8=%E2%9C%93&q=is%3Apr+repo%3Aapache%2Ffluo+repo%3Aapache%2Ffluo-recipes+repo%3Aapache%2Ffluo-website+closed%3A%3E%3D2017-05-01+
> >>
> >> https://github.com/issues?utf8=%E2%9C%93&q=is%3Apr+repo%3Aapache%2Ffluo+repo%3Aapache%2Ffluo-recipes+repo%3Aapache%2Ffluo-website+created%3A%3E%3D2017-05-01+
> >>
> >> git log --since 2017-05-01 | grep Author | sort -u
> >>
> >> On Tue, Aug 8, 2017 at 5:46 PM, Keith Turner <[email protected]> wrote:
> >> > Here is a draft of the report
> >> >
> >> > ## Description:
> >> >
> >> >  - Apache Fluo is an open source implementation of Percolator (which
> >> >    populates Google's search index) for Apache Accumulo. With Fluo,
> >> >    users can continuously join new data into large existing data sets
> >> >    without reprocessing all data. Unlike batch and streaming
> >> >    frameworks, Fluo offers much lower latency and can operate on
> >> >    extremely large data sets.
> >
> >
> > I'm not a big fan of this description. There are some elements of debatable
> > accuracy and relevancy. I suggest dropping the history/background and
>
> What do you think is inaccurate?
>
>
It's not that it's inaccurate... it's that it's temporal. Saying that it
offers "much lower latency" compared to others is not likely to be a
resilient statement as those other frameworks advance. It may
be true... but it might not be true at any given moment. Same with "can
operate on extremely large data sets"... it's not necessarily accurate to
say this is "unlike" others, because others may be able to do that, too, at
any given moment.

More important, I think, is whether or not these are relevant. Our docs
can explain more details about history, background, prior technologies, and
comparisons with other things. Those things should not distract from
answering the "What is *this* in front of me?" question. I especially think
this for the board report... the board isn't going to want to see an
explanation of what Percolator is, or an explanation of limitations in
other software... they just want a reminder of what this particular Apache
project is about.


>
> > comparison stuff, and using a more "get to the point" description which
> > lets users know why they should care about Fluo, like:
> >
> > Apache Fluo is a distributed processing system which allows users to
> > continuously and incrementally join new data with extremely large
> > existing data sets without reprocessing all the data. It provides
> > low-latency, high-throughput data processing into Apache Accumulo, with
> > cross-node transaction support and a notification system to
> > automatically process new data with a user-defined workflow.
>
>
> Thinking about this some more based on your feedback, I think there are
> three different aspects that are important to communicate to someone
> trying to quickly decide if they should look into this further.
>
>  1. What capabilities it offers to users
>  2. How it works
>  3. Context: how it compares to other big data technologies
>
> Below is a rough outline of an attempt to communicate these three aspects.
>
> Intro :
>
>   Apache Fluo is a distributed data processing system built on Apache
> Accumulo.
>
> Capabilities :
>
>   Fluo allows users to continuously join new data into large existing
> data sets without reprocessing all data.  With Fluo, users can keep
> multiple dependent derived data sets up to date as new data arrives.
> Changes to derived data sets can be emitted to external query or
> analytics systems.
>
> How it works :
>
>   Fluo achieves this by offering the ability to execute user defined
> cross node transactions when data changes.
>
> Context :
>
>   Fluo offers much lower latency than batch frameworks and can operate
> on larger data sets than streaming frameworks.
>
>
I like the way you broke it down, but I think the result is too long for a
description. I don't think the "Context" portion should be included. The
"Capabilities" section reads like a follow-up paragraph, which begins going
into more details.
(also, I like my wording better. ;)

I do think the way you've broken it down would be great for the front page
of Fluo's website, though, in this sectioned format. But, the description
should be just a "taste", so the smaller and more to the point it is, the
better.

In any case, for the purposes of this report, please select whatever
description you think best. We can continue to discuss how best to present
Fluo to others through improved description, etc., separately from the
board report.


>
> >
> >
> >>
> >> >
> >> > ## Issues:
> >> >
> >> >  - There are no issues requiring board attention at this time.
> >> >
> >> > ## Activity:
> >> >
> >> >  - Released versions of Fluo are tightly coupled with YARN+Twill for
> >> >    launching services on a cluster.  Work is currently under way to
> >> >    break this tight coupling in order to support YARN+Twill, Mesos,
> >> >    and Kubernetes.
> >> >  - Fluo implements an immutable byte array wrapper in its API.  Work
> >> >    on moving this to its own sub-project is underway.  The goal of
> >> >    this is to create something analogous to String for bytes that is
> >> >    suitable for use in other APIs.  This goal was discussed on an
> >> >    OpenJDK list and there was agreement Java needs a project like this
> >> >    until Java defines a bigger story for immutability.
> >> >
> >
> >
> > Could mention that we're still working through INFRA tasks for graduation
> > and are mostly transitioned.
> > (https://issues.apache.org/jira/servicedesk/agent/INFRA/issue/INFRA-14714)
> >
> >
> >>
> >> > ## Health report:
> >> >
> >> >  - Within the past three months
> >> >     - 46 issues were opened and 23 closed
> >> >     - 64 pull requests were opened and 56 were closed
> >> >     - 69 commits were made by 6 authors, 2 authors were not committers
> >> >
> >> > Contributions from non-committers.  GH stats.  35 PRs in last 3
> >> > months.. 31 closed.
> >> >
> >> > ## PMC changes:
> >> >
> >> >  - Currently 8 PMC members.
> >> >  - Chris McTague was added to the PPMC on --- shortly before
> >> >    graduation.
> >> >
> >
> >
> > Chris accepted membership on 2017-05-16, and had his ICLA filed and
> > account created on 2017-05-31.
> > Should mention explicitly that this happened after our last/final
> > Incubator report, and not just before graduation.
> >
> >>
> >> > ## Committer base changes:
> >> >
> >> >  - Currently 8 committers.
> >> >  - Chris McTague was added as a committer --- shortly before
> >> >    graduation.
> >> >
> >
> >
> > Would be good to clarify that PMC == committer, at least for now. This
> > is a common question from the board.
> >
> >>
> >> > ## Releases:
> >> >
> >> >  - fluo-1.1.0-incubating was released on Mon Jun 12 2017
> >> >  - fluo-recipes-1.1.0-incubating was released on Thu Jun 22 2017
> >> >
> >> > On Mon, Aug 7, 2017 at 5:19 PM, Christopher <[email protected]> wrote:
> >> >> I think INFRA tasks are done.
> >> >>
> >> >> On Wed, Aug 2, 2017 at 10:52 AM Keith Turner <[email protected]> wrote:
> >> >>>
> >> >>> I started working on this and thinking about it a bit.
> >> >>>
> >> >>> I added our release dates at https://reporter.apache.org/
> >> >>>
> >> >>> Currently thinking of mentioning the following that has happened
> >> >>> since our last report to IPMC.  Thinking of linking to our last
> >> >>> IPMC report.
> >> >>>
> >> >>>  * Our new member Chris
> >> >>>  * Our new releases
> >> >>>  * We are waiting on INFRA, link to issues... maybe this will change
> >> >>>    before it's due next Wed
> >> >>>  * Discussing the new work to support multiple ways to launch Fluo
> >> >>>    and deprecating our tight coupling to YARN in core.
> >> >>>
> >> >>> On Fri, Jul 28, 2017 at 7:59 AM, Brett Porter <[email protected]>
> >> >>> wrote:
> >> >>> > This email was sent on behalf of the ASF Board.  It is an initial
> >> >>> > reminder to give you plenty of time to prepare the report.
> >> >>> >
> >> >>> > According to board records, you are listed as the chair of a
> >> >>> > committee that is due to submit a report this month. [1] [2]
> >> >>> >
> >> >>> > The meeting is scheduled for Wed, 16 Aug 2017 at 10:30 PDT and the
> >> >>> > deadline for submitting your report is 1 full week prior to that
> >> >>> > (Wed Aug 9th)!
> >> >>> >
> >> >>> > Meeting times in other time zones:
> >> >>> >
> >> >>> > http://www.timeanddate.com/worldclock/fixedtime.html?iso=2017-08-16T10:30:00&msg=ASF+Board+Meeting&p1=137
> >> >>> >
> >> >>> > Please submit your report with sufficient time to allow the board
> >> >>> > members to review and digest. Again, the very latest you should
> >> >>> > submit your report is 1 full week (7 days) prior to the board
> >> >>> > meeting (Wed Aug 9th).
> >> >>> >
> >> >>> > If you feel that an error has been made, please consult [1] and if
> >> >>> > there is still an issue then contact the board directly.
> >> >>> >
> >> >>> > As always, PMC chairs are welcome to attend the board meeting.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > The ASF Board
> >> >>> >
> >> >>> > [1] - https://svn.apache.org/repos/private/committers/board/committee-info.txt
> >> >>> > [2] - https://svn.apache.org/repos/private/committers/board/calendar.txt
> >> >>> > [3] - https://svn.apache.org/repos/private/committers/board/templates
> >> >>> > [4] - https://reporter.apache.org/
> >> >>> >
> >> >>> >
> >> >>> > Submitting your Report
> >> >>> > ----------------------
> >> >>> >
> >> >>> > Full details about the process and schedule are in [1].
> >> >>> >
> >> >>> > The report should be committed to the meeting agenda in the board
> >> >>> > directory in the foundation repository, trying to keep a similar
> >> >>> > format to the others.
> >> >>> > This can be found at:
> >> >>> >
> >> >>> >   https://svn.apache.org/repos/private/foundation/board
> >> >>> >
> >> >>> > Reports can also be posted using the online agenda tool:
> >> >>> >
> >> >>> >   https://whimsy.apache.org/board/agenda/2017-08-16/Fluo
> >> >>> >
> >> >>> > Your report should also be sent in plain-text format to
> >> >>> > [email protected] with a Subject line that follows the below
> >> >>> > format:
> >> >>> >
> >> >>> >   Subject: [REPORT] Fluo - August 2017
> >> >>> >
> >> >>> > Cutting and pasting directly from a Wiki is not acceptable due to
> >> >>> > formatting issues. Line lengths should be limited to 77 characters.
> >> >>> >
> >> >>> > Chairs may use the Apache Reporter Service [4] to help them
> >> >>> > compile and submit a board report.
> >> >>> >
> >> >>> >
> >> >>> > Resolutions
> >> >>> > -----------
> >> >>> >
> >> >>> > There are several templates for use for various Board resolutions.
> >> >>> > They can be found in [3] and you are encouraged to use them. If you
> >> >>> > have a resolution before the board, you are strongly encouraged to
> >> >>> > attend that board meeting.
>
