Hello.

On Wed, 2 Feb 2022 at 10:47, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > Hello.
> >
> > On Thu, 27 Jan 2022 at 18:09, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >
> > > I would be willing to go through GSOC again.
> >
> > Thanks; I know that back in 2020, it had been a disproportionate
> > amount of work...
> >
> > > I think that the
> > > statistics component could again serve as a project. There are some
> > > packages in Math that could be moved to make use of the updated
> > > distributions (e.g. math.stat.inference)
> >
> > That would be great, although I seem to notice that there
> > might be some dependency issues...
> >
> > > or perhaps a reworking of the
> > > math.stat.descriptive package to support using them with streams.
> >
> > +1
> >
> > > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > > of the competence of candidates in the 'bonding phase' before places
> > > were formally allocated. I think we should require that a candidate
> > > can:
> > >
> > > - Open a PR on GitHub to add a feature in the topic area. It should be
> > > of non-trivial complexity and delivered to a quality ready to merge.
> >
> > Do you think that the above "stream support" could be that task?
>
> Yes. A simple class to compute a summary statistic such as:
>
> public interface Statistic<R> {
>     void add(R x);
> }
> public interface DoubleStatistic<R> extends Statistic<R>,
> DoubleConsumer, DoubleSupplier {
>     // Composite interface
> }
>
> public class Mean implements DoubleStatistic<Mean> {
>   static Mean create();
>   // Overrides
>   public void accept(double x);
>   public void add(Mean m);
>   public double getAsDouble();
> }
>
> Used as:
>
> DoubleStream s;
> double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();

To simplify the above, should we also provide a convenience method
such as
---CUT---
public class Mean ... {
  // ...
  public static double collect(DoubleStream s) {
    return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
  }
}
---CUT---
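
For the sake of discussion, here is a minimal self-contained sketch of how the pieces above might fit together. The running-mean update and the merge formula in "add" are my assumptions (the algorithm has not been decided), and everything beyond the names quoted above is a placeholder rather than a proposed API.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

interface Statistic<R> {
    /** Merges the state of another instance into this one. */
    void add(R other);
}

interface DoubleStatistic<R> extends Statistic<R>, DoubleConsumer, DoubleSupplier {
    // Composite interface
}

final class Mean implements DoubleStatistic<Mean> {
    private long n;      // number of values seen so far
    private double mean; // running mean (assumed algorithm, see above)

    private Mean() {}

    static Mean create() {
        return new Mean();
    }

    @Override
    public void accept(double x) {
        n++;
        mean += (x - mean) / n;
    }

    @Override
    public void add(Mean other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        final long total = n + other.n;
        // Weighted update: shift this mean towards the other one.
        mean += (other.mean - mean) * ((double) other.n / total);
        n = total;
    }

    @Override
    public double getAsDouble() {
        // Assumption: the mean of no values is reported as NaN.
        return n == 0 ? Double.NaN : mean;
    }

    /** Convenience method wrapping the three-argument collect. */
    static double collect(DoubleStream s) {
        return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
    }
}
```

With this, "Mean.collect(DoubleStream.of(1, 2, 3, 4))" replaces the explicit three-argument call, and the same code path is exercised by a parallel stream through "add".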

>
> The implementation(s) can be updated and expanded later using
> different underlying algorithms (simple sum, extended precision sum,
> rolling mean) by passing a choice to the create method.
>
> The project will involve how to move from this simple statistic to
> supporting IntStream, LongStream, DoubleStream as appropriate and
> allow combining statistics efficiently to obtain a customised summary
> statistic, perhaps by enum.
>
> This is for the StorelessUnivariateStatistic in Commons Math. A more
> detailed examination of the existing functionality would be required
> and use cases generated for each to understand how these can be
> supported in streams.

This study could indeed be started in the "bonding" period and would
indicate the candidate's potential fairly clearly.
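
As one possible starting point for the "customised summary statistic, perhaps by enum" item quoted above, a candidate could be asked to critique and extend something like the following sketch. The names "Stat" and "Summary" are hypothetical (nothing like them exists in Commons today), and only three trivial statistics are covered; it merely illustrates updating only the requested statistics in a single pass.

```java
import java.util.EnumSet;
import java.util.function.DoubleConsumer;
import java.util.stream.DoubleStream;

/** Hypothetical selector of the statistics to compute. */
enum Stat { MIN, MAX, SUM }

/** Hypothetical composite that updates only the requested statistics. */
final class Summary implements DoubleConsumer {
    private final EnumSet<Stat> stats;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double sum;

    private Summary(EnumSet<Stat> stats) {
        this.stats = stats;
    }

    static Summary of(Stat first, Stat... rest) {
        return new Summary(EnumSet.of(first, rest));
    }

    @Override
    public void accept(double x) {
        if (stats.contains(Stat.MIN)) { min = Math.min(min, x); }
        if (stats.contains(Stat.MAX)) { max = Math.max(max, x); }
        if (stats.contains(Stat.SUM)) { sum += x; }
    }

    /** Merges another partial result (needed for parallel streams). */
    void combine(Summary other) {
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        sum += other.sum;
    }

    double get(Stat s) {
        if (!stats.contains(s)) {
            throw new IllegalArgumentException("Not computed: " + s);
        }
        switch (s) {
            case MIN: return min;
            case MAX: return max;
            default:  return sum;
        }
    }
}
```

Usage would be along the lines of "DoubleStream.of(3, 1, 2).collect(() -> Summary.of(Stat.MIN, Stat.SUM), Summary::accept, Summary::combine)", followed by "get(Stat.MIN)". How to generalize this to IntStream and LongStream, and to statistics with non-trivial state, is exactly the open design question.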

> >
> > > - Show knowledge of the topic area beyond this single feature,
> > > demonstrating ability to continue to significantly contribute through
> > > a 3 month period in the subject area.
> >
> > That seems more fuzzy to define and assess (?).
>
> I agree; choosing candidates is a fuzzy area. This was meant to
> summarise my understanding of how we chose candidates last time. It is
> based on their proposal submitted to GSOC but also impressions from
> the bonding period.

As you noted in your post-GSoC 2020 suggestions, the issue
stemmed from not having a concrete way to evaluate candidates
during the bonding period.
This should be solved (for "[Statistics]") with your proposal above.

I'd be glad to get help with defining concrete tasks for the ideas
below. :-)

> >
> > Some ideas (for "Commons Math"):
> > 1. Redesign and modularization of the "ml" package
> >   -> main goal: enable multi-thread usage
> > 2. Abstracting the linear algebra utilities
> >   -> main goal: allow (runtime?) switch to alternative implementations
> > 3. Redesign and modularization of the "random" package
> >   -> main goal: general support of low-discrepancy sequences
> > 4. Refactoring and modularization of the "special" package
> >  -> main goal: ensure accuracy and performance and better API,
> >      add other functions (?).
> >
> > > Without this set of skills there will be little progress in the formal
> > > code period.
> >
> > :-}
> >
> > Shall we open a "GSoC 2022" report in each concerned JIRA project?
>
> Yes. I think we just create some tickets and tag them with the
> appropriate tag (GSOC 2022 ?). There should be some left over from
> last time to repurpose or use as templates for new ones.

Actually, I was thinking of creating one global "GSoC 2022" issue
in each component, that would list all the topics with a complete
description of their respective goals, and then sub-tasks (or linked
issues) for more specific discussions (once the topic is taken on
by at least one candidate).
I mean that we should separate the JIRA "new feature" report from
the report that tracks GSoC activity.  That way, we will be able to
close the GSoC ticket when the time comes, and resolve, or not, the
"feature" ticket.

Regards,
Gilles

