Hi Gimhana.

On Sun, 18 Mar 2018 19:17:44 +0530, Gimhana Nadeeshan wrote:
Hi,

I have just shared my draft proposal for GSoC: Port Codes from Commons Math.

<https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit>

Wow; probably the first time that such a structured document
appears on this list. ;-)

Devs, would you please review it? I always welcome your valuable
suggestions to improve it.

OK.  I'll try to provide some clarifications and words of
caution.

== "Background" section ==
Useful to cite:
(for Commons in general)
 * number of stable/active/dormant components
 * number of listed/active contributors
 * overview of topics covered
 * histogram of components' sizes (lines of code)
(for Commons Math)
 * how it fits within the above data

And draw some conclusions out of the comparison.
You stress "before JDK 1.8"; worth noting that some code
dates back to before JDK 1.5!
Code age is not necessarily a problem per se, but the mix
(of designs linked to outdated JDK) is, IMHO, a development
nightmare.
Modularization can alleviate the unwanted consequences (such
as releases stalled due to the lack of support).

== "Deliverables" section ==

Clarify what is meant by
 * "less dependencies" (an example?)
 * "Advanced mathematical functionalities": other than what
   exists now?  Or do you mean new interfaces (e.g. in
   accordance with the APIs provided by JDK8)?
 * "implemented module" (singular). I would assume that
   "Commons Statistics" will provide many modules.
 * "Guide for refactoring [..] Commons packages": That is
   unlikely. ;-)
   Did you more modestly mean "Commons Math packages"?
   You should perhaps note (in the "Background" section)
   that the task was started two years ago (cf.
   "Commons RNG" and "Commons Numbers").

Another quite useful task is: set up the web site.

== "Implementation" section ==

 * "Design issues": list *actual* issues (see JIRA).
   Working with streams would better be described as an
   enhancement.
 * Describe "too many dependencies" (examples).
 * "Design goals": give concrete examples.

The class diagram is nice but I see a big issue with
the "matrix" functionality. [This was one of the reasons
I wrote a few months ago (cf. ML archive) that the
refactoring of "o.a.c.math4.stat" was not among the
low-hanging fruits of the refactoring.]
If ever possible, better start with functionality that
doesn't need the CM matrix code.

== "Results" section ==

Hope to get comments from the PMC...
[Wish list, design requirements, mentor(s), etc.]

== "Future Development" section ==

AFAICT, porting "o.a.c.math4.geometry" will be much
easier and likely to be finished before "Commons
Statistics". :-}


Thanks for your interest,
Gilles

Best Regards,
Gimhana

On 17 March 2018 at 05:06, Gilles <gil...@harfang.homelinux.org> wrote:

Hi.

On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:

Hi devs,

Sorry for the delayed reply due to my academics.


If you want to start playing with the code, we could just begin
by having discussions here (on design) and on JIRA (for processing
minor issues) based on the current state of your repository.
[What's the link to look it up?]


Should I create my own repo and start coding there? [Not in the forked
repo]


What's the difference?  IOW, someone else should answer. :-}

Actually, it would be more helpful to me if someone [ @Gilles or @Eric ]
could guide me more: for example, give me some minor issues in the current
implementation to solve, or a new feature to implement, and gradually we
can go deeper


IMO, the top priority would be to release "Commons Numbers":
  http://commons.apache.org/proper/commons-numbers/

There are some blocking issues on JIRA:
  https://issues.apache.org/jira/projects/NUMBERS

and eventually I can go further on my own.  Then I can gradually become
familiar with the code, and I think that is the most efficient way to
learn the design architecture. [I spent hours trying to understand the
current code base and felt that it was not as efficient as I had thought.]


Refactoring the package "stat" is not straightforward...
However, to get to that, it would be useful to record your thoughts
as you browse through the code(s): what seems easy to port, what should
be changed/fixed, what you don't understand, and so on.


And is there a proposal format for the ASF?


I don't think so.  This ML is the forum where project directions
are discussed.

If not, what should I basically mention in the proposal?


This can be a work in progress, I think (see above suggestions).

Best regards,
Gilles



Best Regards,




On 14 March 2018 at 19:07, Gilles <gil...@harfang.homelinux.org> wrote:

Hi.

On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:

Hello Devs,

Thanks Gilles and Eric for guidance.

I have cloned the Commons repos and forked the Commons Stat repo. Is it
possible to make pull requests to that repo to be reviewed?


That's certainly possible, but I'm afraid that it will become
quite unwieldy from my side if I have to delete/create branches
for every PR.

If you want to start playing with the code, we could just begin
by having discussions here (on design) and on JIRA (for processing
minor issues) based on the current state of your repository.
[What's the link to look it up?]

Or should I follow a specific method?


I'll inquire about a more efficient method (than the above)...

By referring to the API docs I got some idea of the separation of modules.


In the current Commons Stat repo there are some classes under the package
"distribution". I think those can be refactored using the Java 8 built-in
statistics functionalities. Please correct me if I am wrong.


An example perhaps?

As Eric said, separating the function and streaming implementations is a
good design idea. (From my point of view, it means method overloading;
again, correct me if I didn't understand your point correctly.)


?

And I will share my draft proposal here for your review soon.



OK.

Thanks again for your interest,
Gilles



Best Regards.

On 13 March 2018 at 20:50, Gilles <gil...@harfang.homelinux.org> wrote:

Hello.


On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:

On Tue, Mar 13, 2018 at 12:47 AM, Gilles <gil...@harfang.homelinux.org> wrote:



Where can we find the old code, as it was before being ported into the new
Commons components?



The code bases are managed by the "git" software; the whole history is
available:
https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log

[I'd advise to "clone" the repositories on your local computer, and
use the command line tools.]



I believe you will want to clone the commons-math repositories, but then
develop your own "fork" of the commons-statistics repository. Gilles can
correct me if that is wrong.


Actually, I know only my workflow:
 $ git clone ...
 $ git branch ...
 $ git commit ...
 $ git push

:-}

I didn't find it very easy to cooperate with developers who
fork on GitHub and submit PRs.
I've now found the "git" command that creates a branch from
a PR, but it would be so much more comfortable to just switch
directory and do "git pull".

In the context of GSoC, would it be possible to grant some
privilege to non-committers so that they can update a selected
"git" repository?
If not, what is the next easiest way to share a "common space"
(aka "sandbox") from which it would be easy to copy reviewed
bits over to the official source repository?


As you mentioned, it will be a good approach to the redesign process.



You don't necessarily need to analyze how the code was before
the port/refactoring; looking at how it is now is sufficient,
unless you suspect that something is wrong now and might have
been better before. ;-)


In particular, the statistics library was designed before Java 8. Java 8
however has provided both efficient programming strategies for these
statistical methods (in the form of lambdas and streams) as well as some
built-in methods providing summary statistics functions (see discussion at
http://markmail.org/message/7t2mjaprsuvb3waj).
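
[Editor's illustration, not part of the original message.] A minimal sketch of the kind of built-in summary statistics Java 8 offers, using `DoubleStream.summaryStatistics()` from the JDK. The class name `BuiltInSummaryExample` and the sample values are assumptions made for this example only.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical class name, used only for this illustration.
class BuiltInSummaryExample {

    /** Collects count, sum, min, max and average in one pass over the values. */
    static DoubleSummaryStatistics summarize(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = summarize(1.0, 2.0, 3.0, 4.0);
        System.out.println("count   = " + s.getCount());
        System.out.println("average = " + s.getAverage());
        System.out.println("min     = " + s.getMin());
        System.out.println("max     = " + s.getMax());
    }
}
```

Note that `DoubleSummaryStatistics` covers only count, sum, min, max and average; it has no variance or quantiles, so a statistics component would still have to supply those.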


Very good point, indeed.
IMO, the new component should target Java 8.
Even Java 9 (enforcing modularity with JPMS): if, by the time we think
of releasing the code, we still want to avoid "multi-release" JARs, it
will be easy to just remove the "module-info" files (I don't think much
else Java 9 specific would be used by "Commons Statistics").

In fact, given the very slow pace at which new components are being brought to releasable state, I'd like to ask whether it would be OK to make "incremental" releases? That would mean: focus on (maven) modules that seem close to feature-complete and bug-free, fix the
remaining issues and perform a release with that module added.

It seems that the expectations were set too high (content-wise, given
the amount of human resources), so that neither CM can be released
(too many non-fixed issues) nor its "Commons Numbers" spin-off that
contains many modules, some of which are blocked by lack of consensus
or dangling discussions.

It probably makes sense, as a design strategy, to separate the function
implementation from the streaming implementation. For example, a 2D integer
array will probably require a different streaming implementation than a 1D
double array, but they can probably both be passed the same function
handle to collect, say, the mean or max value.

The role of commons might then be to provide a convenient interface, so
that the user can simply call a static method like SummaryStats.mean() and
not have to worry about the implementation.
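
[Editor's illustration, not part of the original message.] One way the separation described above could look, as a rough sketch only. `SummaryStats` is the name floated in the message, but its layout here (the `MEAN`/`MAX` function handles and the `stream(...)` adapters) is an assumption, not an existing Commons API.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

// Hypothetical facade; nothing here is an existing Commons Statistics class.
final class SummaryStats {
    private SummaryStats() {}

    // "Function" side: statistics expressed once, independent of the data layout.
    static final ToDoubleFunction<DoubleStream> MEAN =
        s -> s.average().orElse(Double.NaN);
    static final ToDoubleFunction<DoubleStream> MAX =
        s -> s.max().orElse(Double.NaN);

    // "Streaming" side: one adapter per data layout.
    static DoubleStream stream(double[] data) {
        return DoubleStream.of(data);
    }

    static DoubleStream stream(int[][] data) {
        // Flatten the rows, then widen each int to a double.
        return Arrays.stream(data).flatMapToInt(Arrays::stream).asDoubleStream();
    }

    // Convenience entry points: the caller never sees the stream.
    static double mean(double[] data) { return MEAN.applyAsDouble(stream(data)); }
    static double mean(int[][] data)  { return MEAN.applyAsDouble(stream(data)); }
    static double max(double[] data)  { return MAX.applyAsDouble(stream(data)); }
    static double max(int[][] data)   { return MAX.applyAsDouble(stream(data)); }
}
```

With this layout, `SummaryStats.mean(new int[][] {{1, 2}, {3, 4}})` and `SummaryStats.mean(new double[] {1, 2, 3})` share the same `MEAN` handle; supporting a new data layout means adding one `stream(...)` adapter rather than reimplementing each statistic.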

The other difficulty I see, is that quantile and median statistics will not
be as easy to stream as statistics with a closed-form solution like mean or
variance. There may however be great algorithms out there for pulling the
median or the 95% quantile out of a stream -- if so they should be used.
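
[Editor's illustration, not part of the original message.] A small sketch of why mean and variance stream easily while the median does not. The running update below is Welford's online algorithm, chosen here for illustration and not named in the thread. The class `RunningMoments` and its methods are hypothetical. The exact median shown for contrast needs the whole data set in memory; approximate streaming quantile algorithms exist but are not sketched here.

```java
import java.util.Arrays;

// Hypothetical class for illustration only.
final class RunningMoments {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Welford's update: constant memory, one value at a time. */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double mean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Unbiased sample variance (divides by n - 1). */
    double variance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Exact median for contrast: requires storing and sorting all the data. */
    static double median(double[] data) {
        if (data.length == 0) {
            return Double.NaN;
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : 0.5 * (sorted[mid - 1] + sorted[mid]);
    }
}
```

An instance can be fed from a stream with `DoubleStream.of(values).forEach(moments::accept)` (sequential streams only, since the class is not thread-safe).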

Eric


Eric,

Would you be the official "mentor" for the GSoC participants that are interested in helping with the porting of "o.a.c.math4.stat"?

Thank you,
Gilles




