Hi,

Thanks a lot, Gilles, for your valuable suggestions and for reviewing the
proposal so quickly. I'll apply those corrections and ask for any
clarifications here. By the way, since I'm new to the Apache community, I'm
not yet familiar with some abbreviations used on the list [such as ML
archive, PMC].

> AFAICT, porting "o.a.c.math4.geometry" will be much
> easier and likely to be finished before "Commons
> Statistics". :-}
>

Since the design structure is the same, this would be interesting and
easier. But is it allowed in GSoC? [It is not labeled as a GSoC idea in
JIRA.]

Best Regards,
Gimhana.

On 18 March 2018 at 21:18, Gilles <gil...@harfang.homelinux.org> wrote:

> Hi Gimhana.
>
> On Sun, 18 Mar 2018 19:17:44 +0530, Gimhana Nadeeshan wrote:
>
>> Hi,
>>
>> I have just shared my draft proposal for GSoC, "Port Codes from Commons
>> Math":
>>
>> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit>
>>
>
> Wow; probably the first time that such a structured document
> appears on this list. ;-)
>
>> Devs, would you please review it? I always welcome your valuable
>> suggestions to improve it.
>>
>
> OK.  I'll try to provide some clarifications and words of
> caution.
>
> == "Background" section ==
> Useful to cite:
> (for Commons in general)
>  * number of stable/active/dormant components
>  * number of listed/active contributors
>  * overview of topics covered
>  * histogram of component's sizes (lines of code)
> (for Commons Math)
>  * how it fits within the above data
>
> And draw some conclusions out of the comparison.
> You stress "before JDK 1.8"; worth noting that some of the code
> dates back to before JDK 1.5!
> Code age is not necessarily a problem per se, but the mix
> (of designs linked to outdated JDK) is, IMHO, a development
> nightmare.
> Modularization can alleviate the unwanted consequences (such
> as releases stalled by a lack of support).
>
> == "Deliverables" section ==
>
> Clarify what is meant by
>  * "less dependencies" (an example?)
>  * "Advanced mathematical functionalities": other than what
>    exists now?  Or do you mean new interfaces (e.g. in
>    accordance with the APIs provided by JDK8)?
>  * "implemented module" (singular). I would assume that
>    "Commons Statistics" will provide many modules.
>  * "Guide for refactoring [..] Commons packages": That is
>    unlikely. ;-)
>    Did you more modestly mean "Commons Math packages"?
>    You should perhaps note (in the "Background" section)
>    that the task was started two years ago (cf.
>    "Commons RNG" and "Commons Numbers").
>
> Another quite useful task is: set up the web site.
>
> == "Implementation" section ==
>
>  * "Design issues": list *actual* issues (see JIRA).
>    Working with streams would be better described as an
>    enhancement.
>  * Describe "too many dependencies" (examples).
>  * "Design goals": give concrete examples.
>
> The class diagram is nice but I see a big issue with
> the "matrix" functionality. [This was one of the reasons
> I wrote a few months ago (cf. ML archive) that the
> refactoring of the "o.a.c.math4.stat" was not among the
> low-hanging fruits of the refactoring.]
> If ever possible, better start with functionality that
> doesn't need the CM matrix code.
>
> == "Results" section ==
>
> Hope to get comments from the PMC...
> [Wish list, design requirements, mentor(s), etc.]
>
> == "Future Development" section ==
>
> AFAICT, porting "o.a.c.math4.geometry" will be much
> easier and likely to be finished before "Commons
> Statistics". :-}
>
>
> Thanks for your interest,
>
> Gilles
>
> Best Regards,
>> Gimhana
>>
>> On 17 March 2018 at 05:06, Gilles <gil...@harfang.homelinux.org> wrote:
>>
>> Hi.
>>>
>>> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>>>
>>> Hi devs,
>>>>
>>>> Sorry for the delayed reply due to my academics.
>>>>
>>>>
>>>>> If you want to start playing with the code, we could just begin
>>>>> by having discussions here (on design) and on JIRA (for processing
>>>>> minor issues) based on the current state of your repository.
>>>>> [What's the link to look it up?]
>>>>>
>>>>>
>>>> Should I create my own repo and start coding there? [Not in the forked
>>>> repo]
>>>>
>>>>
>>> What's the difference?  IOW, someone else should answer. :-}
>>>
>>>> Actually, it would be more helpful to me if someone [@Gilles or @Eric]
>>>> could guide me more, e.g. by giving me some minor issues in the current
>>>> implementation to solve, or a new feature to implement; gradually we
>>>> can go deeper
>>>>
>>>>
>>> IMO, the top priority would be to release "Commons Numbers":
>>>   http://commons.apache.org/proper/commons-numbers/
>>>
>>> There are some blocking issues on JIRA:
>>>   https://issues.apache.org/jira/projects/NUMBERS
>>>
>>>> and eventually I can go further on my own.  Then I can gradually
>>>> become familiar with the code, and I think it is the most efficient
>>>> way to learn the design architecture. [I spent hours trying to
>>>> understand the current code base and felt that was not as efficient
>>>> as I thought.]
>>>>
>>>>
>>> Refactoring the package "stat" is not straightforward...
>>> However, to get to that, it would be useful to record your thoughts
>>> as you browse through the code(s): what seems easy to port, what should
>>> be changed/fixed, what you don't understand, and so on.
>>>
>>>
>>>> And is there a proposal format for the ASF?
>>>>
>>>>
>>> I don't think so.  This ML is the forum where project directions
>>> are discussed.
>>>
>>>> If not, what should I basically mention in the proposal?
>>>>
>>>>
>>> This can be a work in progress, I think (see above suggestions).
>>>
>>> Best regards,
>>> Gilles
>>>
>>>
>>>
>>> Best Regards,
>>>>
>>>>
>>>>
>>>>
>>>> On 14 March 2018 at 19:07, Gilles <gil...@harfang.homelinux.org> wrote:
>>>>
>>>> Hi.
>>>>
>>>>>
>>>>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>>>>
>>>>> Hello Devs,
>>>>>
>>>>>>
>>>>>> Thanks Gilles and Eric for guidance.
>>>>>>
>>>>>> I have cloned the Commons repos and forked the Commons Statistics
>>>>>> repo. Is it possible to make pull requests to that repo to be reviewed?
>>>>>>
>>>>>>
>>>>> That's certainly possible, but I'm afraid that it will become
>>>>> quite unwieldy from my side if I have to delete/create branches
>>>>> for every PR.
>>>>>
>>>>> If you want to start playing with the code, we could just begin
>>>>> by having discussions here (on design) and on JIRA (for processing
>>>>> minor issues) based on the current state of your repository.
>>>>> [What's the link to look it up?]
>>>>>
>>>>>> Or should I follow a specific method?
>>>>>>
>>>>>>
>>>>> I'll inquire about a more efficient method (than the above)...
>>>>>
>>>>>> By referring to the API docs, I got some idea of the separation of
>>>>>> modules.
>>>>>>
>>>>>> In the current Commons Statistics repo there are some classes under the
>>>>>> package "distribution". I think those can be refactored using the
>>>>>> built-in statistics functionality of Java 8. Please correct me if I'm
>>>>>> wrong.
>>>>>>
>>>>>>
>>>>> An example perhaps?
>>>>>
>>>>>> As Eric said, separating the function implementation from the streaming
>>>>>> implementation is a good design idea. (From my point of view, it means
>>>>>> method overloading; again, correct me if I didn't understand your point
>>>>>> correctly.)
>>>>>>
>>>>>>
>>>>> ?
>>>>>
>>>>>> And I will share my draft proposal here for your review soon.
>>>>>>
>>>>> OK.
>>>>>
>>>>> Thanks again for your interest,
>>>>> Gilles
>>>>>
>>>>>
>>>>>
>>>>> Best Regards.
>>>>>
>>>>>>
>>>>>> On 13 March 2018 at 20:50, Gilles <gil...@harfang.homelinux.org>
>>>>>> wrote:
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>>
>>>>>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles <gil...@harfang.homelinux.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Where can we find the old code from before the port into the new
>>>>>>>>>> Commons components?
>>>>>>>>>>
>>>>>>>>> The code bases are managed by the "git" software; the whole history
>>>>>>>>> is available:
>>>>>>>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>>>>>>>
>>>>>>>>> [I'd advise to "clone" the repositories on your local computer, and
>>>>>>>>> use the command line tools.]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I believe you will want to clone the commons-math repositories, but then
>>>>>>>> develop your own "fork" of the commons-statistics repository. Gilles can
>>>>>>>> correct me if that is wrong.
>>>>>>>>
>>>>>>> Actually, I know only my workflow:
>>>>>>>
>>>>>>>  $ git clone ...
>>>>>>>  $ git branch ...
>>>>>>>  $ git commit ...
>>>>>>>  $ git push
>>>>>>>
>>>>>>> :-}
>>>>>>>
>>>>>>> I didn't find it very easy to cooperate with developers who
>>>>>>> fork on GitHub and submit PRs.
>>>>>>> I've now found the "git" command that creates a branch from
>>>>>>> a PR, but it would be so much more comfortable to just switch
>>>>>>> directory and do "git pull".
>>>>>>>
>>>>>>> In the context of GSoC, would it be possible to grant some
>>>>>>> privilege to non-committers so that they can update a selected
>>>>>>> "git" repository?
>>>>>>> If not, what is the next easiest way to share a "common space"
>>>>>>> (aka "sandbox") from which it would be easy to copy reviewed
>>>>>>> bits over to the official source repository?
>>>>>>>
>>>>>>>
>>>>>>>>>> As you mentioned, it will be a good approach to the redesign process.
>>>>>>>>>>
>>>>>>>>> You don't necessarily need to analyze how the code was before
>>>>>>>>> the port/refactoring; looking at how it is now is sufficient,
>>>>>>>>> unless you suspect that something is wrong now and might have
>>>>>>>>> been better before. ;-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>> In particular, the statistics library was designed before Java 8. Java 8,
>>>>>>>> however, has provided both efficient programming strategies for these
>>>>>>>> statistical methods (in the form of lambdas and streams) as well as some
>>>>>>>> built-in methods providing summary statistics functions (see discussion
>>>>>>>> at http://markmail.org/message/7t2mjaprsuvb3waj).
>>>>>>>>
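A minimal sketch of the JDK 8 built-ins mentioned above: `DoubleStream.summaryStatistics()` collects count, sum, min, max and average in a single pass. The class name `BuiltInSummarySketch` and the sample values are only illustrative and are not part of any Commons component.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Illustrative only: shows what the JDK already offers out of the box.
final class BuiltInSummarySketch {
    private BuiltInSummarySketch() {}

    /** Collects count, sum, min, max and average in one pass over the values. */
    static DoubleSummaryStatistics summarize(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = summarize(1.0, 2.0, 3.0, 4.0);
        System.out.println("count=" + s.getCount()
                + " mean=" + s.getAverage()
                + " min=" + s.getMin()
                + " max=" + s.getMax());
    }
}
```

Note that `DoubleSummaryStatistics` provides no variance or quantiles, so those would still have to come from a library.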
>>>>>>>>
>>>>>>> Very good point, indeed.
>>>>>>>
>>>>>>> IMO, the new component should target Java 8.
>>>>>>> Even Java 9 (enforcing modularity with JPMS): if by the time we think
>>>>>>> of releasing the code, we still want to avoid "multi-release" JARs, it
>>>>>>> will be easy to just remove the "module-info" files (I don't think
>>>>>>> much else Java 9 specific would be used by "Commons Statistics").
>>>>>>>
>>>>>>> In fact, given the very slow pace at which new components are being
>>>>>>> brought to releasable state, I'd like to ask whether it would be OK
>>>>>>> to make "incremental" releases?  That would mean: focus on (maven)
>>>>>>> modules that seem close to feature-complete and bug-free, fix the
>>>>>>> remaining issues and perform a release with that module added.
>>>>>>>
>>>>>>> It seems that the expectations were set too high (content-wise given
>>>>>>> the amount of human resources), so that neither CM can be released
>>>>>>> (too many non-fixed issues) nor its "Commons Numbers" spin-off that
>>>>>>> contains many modules, some of which are blocked by lack of consensus
>>>>>>> or dangling discussions.
>>>>>>>
>>>>>>>> It probably makes sense, as a design strategy, to separate the function
>>>>>>>> implementation from the streaming implementation. For example, a 2D
>>>>>>>> integer array will probably require a different streaming implementation
>>>>>>>> than a 1D double array, but they can probably both be passed the same
>>>>>>>> function handle to collect, say, the mean or max value.
>>>>>>>>
>>>>>>>> The role of Commons might then be to provide a convenient interface, so
>>>>>>>> that the user can simply call a static method like SummaryStats.mean()
>>>>>>>> and not have to worry about the implementation.
>>>>>>>>
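One possible reading of this separation, as a rough sketch only: the statistic is a function handle over a `DoubleStream`, and each data layout gets its own small streaming adapter. The class `SummaryStatsSketch` and all of its members are hypothetical names, loosely following the `SummaryStats.mean()` example above; none of them exist in Commons.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

// Hypothetical facade; not an existing Commons class.
final class SummaryStatsSketch {
    private SummaryStatsSketch() {}

    // Function implementations: each reduces a stream of doubles to one value.
    static final ToDoubleFunction<DoubleStream> MEAN =
        s -> s.average().orElse(Double.NaN);
    static final ToDoubleFunction<DoubleStream> MAX =
        s -> s.max().orElse(Double.NaN);

    // Streaming implementations: one adapter per data layout.
    static DoubleStream stream(double[] data) {
        return Arrays.stream(data);
    }

    static DoubleStream stream(int[][] data) {
        // Flatten the rows, then widen each int to a double.
        return Arrays.stream(data).flatMapToInt(Arrays::stream).asDoubleStream();
    }

    // Convenience methods: the caller never sees the stream plumbing.
    static double mean(double[] data) { return MEAN.applyAsDouble(stream(data)); }
    static double mean(int[][] data)  { return MEAN.applyAsDouble(stream(data)); }
    static double max(double[] data)  { return MAX.applyAsDouble(stream(data)); }
    static double max(int[][] data)   { return MAX.applyAsDouble(stream(data)); }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {1.0, 2.0, 3.0}));
        System.out.println(max(new int[][] {{1, 2}, {3, 6}}));
    }
}
```

The same `MEAN` and `MAX` handles serve both layouts; supporting another layout would only require one more `stream(...)` adapter. This does rely on method overloading for the convenience methods, but the statistic itself is written once.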
>>>>>>>> The other difficulty I see is that quantile and median statistics will
>>>>>>>> not be as easy to stream as statistics with a closed-form solution like
>>>>>>>> mean or variance. There may however be great algorithms out there for
>>>>>>>> pulling the median or the 95% quantile out of a stream -- if so they
>>>>>>>> should be used.
>>>>>>>>
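For the median, one well-known option is the two-heap "running median", sketched below. It is exact but keeps every value in memory, so it is only a starting point; bounded-memory estimators trade exactness for constant space. The class name `RunningMedianSketch` is hypothetical and not taken from any Commons code.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.function.DoubleConsumer;
import java.util.stream.DoubleStream;

// Hypothetical helper: exact running median using two heaps, O(n) memory.
final class RunningMedianSketch implements DoubleConsumer {
    // Max-heap holding the smaller half of the values seen so far.
    private final PriorityQueue<Double> lower =
        new PriorityQueue<>(Collections.reverseOrder());
    // Min-heap holding the larger half.
    private final PriorityQueue<Double> upper = new PriorityQueue<>();

    @Override
    public void accept(double x) {
        if (lower.isEmpty() || x <= lower.peek()) {
            lower.add(x);
        } else {
            upper.add(x);
        }
        // Rebalance: lower holds as many values as upper, or exactly one more.
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    /** Median of the values accepted so far, or NaN if there are none. */
    double median() {
        if (lower.isEmpty()) {
            return Double.NaN;
        }
        return lower.size() > upper.size()
            ? lower.peek()
            : (lower.peek() + upper.peek()) / 2.0;
    }

    public static void main(String[] args) {
        RunningMedianSketch m = new RunningMedianSketch();
        // Being a DoubleConsumer, it can be fed directly from a stream.
        DoubleStream.of(5.0, 1.0, 3.0, 2.0).forEach(m);
        System.out.println(m.median());
    }
}
```

Each update costs O(log n) and the median can be read at any point while the stream is consumed sequentially; the sketch is not safe for parallel streams.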
>>>>>>>> Eric
>>>>>>>>
>>>>>>>>
>>>>>>> Eric,
>>>>>>>
>>>>>>> Would you be the official "mentor" for the GSoC participants that
>>>>>>> are interested in helping with the porting of "o.a.c.math4.stat"?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
