Many thanks, Phil, for answering all my questions.

On Tue, Oct 14, 2014 at 10:19 PM, Phil Steitz <phil.ste...@gmail.com> wrote:

> On 10/14/14 6:59 AM, venkatesha murthy wrote:
> > ok.
> >
> > I wanted to understand the advantage of having a container class for all
> > storeless stats (just as DescriptiveStats is for Univariate). I could
> > open another email thread.
>
> SummaryStatistics is a container for storeless stats;
> DescriptiveStatistics is for stats computed over a stored dataset,
> possibly with a rolling window.  The rationale here is that
> SummaryStatistics aggregates StorelessUnivariateStatistics
> while DescriptiveStatistics aggregates statistics that implement
> only UnivariateStatistic, which requires that the full set of data
> be provided as an input array (so the aggregate has to maintain a
> dataset in memory).  The advantage of having a container for
> storeless stats is that a stream of data can be fed into the
> container's addValue method and the constituent stats will all get
> updated with the values as they come in.
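The streaming behaviour described above can be illustrated with a minimal sketch. All names here (StorelessStat, RunningMean, RunningMax, StorelessContainer) are hypothetical and chosen for illustration only; this is not the Commons Math implementation, just the general shape of a container whose addValue method fans each value out to storeless statistics.

```java
// Hypothetical names throughout; a sketch, not the Commons Math code.

/** A statistic that is updated one value at a time, without storing the data. */
interface StorelessStat {
    void increment(double value);
    double getResult();
}

/** Running mean, updated incrementally. */
final class RunningMean implements StorelessStat {
    private long n;
    private double mean;

    @Override
    public void increment(double value) {
        n++;
        mean += (value - mean) / n;
    }

    @Override
    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }
}

/** Running maximum; NaN until the first value arrives. */
final class RunningMax implements StorelessStat {
    private double max = Double.NaN;

    @Override
    public void increment(double value) {
        if (Double.isNaN(max) || value > max) {
            max = value;
        }
    }

    @Override
    public double getResult() {
        return max;
    }
}

/** Container that passes every incoming value to each constituent statistic. */
final class StorelessContainer {
    private final StorelessStat mean = new RunningMean();
    private final StorelessStat max = new RunningMax();
    private long n;

    void addValue(double value) {
        n++;
        mean.increment(value);
        max.increment(value);
    }

    long getN() { return n; }
    double getMean() { return mean.getResult(); }
    double getMax() { return max.getResult(); }
}

final class StorelessDemo {
    public static void main(String[] args) {
        StorelessContainer stats = new StorelessContainer();
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            stats.addValue(v); // no value is retained after this call
        }
        System.out.println(stats.getN() + " " + stats.getMean() + " " + stats.getMax());
    }
}
```

The container holds only a few scalars regardless of how many values are fed in, which is what distinguishes it from an aggregate that must keep the whole dataset in memory.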
>
> > I also wanted to understand what the abstract interface problem is that
> > you were referring to.
>
> We moved to favoring abstract classes (where needed / useful) over
> interfaces because it is easier to add to / modify abstract classes
> than interfaces in a backward compatible way.
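The compatibility argument above can be sketched as follows. The class names (AbstractSummary, LegacySummary) and the getSum() method are hypothetical, made up for illustration; the point is only that a method with a body can be added to an abstract class without touching existing subclasses.

```java
// Hypothetical names; a sketch of the compatibility argument, not library code.

/** Version 1 of this abstract class declared only getN() and getMean(). */
abstract class AbstractSummary {
    abstract long getN();
    abstract double getMean();

    // Added in a later release. Because it has a body, subclasses written
    // against version 1 inherit it and keep compiling unchanged.
    double getSum() {
        return getN() * getMean();
    }
}

/** A subclass written against version 1; it knows nothing about getSum(). */
final class LegacySummary extends AbstractSummary {
    @Override
    long getN() {
        return 3;
    }

    @Override
    double getMean() {
        return 2.0;
    }
}

final class CompatibilityDemo {
    public static void main(String[] args) {
        AbstractSummary s = new LegacySummary();
        System.out.println(s.getSum()); // uses the inherited later addition
    }
}
```

Adding an abstract method to an interface instead would break every existing implementer at compile time; Java 8 default methods later narrowed this gap.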
> >
> > thanks
> > murthy
> >
> > On Tue, Oct 14, 2014 at 9:47 AM, Phil Steitz <phil.ste...@gmail.com>
> > wrote:
> >
> >> On 10/13/14 8:55 PM, venkatesha murthy wrote:
> >>> On Tue, Oct 14, 2014 at 6:05 AM, Phil Steitz <phil.ste...@gmail.com>
> >>> wrote:
> >>>> On 10/13/14 1:04 PM, venkatesha murthy wrote:
> >>>>> Adding a bit more on this:
> >>>>> a) The DescriptiveStatisticalSummary actually handles the rest of the
> >>>>> functions such as addValue, getPercentile etc.
> >>>>> b) I have added addValue() as it is important to see either storeless
> >>>>> or store variants as interfaces.
> >>>>> c) A case in point (for b): I was actually trying out lock-based
> >>>>> and lock-free variants of the descriptive statistical summary, and it
> >>>>> was very concise/consistent to have an interface that has all common
> >>>>> functions across all variants.
> >>>>> d) The lock-based and lock-free variants are not part of this patch,
> >>>>> as I am still working through them.
> >>>>>
> >>>>> However, I feel getPercentile can definitely add value. Please let
> >>>>> me know if I could turn all the relevant methods of
> >>>>> DescriptiveStorelessStatistics into the statistical summary (such as
> >>>>> kurtosis, skewness etc.) and then we could just use SummaryStatistics.
> >>>> I am not sure I understand what you are proposing.  Currently, we
> >>>> have two statistical "aggregates" for descriptive univariate stats:
> >>>> SummaryStatistics - aggregates "storeless" statistics over a stream
> >>>> of data that is not stored in memory
> >>>> DescriptiveStatistics - provides an extended set of statistics, some
> >>>> of which require that the full set of data be stored in memory
> >>>>
> >>> OK. I am sorry for the confusion here. I understand the intent now.
> >>> However, what I wanted to convey was that all the statistics that
> >>> are supported in the current DescriptiveStatistics can be supported in
> >>> a storeless variant as well (e.g. skewness, kurtosis, percentile).
> >> No, for example exact percentiles, or even arbitrary percentiles
> >> (without the quantile - e.g. quartile) specified in advance, can't
> >> be computed without storing the data.  Also, DescriptiveStatistics
> >> supports a rolling window and stats it implements can make use of
> >> multi-pass algorithms.
> >>
> >>> Therefore, what I was proposing is to have a common interface that can
> >>> have all these methods too, e.g. (we can change the name if needed):
> >>>
> >>> DescriptiveStatisticalSummary<S extends UnivariateStatistic> extends
> >>> StatisticalSummary {
> >>>      getKurtosis();
> >>>      getPercentile();
> >>>      getSkewness();
> >>>      // Add mutation methods as well
> >>>      addValue(double d);
> >>>      // Provide additional builder methods for injecting custom
> >>>      // percentile, kurtosis, skewness, variance etc.
> >>>      withPercentile(S percentile);
> >>>      withKurtosis(S kurtosis);
> >>> }
> >> Per comments above, the contracts of these aggregates are
> >> different.  We have also moved away from defining abstract
> >> interfaces as these end up creating problems when we want to add
> >> things (as in the subject of this thread).
> >>
> >> Phil
> >>>> The subject of this thread was a proposal to add quartiles to
> >>>> SummaryStatistics, as the new(ish) PSquarePercentile allows those
> >>>> statistics to be computed without storing the data.
> >>>>
> >>> Agreed. I was just adding points on how we can bring both
> >>> DescriptiveStatistics and SummaryStatistics under a common interface
> >>> for all the stats.
> >>>
> >>>> Phil
> >>>>> On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy <
> >>>>> venkateshamurth...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Phil,
> >>>>>>
> >>>>>> Though I did not add to StatisticalSummary, I was actually working on
> >>>>>> a DescriptiveStatisticalSummary for all the storeless variants,
> >>>>>> inclusive of PSquarePercentile. Would it help if you could implement
> >>>>>> SummaryStatistics with an extended interface such as
> >>>>>> DescriptiveStatisticalSummary (below)?
> >>>>>>
> >>>>>> That said, I actually wanted to discuss the new storeless variant of
> >>>>>> descriptive statistics:
> >>>>>> a) DescriptiveStatisticalSummary - an extended interface for
> >>>>>> StatisticalSummary (adds a generic type that can cater for both
> >>>>>> stored and storeless variants)
> >>>>>> b) DescriptiveStorelessStatistics - storeless variant of
> >>>>>> DescriptiveStatistics
> >>>>>> c) SynchronizedDescriptiveStorelessStatistics - a synchronized
> >>>>>> wrapper.
> >>>>>> Test case classes are added for the same.
> >>>>>>
> >>>>>> Please let me know on this; I could also accommodate the changes to
> >>>>>> summary stats based on this change.
> >>>>>> Also, please let me know if this could be raised as a JIRA ticket to
> >>>>>> pursue.
> >>>>>> Thanks
> >>>>>> Murthy
> >>>>>>
> >>>>>> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <phil.ste...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Now that we have a "storeless" percentile estimator, we can add
> >>>>>>> quartile computation to SummaryStatistics.  Any objections to my
> >>>>>>> adding this?  I could optionally add a boolean constructor argument
> >>>>>>> to avoid the overhead of maintaining these stats.  Or more
> >>>>>>> generally, add a bitfield encoding the exact set of stats the user
> >>>>>>> wants to maintain.  If there are no objections to the addition, I
> >>>>>>> will open a JIRA.
> >>>>>>>
> >>>>>>> Phil
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org
> >>>>>>>
> >>>>>>>
