In the financial world, category time series are very important (i.e.
industry/sector categories change over time). What would this
structure look like in that scenario?

On Fri, Aug 19, 2016 at 5:12 PM Jacques Nadeau (JIRA) <j...@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428316#comment-15428316
> ]
>
> Jacques Nadeau commented on ARROW-81:
> -------------------------------------
>
> Can you guys provide two small example datasets in JSON format here?
>
> > C++: Add a Category nested type
> > -------------------------------
> >
> >                 Key: ARROW-81
> >                 URL: https://issues.apache.org/jira/browse/ARROW-81
> >             Project: Apache Arrow
> >          Issue Type: New Feature
> >          Components: C++
> >            Reporter: Wes McKinney
> >            Assignee: Wes McKinney
> >
> > A Category (or "factor") is a dictionary-encoded array whose dictionary
> has semantic meaning. The data consists of
> > - An array of integer "codes"
> > - A child array of some other type, known as the "categories" or
> "levels" of the array. Typically there is an "ordered" boolean flag
> indicating whether the order of the categories is meaningful.
> > Category/factor types are used in a number of common statistical
> analyses. See, for example,
> http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It
> is a basic requirement for Python and R, at least, as Arrow C++ consumers,
> to have this type. Separately, we should consider what is necessary to be
> able to transmit category data in IPCs -- possibly an expansion of the
> Arrow format.
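
To make the codes/categories/ordered layout described above concrete, here
is a minimal sketch in plain Python. It is not Arrow's API or memory format:
the names CategoryArray and encode are hypothetical, and the dates, tickers
and sector labels are made-up illustration data. It assumes one row per
(date, ticker) observation, so a sector that changes over time simply shows
up as a different code on a later row.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CategoryArray:
    """Hypothetical container: integer codes plus a child array of categories."""
    codes: List[Optional[int]]  # one code per row; None marks a missing value
    categories: List[str]       # the "categories" / "levels"
    ordered: bool = False       # whether the order of the categories is meaningful

    def decode(self) -> List[Optional[str]]:
        # Map each code back to its category label.
        return [None if c is None else self.categories[c] for c in self.codes]


def encode(values: List[Optional[str]], ordered: bool = False) -> CategoryArray:
    """Dictionary-encode values, assigning codes in order of first appearance."""
    index = {}
    categories: List[str] = []
    codes: List[Optional[int]] = []
    for v in values:
        if v is None:
            codes.append(None)
            continue
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return CategoryArray(codes, categories, ordered)


# Made-up observations: ticker "AAA" is in a different sector on the second date.
dates = ["2015-12-31", "2015-12-31", "2016-06-30", "2016-06-30"]
tickers = ["AAA", "BBB", "AAA", "BBB"]
sectors = ["Financials", "Technology", "Real Estate", "Technology"]

sector_col = encode(sectors)
print(json.dumps({
    "codes": sector_col.codes,
    "categories": sector_col.categories,
    "ordered": sector_col.ordered,
}))
```

In this sketch the categories list holds each distinct sector once, and the
per-row codes carry the history. Whether the categories themselves would
need to be versioned over time is a separate question that this does not
address.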
>
