In the financial world, category time series are very important (e.g. industry/sector categories change over time). What would this structure look like in that scenario?

On Fri, Aug 19, 2016 at 5:12 PM Jacques Nadeau (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428316#comment-15428316 ]
>
> Jacques Nadeau commented on ARROW-81:
> -------------------------------------
>
> Can you guys provide two small example datasets in JSON format here?
>
> > C++: Add a Category nested type
> > -------------------------------
> >
> >          Key: ARROW-81
> >          URL: https://issues.apache.org/jira/browse/ARROW-81
> >      Project: Apache Arrow
> >   Issue Type: New Feature
> >   Components: C++
> >     Reporter: Wes McKinney
> >     Assignee: Wes McKinney
> >
> > A Category (or "factor") is a dictionary-encoded array whose dictionary
> > has semantic meaning. The data consists of
> > - An array of integer "codes"
> > - A child array of some other type, known as the "categories" or
> >   "levels" of the array. Typically there is an "ordered" boolean flag
> >   indicating whether the order of the categories is meaningful.
> > Category/factor types are used in a number of common statistical
> > analyses. See, for example,
> > http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It
> > is a basic requirement for Python and R, at least, as Arrow C++ consumers,
> > to have this type. Separately, we should consider what is necessary to be
> > able to transmit category data in IPCs -- possible an expansion of the
> > Arrow format.
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
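
For reference, the layout described in the quoted issue (integer "codes", a child array of "categories", and an "ordered" flag) can be sketched in plain Python. This is only an illustration and not the Arrow C++ API. The class name `CategoryArray`, its methods, the sample sector names, and the use of -1 as the code for a missing value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CategoryArray:
    """Hypothetical dictionary-encoded array: codes index into categories."""
    codes: List[int]         # one integer per row; -1 marks a missing value (assumption)
    categories: List[str]    # the "levels", stored once
    ordered: bool = False    # whether the order of the categories is meaningful

    @classmethod
    def from_values(cls, values: Sequence[Optional[str]],
                    ordered: bool = False) -> "CategoryArray":
        categories: List[str] = []
        index = {}           # category value -> position in `categories`
        codes: List[int] = []
        for value in values:
            if value is None:
                codes.append(-1)
                continue
            if value not in index:
                # First time this value is seen: give it the next code.
                index[value] = len(categories)
                categories.append(value)
            codes.append(index[value])
        return cls(codes, categories, ordered)

    def decode(self) -> List[Optional[str]]:
        """Expand the codes back into the original values."""
        return [None if c < 0 else self.categories[c] for c in self.codes]


# Made-up sample data, for illustration only.
sectors = CategoryArray.from_values(["Tech", "Energy", "Tech", None, "Energy"])
print(sectors.codes)
print(sectors.categories)
```

The sketch uses a single fixed dictionary per array; it does not attempt to model categories that change over time.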