[ https://issues.apache.org/jira/browse/ARROW-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327337#comment-16327337 ]
ASF GitHub Bot commented on ARROW-1990:
---------------------------------------
TheNeuralBit opened a new pull request #1482: ARROW-1990: [JS] Add "DataFrame" object
URL: https://github.com/apache/arrow/pull/1482
This PR moves the `Table` class out of the Vector hierarchy and adds
optimized dataframe operations to it. It currently implements an optimized
`scan()` method, `filter(predicate)`, `count()`, and `countBy(column_name)`;
`countBy` only works on dictionary-encoded columns.
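To illustrate why `countBy` can be cheap on a dictionary-encoded column, here is a minimal sketch in plain JavaScript. It is not the code in this PR: the `origin` object and the `countByDictionary` helper are hypothetical stand-ins, assuming a column is stored as integer indices into a shared array of distinct values.

``` js
// Hypothetical stand-in for a dictionary-encoded column: each row stores a
// small integer index into a shared array of distinct values.
const origin = {
  dictionary: ['Seattle', 'New York', 'Terre Haute'],
  indices: Uint8Array.from([0, 1, 0, 2, 1, 0]),
};

// Count rows per distinct value by tallying indices, so the scan loop never
// touches the string values. `keep` is an optional row filter (an assumption
// made for this sketch, not an API from the PR).
function countByDictionary(column, keep = () => true) {
  const counts = new Uint32Array(column.dictionary.length);
  for (let row = 0; row < column.indices.length; row++) {
    if (keep(row)) counts[column.indices[row]]++;
  }
  // Resolve indices to values once, after the scan.
  const result = {};
  column.dictionary.forEach((value, i) => { result[value] = counts[i]; });
  return result;
}

console.log(countByDictionary(origin));
// { Seattle: 3, 'New York': 2, 'Terre Haute': 1 }
```

A column that is not dictionary-encoded has no ready-made small integer key per row, which is presumably why the first version restricts `countBy` to this case.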
Some usage examples, based on the file generated by
`js/test/data/tables/generate.py`:
``` js
> let table = Table.from([fs.readFileSync('./test/data/tables/generated.arrow')])
undefined
> table.count()
1000000
> table.filter(col('lat').gteq(0)).count()
499718
> table.countBy('origin').asJSON()
{ Charlottesville: 166839,
  'New York': 166251,
  'San Francisco': 166642,
  Seattle: 166659,
  'Terre Haute': 166756,
  'Washington, DC': 166853 }
> table.filter(col('lng').gteq(0)).countBy('origin').asJSON()
{ Charlottesville: 83109,
  'New York': 83221,
  'San Francisco': 83515,
  Seattle: 83362,
  'Terre Haute': 83314,
  'Washington, DC': 83479 }
```
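For readers unfamiliar with the `col('lat').gteq(0)` style, the following is a rough sketch of how such a predicate could be built and applied lazily. It is an assumption-laden model, not the implementation in this PR: the `table` object, the `bind` step, and the free-standing `filter` function are all hypothetical, and only the names `col`, `gteq`, `filter`, and `count` come from the examples above.

``` js
// Hypothetical table model: a plain object of equal-length column arrays.
const table = {
  length: 5,
  columns: {
    lat: Float32Array.from([12.5, -3.0, 0, 47.6, -33.9]),
    lng: Float32Array.from([-77.0, 151.2, 0, -122.3, 18.4]),
  },
};

// col(name).gteq(value) returns a predicate object. bind() looks the column
// up once and returns a cheap per-row test, so the scan loop does no lookups.
function col(name) {
  return {
    gteq(value) {
      return {
        bind(tbl) {
          const data = tbl.columns[name];
          return (row) => data[row] >= value;
        },
      };
    },
  };
}

// filter() is lazy: it only records the predicate; count() performs the scan.
function filter(tbl, predicate) {
  return {
    count() {
      const test = predicate.bind(tbl);
      let n = 0;
      for (let row = 0; row < tbl.length; row++) {
        if (test(row)) n++;
      }
      return n;
    },
  };
}

console.log(filter(table, col('lat').gteq(0)).count()); // 3
console.log(filter(table, col('lng').gteq(0)).count()); // 3
```

Deferring the work to `count()` means no filtered copy of the table is materialized; the predicate is evaluated row by row during a single pass.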
> [JS] Add "DataFrame" object
> ---------------------------
>
> Key: ARROW-1990
> URL: https://issues.apache.org/jira/browse/ARROW-1990
> Project: Apache Arrow
> Issue Type: New Feature
> Components: JavaScript
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Labels: pull-request-available
>
> Add a TypeScript class that can perform optimized dataframe operations on an
> Arrow {{Table}} and/or {{StructVector}}. Initially this should include
> operations like filtering, counting, and scanning. Eventually this class
> could include more operations like sorting, count by/group by, etc.