[
https://issues.apache.org/jira/browse/ARROW-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rok Mihevc updated ARROW-5274:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/21744
> [JavaScript] Wrong array type for countBy
> -----------------------------------------
>
> Key: ARROW-5274
> URL: https://issues.apache.org/jira/browse/ARROW-5274
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Yngve Kristiansen
> Assignee: Yngve Kristiansen
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Original Estimate: 5m
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The {{countBy}} function returns incorrect histograms because it appears to
> select the wrong typed array for the counts.
> The following line in {{countBy}} seems to be causing the problem:
> {{const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / Math.log(256));}}
> For example, if the dictionary length is 3 but the indices length is 1
> million, this expression evaluates to 1, so a Uint8Array is used for the
> counts and any count above 255 overflows.
> Codepen example
> [https://codepen.io/Yngve92/pen/mYdWrr]
> If I change the expression to
> {{const countByteLength = Math.ceil(Math.log(vector.length) / Math.log(256));}}
> it seems to work correctly, but I am not sure whether this is the right fix.
> The expression appears on L63 and L189 of src/compute/dataframe.ts.
>
> PR submitted: [https://github.com/apache/arrow/pull/4265]
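
The overflow described in the report can be reproduced outside Arrow with plain typed arrays. The following is a minimal sketch, not the Arrow JS implementation: `allocCounts` and `countIndices` are hypothetical helper names, and the width is chosen with explicit thresholds rather than the logarithm expression quoted above. It assumes that the largest possible count equals the number of rows, which is why sizing by the row count is safe while sizing by the dictionary length is not.

```typescript
type CountsArray = Uint8Array | Uint16Array | Uint32Array;

// Hypothetical helper: pick the narrowest unsigned typed array whose
// elements can hold any value up to `maxCount`.
function allocCounts(maxCount: number, size: number): CountsArray {
  if (maxCount <= 0xff) return new Uint8Array(size);
  if (maxCount <= 0xffff) return new Uint16Array(size);
  return new Uint32Array(size);
}

// Hypothetical helper: count how often each dictionary index occurs.
function countIndices(indices: ArrayLike<number>, counts: CountsArray): CountsArray {
  for (let i = 0; i < indices.length; i++) {
    counts[indices[i]]++;
  }
  return counts;
}

const dictionaryLength = 3;
// 1000 rows, all zero: every row refers to dictionary entry 0.
const indices = new Uint8Array(1000);

// Width derived from the dictionary length (the reported behaviour):
// a Uint8Array is chosen, so the count wraps modulo 256.
const wrapped = countIndices(indices, allocCounts(dictionaryLength, dictionaryLength));

// Width derived from the number of rows: a Uint16Array is chosen,
// which can hold the full count.
const exact = countIndices(indices, allocCounts(indices.length, dictionaryLength));

console.log(wrapped[0], exact[0]); // prints "232 1000"
```

With 1000 rows the Uint8Array count wraps to 1000 mod 256 = 232, while the array sized from the row count keeps the exact value.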