David Saslawsky created ARROW-10450:
---------------------------------------
Summary: Table.fromStruct() silently truncates vectors to the first chunk
Key: ARROW-10450
URL: https://issues.apache.org/jira/browse/ARROW-10450
Project: Apache Arrow
Issue Type: Bug
Components: JavaScript
Affects Versions: 2.0.0
Reporter: David Saslawsky
Table.fromStruct() only uses the first chunk of the input vector, so a vector with more than one chunk produces a truncated table.
{code:javascript}
import { Bool, Field, Int32, Struct, Table, Vector } from "apache-arrow";
const myStruct = new Struct([
  Field.new({ name: "over", type: new Int32() }),
  Field.new({ name: "out", type: new Bool() })
]);

const data = [];
for (let i = 0; i < 1500; i++) {
  data.push({ over: i, out: i % 2 === 0 });
}

// create a vector with two chunks
const victor = Vector.from({
  type: myStruct,
  /*highWaterMark: Infinity,*/
  values: data
});
console.log(victor.length); // 1500

const table = Table.fromStruct(victor);
console.log(table.length); // 1000 (expected 1500)
{code}
A workaround is to set highWaterMark to Infinity in Vector.from() (commented out in the example above), which keeps all values in a single chunk.
Table.new() works as expected:
{code:javascript}
const int32Array = new Int32Array(1500);
for (let i = 0; i < 1500; i++) int32Array[i] = i;
const intVector = Vector.from({ type: new Int32(), values: int32Array});
console.log(intVector.length); // 1500
const intTable = Table.new({ intColumn:intVector });
console.log(intTable.length); // 1500
{code}
The problem seems to originate in Chunked.data(), but I don't understand the code
well enough to propose a fix.
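The suspected behaviour can be illustrated with a toy model in plain JavaScript. This is only a sketch and not Arrow's implementation: toChunks, lengthFromFirstChunk and lengthFromAllChunks are hypothetical helpers, and the chunk size of 1000 is an assumption inferred from the output reported above. It shows how reading only the first chunk gives 1000 instead of 1500, and why an infinite highWaterMark hides the problem.

```javascript
// Toy model only: none of these helpers are part of apache-arrow.

// Split values into chunks of at most `highWaterMark` items, loosely
// mimicking a builder that flushes a chunk when the mark is reached.
// The default of 1000 is an assumption based on the reported output.
function toChunks(values, highWaterMark = 1000) {
  const chunks = [];
  let current = [];
  for (const value of values) {
    current.push(value);
    if (current.length >= highWaterMark) {
      chunks.push(current);
      current = [];
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Suspected faulty behaviour: only the first chunk is consulted.
function lengthFromFirstChunk(chunks) {
  return chunks.length > 0 ? chunks[0].length : 0;
}

// Expected behaviour: every chunk contributes to the length.
function lengthFromAllChunks(chunks) {
  return chunks.reduce((total, chunk) => total + chunk.length, 0);
}

const values = Array.from({ length: 1500 }, (_, i) => ({ over: i, out: i % 2 === 0 }));

const chunked = toChunks(values);            // two chunks: 1000 + 500
console.log(chunked.length);                 // 2
console.log(lengthFromFirstChunk(chunked));  // 1000
console.log(lengthFromAllChunks(chunked));   // 1500

const single = toChunks(values, Infinity);   // never flushes early: one chunk
console.log(lengthFromFirstChunk(single));   // 1500
```

In this model the two length functions agree only when there is a single chunk, which matches the observation that setting highWaterMark to Infinity avoids the truncation.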