[ https://issues.apache.org/jira/browse/ARROW-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
geert-jan brits updated ARROW-15086:
------------------------------------
Description:
After reading an Arrow table from disk and concatenating another Arrow table to it,
the concatenated table loses its last concatenated chunk when it is serialized to
disk.
Steps to reproduce:
{code:javascript}
const { Table, FloatVector } = require('apache-arrow')

// Helper that creates the column vectors, to keep the steps below succinct
function createSomeVectors() {
  return [FloatVector.from(Float32Array.from([1]))]
}

// CORRECT
// Create tables T1 and T2. Each contains 1 vector with 1 value
const T1 = Table.new(createSomeVectors(), ['number'])
const T2 = Table.new(createSomeVectors(), ['number'])
// Combine these tables
const combined = T1.concat(T2)
// Serialize and read back the combination (mimics reading from disk)
const combinedAfterSerialization = Table.from([combined.serialize()])
// Print the counts (works correctly)
console.log(T1.count(), T2.count(), combined.count(),
  combinedAfterSerialization.count())
// Result (as expected) = 2, 2, 2, 2

// INCORRECT
// Serialize T1 and read it back (mimics reading from disk)
const T1SerializedAndBack = Table.from([T1.serialize()])
// Combine the just-read T1SerializedAndBack with T2
const combined2 = T1SerializedAndBack.concat(T2)
// Serialize and read back the combination (mimics reading from disk)
const combinedAfterSerialization2 = Table.from([combined2.serialize()])
// Print the counts (works incorrectly)
console.log(T1SerializedAndBack.count(), T2.count(), combined2.count(),
  combinedAfterSerialization2.count())
// Result (NOT as expected) = 2, 2, 2, 1 <=!!
{code}
> [JS] Incorrect Table concat after serialize
> -------------------------------------------
>
> Key: ARROW-15086
> URL: https://issues.apache.org/jira/browse/ARROW-15086
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 6.0.1
> Reporter: geert-jan brits
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)