lmshk opened a new issue, #501:
URL: https://github.com/apache/arrow-julia/issues/501
On Arrow v2.7.1 (and Julia v1.10.1):
```julia
julia> t = (; a = Arrow.DictEncode([:a]))
(a = [:a],)
julia> Arrow.write("x.stream.arrow", t, file = false)
"x.stream.arrow"
julia> Arrow.append("x.stream.arrow", t)
ERROR: ArgumentError: Table schema does not match existing arrow file schema
Stacktrace: [...]
```
The problem is that the `NamedTuple` has
```julia
julia> Tables.schema(t).types
(Arrow.DictEncodeType{Symbol},)
```
while the stream is identified as
```julia
julia> s = open("x.stream.arrow", "r+") do io
           Arrow.stream_properties(io)
       end;
julia> s[2].types
(Symbol,)
```
and there doesn't seem to be an easy workaround, because `append` doesn't
allow overriding the `arrow_schema`; doing so would mean effectively
duplicating the other `append` methods' code on the user side. Omitting
`Arrow.DictEncode` on subsequent segments doesn't work either: the append
itself succeeds, but reading the column back fails:
```julia
julia> t2 = (; a = [:b])
(a = [:b],)
julia> Arrow.append("x.stream.arrow", t2)
"x.stream.arrow"
julia> d = Arrow.Table("x.stream.arrow")
Arrow.Table with 9 rows, 1 columns, and schema:
:a Symbol
julia> d.a
9-element SentinelArrays.ChainedVector{Symbol, Arrow.DictEncoded{Symbol, Int8, Arrow.List{Symbol, Int32, Vector{UInt8}}}}:
Error showing value of type SentinelArrays.ChainedVector{Symbol, Arrow.DictEncoded{Symbol, Int8, Arrow.List{Symbol, Int32, Vector{UInt8}}}}:
ERROR: ArgumentError: Symbol name may not contain \0
```
I am unsure whether changing
[`is_equivalent_schema`](https://github.com/apache/arrow-julia/blob/ac199b0e377502ea0f1fa5ced7fda897a01b82a9/src/append.jl#L280)
would fix the issue, because I don't know whether the downstream code
([`toarrowtable`](https://github.com/apache/arrow-julia/blob/ac199b0e377502ea0f1fa5ced7fda897a01b82a9/src/write.jl#L508)?)
can handle unequal schemas like this.
Please advise.
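
For illustration only, a relaxed type comparison that ignores the `Arrow.DictEncodeType` wrapper might look roughly like the sketch below. The names `unwrap_dictencode` and `types_match_ignoring_dictencode` are hypothetical and not part of Arrow.jl, and the sketch does not handle `Union{Missing, ...}` columns. It only shows the comparison side; it says nothing about whether the downstream writing code would accept the result.

```julia
using Arrow, Tables

# Hypothetical helper (not Arrow.jl API): strip the DictEncodeType wrapper
# so that DictEncodeType{Symbol} is compared as plain Symbol.
unwrap_dictencode(::Type{Arrow.DictEncodeType{T}}) where {T} = T
unwrap_dictencode(::Type{T}) where {T} = T

# Hypothetical helper (not Arrow.jl API): compare two tuples of column types
# element by element after unwrapping any DictEncodeType.
function types_match_ignoring_dictencode(existing_types, new_types)
    length(existing_types) == length(new_types) || return false
    return all(unwrap_dictencode(a) == unwrap_dictencode(b)
               for (a, b) in zip(existing_types, new_types))
end

# The two schemas from the report: the stream's types and the NamedTuple's types.
t = (; a = Arrow.DictEncode([:a]))
types_match_ignoring_dictencode((Symbol,), Tables.schema(t).types)
```

Whether a check like this belongs in `is_equivalent_schema` depends on the open question above, namely whether the rest of the append path can write a dictionary-encoded column into a stream whose schema was read back as plain `Symbol`.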