[ 
https://issues.apache.org/jira/browse/ARROW-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579040#comment-17579040
 ] 

Todd West commented on ARROW-17391:
-----------------------------------

I was just looking in the C# sources for another thing and noticed that 
ARROW-6870 (C# Add Support for Dictionary Arrays and Dictionary Encoding) was 
resolved as fixed on 2021-08-09. While there's a note in 6870's commit about 
only supporting DictionaryBatch serialization, that does suggest the 
write-from-C# part of this issue can reasonably be expected to work.

There is C# test coverage for DictionaryArray in 
[ArrowStreamWriterTests.cs|https://github.com/apache/arrow/blob/master/csharp/test/Apache.Arrow.Tests/ArrowStreamWriterTests.cs]
 and a few of the other classes in the same directory.
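
As background for the reader-side error quoted in the issue description below ({{Dictionary with id 1 not found}}), here is a toy Python model of the bookkeeping involved. In the Arrow IPC format a dictionary-encoded field carries a dictionary id in the schema, the dictionary values travel in a DictionaryBatch tagged with that id, and record batches carry only the indices. The class and method names below (ToyIpcReader, on_dictionary_batch, on_record_batch) are hypothetical and this is not the Arrow C++ or C# code; it is only a sketch, assuming the reader keeps a map from dictionary id to values.

```python
class ToyIpcReader:
    """Toy model of an IPC reader's dictionary bookkeeping (not Arrow's code)."""

    def __init__(self):
        # dictionary id -> list of dictionary values seen so far
        self.dictionaries = {}

    def on_dictionary_batch(self, dict_id, values):
        # A DictionaryBatch delivers the values for one dictionary id.
        self.dictionaries[dict_id] = list(values)

    def on_record_batch(self, dict_id, indices):
        # A record batch column only holds indices; decoding needs the
        # dictionary that the schema's field refers to by id.
        if dict_id not in self.dictionaries:
            raise KeyError(f"Dictionary with id {dict_id} not found")
        values = self.dictionaries[dict_id]
        return [values[i] for i in indices]


reader = ToyIpcReader()
reader.on_dictionary_batch(1, ["low", "high"])
decoded = reader.on_record_batch(1, [0, 1, 1])  # ["low", "high", "high"]
```

In this model the lookup fails in two ways: the dictionary batch was never delivered, or it was delivered under a different id than the one the schema refers to. Either could plausibly produce an error of the kind reported, though which one applies here (if either) is not established.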

> [C#] arrow::read_feather() cannot read DictionaryArray written from C#
> ----------------------------------------------------------------------
>
>                 Key: ARROW-17391
>                 URL: https://issues.apache.org/jira/browse/ARROW-17391
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C#, R
>    Affects Versions: 9.0.1
>            Reporter: Todd West
>            Priority: Major
>             Fix For: 9.0.1
>
>
> This applies to Arrow 9.0.0, both the C# nuget and R package, but for some 
> reason 9.0.0 isn't in the issue dropdowns' list of released versions. It also 
> appears the [implementation status 
> page|https://arrow.apache.org/docs/status.html#ipc-format] may be stale, as 
> the C# source contains 
> [DictionaryArray|https://github.com/apache/arrow/blob/master/csharp/src/Apache.Arrow/Arrays/DictionaryArray.cs].
> A look at 
> [ArrowStreamWriter|https://github.com/apache/arrow/blob/master/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs]
>  in the debugger confirms that the flags flip and the data structures update, 
> so the writer has correctly received both the dictionary index and value 
> arrays it's given on the code paths which write a [dictionary 
> batch|https://arrow.apache.org/docs/format/Columnar.html]. However, on the R 
> side, read_feather() fails with
> {{Error: Key error: Dictionary with id 1 not found}}
> So it appears most likely either C# isn't properly emitting the dictionary 
> batch, despite seeming to have all the code to do so, or something's going 
> wrong in the C++ layers under R in the reading side.
> Setup on the C# side is simple
> {{        public static DictionaryArray CreateStringTable(Memory<byte> indicies, IList<string> values)}}
> {{        {}}
> {{            StringArray.Builder valueArray = new();}}
> {{            for (int valueIndex = 0; valueIndex < values.Count; ++valueIndex)}}
> {{            {}}
> {{                valueArray.Append(values[valueIndex]);}}
> {{            }}}
> {{            UInt8Array indexArray = new(ArrowArrayExtensions.WrapInArrayData(UInt8Type.Default, indicies, indicies.Length));}}
> {{            return new DictionaryArray(new(UInt8Type.Default, StringType.Default, false), indexArray, valueArray.Build());}}
> {{        }}}
> as is the R
> {{        library(arrow)}}
> {{        foo = read_feather("test.feather")}}
> If I drop the dictionary column the two Arrow implementations interop without 
> difficulty. Same if I write only the indices as a UInt8 column. So the issue 
> here is clearly specific to the use of DictionaryArray. I've also tried 
> other index sizes, so it doesn't appear specific to the use of UInt8.
> I'm therefore left with two questions:
> 1) Does DictionaryArray have working use cases in 9.0.0?
> 2) If what I'm doing's not supposed to work yet, or I'm not getting the data 
> structures set up correctly (there's no C# DictionaryArray example [on 
> github|https://github.com/apache/arrow/tree/master/csharp/examples]), is 
> there an array level workaround?
> There's only one string table in this schema and it's typically tiny (five 
> values or less) so putting its values part in the schema metadata is a viable 
> workaround, albeit an inelegant one.
> Not seeing that there's a feather file viewer available but, if there is, I'd 
> be happy to take a closer look. Can also link the sources after they've been 
> committed and pushed, which should be by the end of the day tomorrow.
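
On the question of a feather file viewer: short of a full viewer, a first sanity check is to look at the container's magic bytes. The sketch below uses only the Python standard library and relies on three layout facts: Arrow IPC files (Feather V2) begin and end with "ARROW1", legacy Feather V1 files begin with "FEA1", and IPC stream messages written by Arrow 0.15 or later begin with the continuation marker 0xFFFFFFFF. The function names are hypothetical, and the check does not decode any metadata, so it cannot confirm whether a dictionary batch is actually present.

```python
import struct

ARROW_FILE_MAGIC = b"ARROW1"               # Arrow IPC file format (Feather V2)
FEATHER_V1_MAGIC = b"FEA1"                 # legacy Feather V1
STREAM_CONTINUATION = b"\xff\xff\xff\xff"  # IPC stream message marker (0.15+)


def sniff_arrow_container(data):
    """Guess the container type from the leading/trailing bytes of a file."""
    if data.startswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    # Smallest plausible IPC file: 8-byte header, 4-byte footer length,
    # 6-byte trailing magic.
    if (len(data) >= 18
            and data.startswith(ARROW_FILE_MAGIC)
            and data.endswith(ARROW_FILE_MAGIC)):
        return "ipc-file"
    if data.startswith(STREAM_CONTINUATION):
        return "ipc-stream"
    return "unknown"


def footer_length(data):
    """Footer size of an IPC file: little-endian int32 before the end magic."""
    if sniff_arrow_container(data) != "ipc-file":
        raise ValueError("not an Arrow IPC file")
    (length,) = struct.unpack("<i", data[-10:-6])
    return length
```

A call such as sniff_arrow_container(open("test.feather", "rb").read()) would report which container the C# side produced. A result of "ipc-file" only shows the outer framing is intact; the dictionary-related metadata would still need a real Arrow reader to inspect.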



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
