randolf-scholz commented on code in PR #35210:
URL: https://github.com/apache/arrow/pull/35210#discussion_r1170620315
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -952,6 +952,23 @@ def test_decimal_point(self):
'b': [2.5, -3.0, None],
}
+ def test_dictionary(self):
Review Comment:
> This is the first composite type alias added to this list. I don't see any
issue in adding it, but also it leaves the user wondering why only a subset is
implemented.
I wrote about this in the issue, please correct me if I am wrong, but it
seems that `dictionary[int32, string]` is the "standard" dictionary type that
has some useful kernels. At least I noticed horrible performance degradation
when trying to read CSV data with `int16`, `uint32` or `int64` indexed
dictionary types. So adding only `dictionary[int32, string]` would be to
protect the user from using unoptimized composite types. (If this is a bug, I
can open another issue).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]