randolf-scholz commented on code in PR #35210:
URL: https://github.com/apache/arrow/pull/35210#discussion_r1170620315


##########
python/pyarrow/tests/test_csv.py:
##########
@@ -952,6 +952,23 @@ def test_decimal_point(self):
             'b': [2.5, -3.0, None],
         }
 
+    def test_dictionary(self):

Review Comment:
   > This is the first composite type alias added to this list. I don't see any 
issue in adding it, but also it leaves the user wondering why only a subset is 
implemented.
   
   I wrote about this in the issue, please correct me if I am wrong, but it 
seems that `dictionary[int32, string]` is the "standard" dictionary type that 
has some useful kernels. At least I noticed horrible performance degradation 
when trying to read CSV data with `int16`, `uint32` or `int64` indexed 
dictionary types. So adding only `dictionary[int32, string]` would be to 
protect the user from using unoptimized composite types. (If this is a bug, I 
can open another issue).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to