Hello

I'm a newcomer and not quite sure how to use the library. I tried to find
documentation for this but could not.

I have a dataset in a CSV file where one column (let's call it colour) is a
string category. I'd like to get integer indices instead of the text values
so that I can pass them to an algorithm.
I tried setting column_types in ConvertOptions to
{{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(),
arrow::utf8()) }}, but that does not seem to be the right API usage. It fails
at run time with: NotImplemented: CSV conversion to dictionary<values=string,
indices=int32, ordered=0> is not supported
I also found the merged PR #5785 <https://github.com/apache/arrow/pull/5785>,
but I'm not sure whether it applies to my case.

So, my question is: can I get the indices of a category column using only
the library API? And if so, what am I doing wrong? :)

*In other words,* I'd like to do something like this Python pandas code:
df[column] = df[column].cat.codes # if str(column_data_type) == "category"
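
To make the desired result concrete, here is a minimal sketch in plain C++
that uses only the standard library. It does not call any Arrow API; the
helper name EncodeCategories is made up for illustration. It assigns codes in
order of first appearance, whereas pandas numbers categories in sorted order
by default, so the exact code values may differ:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of Arrow): maps each distinct string to an
// integer code, assigned in order of first appearance.
std::vector<std::int32_t> EncodeCategories(
    const std::vector<std::string>& values) {
  std::unordered_map<std::string, std::int32_t> code_of;
  std::vector<std::int32_t> codes;
  codes.reserve(values.size());
  for (const std::string& value : values) {
    // emplace inserts only if the value has not been seen before; the
    // candidate code is the number of categories seen so far.
    auto it = code_of
                  .emplace(value, static_cast<std::int32_t>(code_of.size()))
                  .first;
    codes.push_back(it->second);
  }
  return codes;
}
```

For a colour column holding "red", "green", "red", "blue", this sketch yields
the codes 0, 1, 0, 2.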

Thank you!
