On 03/12/2019 at 04:55, Wes McKinney wrote:
> An option was recently added to dictionary-encode all string columns.
> I think it would be useful to be able to hard-opt-in to
> dictionary-encode a particular column (regardless of what the
> cardinality ends up being). Whatever the mechanism, it should be
> clear and well documented. A new JIRA issue may be in order. Antoine,
> what do you think?
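The two behaviours contrasted above (automatic encoding that gives up past some cardinality, versus a hard per-column opt-in) can be made concrete with a minimal standard-library C++ sketch. It does not use Arrow: `EncodePolicy`, `DictColumn` and `MaybeDictEncode` are hypothetical names chosen for illustration. For reference, current Arrow C++ releases expose `auto_dict_encode` and `auto_dict_max_cardinality` on `arrow::csv::ConvertOptions` for the automatic case; the `force` flag below only models the proposed opt-in and is not an Arrow option.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not an Arrow API): one dictionary-encoded column.
struct DictColumn {
  std::vector<std::string> dictionary;  // distinct values, first-seen order
  std::vector<int32_t> indices;         // one dictionary index per row
};

// Hypothetical policy (not an Arrow API) modelling the two options discussed.
struct EncodePolicy {
  bool auto_encode = false;      // encode only while cardinality stays small
  int32_t max_cardinality = 50;  // cap used by auto_encode
  bool force = false;            // "hard opt-in": ignore max_cardinality
};

// Returns the encoded column, or std::nullopt when the policy says the
// column should stay as plain strings.
std::optional<DictColumn> MaybeDictEncode(const std::vector<std::string>& values,
                                          const EncodePolicy& policy) {
  if (!policy.force && !policy.auto_encode) return std::nullopt;
  DictColumn out;
  std::unordered_map<std::string, int32_t> memo;  // value -> index
  out.indices.reserve(values.size());
  for (const auto& v : values) {
    auto it = memo.find(v);
    if (it == memo.end()) {
      const auto next = static_cast<int32_t>(out.dictionary.size());
      // Automatic mode falls back to strings once the cap would be exceeded.
      if (!policy.force && next >= policy.max_cardinality) return std::nullopt;
      it = memo.emplace(v, next).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

With `EncodePolicy{true, 2, false}`, a column holding three distinct values falls back to plain strings (`std::nullopt`), while setting `force = true` encodes it anyway.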
> On Sun, Dec 1, 2019 at 5:32 PM ntfs hard <> wrote:
>> Hello,
>> I'm a newcomer and not quite sure how to use the library. I tried to find
>> some documentation about it but failed.
>> I have a dataset in a CSV file where one column (let's call it colour) is a
>> string category. I'd like to get indices instead of the text values so I
>> can pass them to an algorithm.
>> I tried setting column_types in ConvertOptions to
>> {{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(),
>> arrow::utf8()) }}, but that does not seem to be the right API usage; I get
>> a run-time error: NotImplemented: CSV conversion to
>> dictionary<values=string, indices=int32, ordered=0> is not supported
>> I also found a merged PR #5785 <>, but I'm not
>> sure whether it applies to my case.
>> So, my question is: can I get indices for a category column using only the
>> library API? And if so, what am I doing wrong? :)
>> *In other words,* I'd like something like this pandas code:
>> df[column] = df[column].cat.codes # if str(column_data_type) == "category"
>> Thank you!
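For comparison with the pandas idiom in the question, the following plain C++ sketch computes pandas-style category codes: the categories are the sorted distinct values, and each row's code is its position among them, which is how pandas orders categories it infers for string data. It is an illustration only, not Arrow code; `Categorical` and `ToCategoricalCodes` are hypothetical names, and null handling (pandas uses code -1 for missing values) is omitted. Arrow's dictionary encoding may assign indices in order of first appearance instead, so the two sets of indices need not coincide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type (not an Arrow or pandas API): categories plus per-row codes.
struct Categorical {
  std::vector<std::string> categories;  // sorted distinct values
  std::vector<int32_t> codes;           // index into categories, one per row
};

// Builds pandas-style codes: sort and deduplicate the values to get the
// categories, then binary-search each row's value to find its code.
Categorical ToCategoricalCodes(const std::vector<std::string>& column) {
  Categorical out;
  out.categories = column;
  std::sort(out.categories.begin(), out.categories.end());
  out.categories.erase(std::unique(out.categories.begin(), out.categories.end()),
                       out.categories.end());
  out.codes.reserve(column.size());
  for (const auto& v : column) {
    // v is always present, so lower_bound lands exactly on it.
    auto pos = std::lower_bound(out.categories.begin(), out.categories.end(), v);
    out.codes.push_back(static_cast<int32_t>(pos - out.categories.begin()));
  }
  return out;
}
```

For a colour column `{"red", "green", "blue", "red"}` this yields categories `{"blue", "green", "red"}` and codes `{2, 1, 0, 2}`.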

