[ 
https://issues.apache.org/jira/browse/ARROW-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623219#comment-17623219
 ] 

Sven Cattell commented on ARROW-18090:
--------------------------------------

[~lidavidm]  I'm not sure how to create that with the Rust API. Is it possible 
to build that nicely in other APIs?

> Dictionary Style array for Keywords or Tags 
> --------------------------------------------
>
>                 Key: ARROW-18090
>                 URL: https://issues.apache.org/jira/browse/ARROW-18090
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Sven Cattell
>            Priority: Major
>
> I want to efficiently encode lists of tags for each element in my database. 
> In my case I have 30 tags, and a few are assigned to each of my ~20m records. 
> Here's a simplified example of 5 records:
>  * pe, keylogger, cryptojack
>  * pe, packed
>  * pe, cryptojack, c2
>  * pe, keylogger, c2
>  * pe
> Right now I have to store these in a List<Utf8>, which duplicates huge 
> amounts of data. The dictionary array looks almost perfect for this task: I 
> just want the index type of a dictionary to allow a List<T>, not only a 
> primitive T.
>  
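
To make the layout in the description concrete, here is a minimal sketch in plain Rust (std only) of a list column whose values are small dictionary indices rather than repeated strings. It does not use the arrow crate: `TagColumn`, `encode_tags`, and `decode_record` are hypothetical names invented for illustration. In Arrow terms this roughly corresponds to a List whose child array is dictionary-encoded (something like List<Dictionary<Int8, Utf8>>), but treat that mapping as an assumption rather than a description of any specific API.

```rust
use std::collections::HashMap;

/// Hypothetical illustration type, not an Arrow API.
#[derive(Debug, PartialEq)]
struct TagColumn {
    dictionary: Vec<String>, // each distinct tag is stored exactly once
    indices: Vec<u8>,        // one small index per tag occurrence (30 tags fit in a u8)
    offsets: Vec<usize>,     // record i owns indices[offsets[i]..offsets[i + 1]]
}

/// Encode per-record tag lists into dictionary + indices + offsets.
fn encode_tags(records: &[Vec<&str>]) -> TagColumn {
    let mut dictionary: Vec<String> = Vec::new();
    let mut lookup: HashMap<String, u8> = HashMap::new();
    let mut indices: Vec<u8> = Vec::new();
    let mut offsets: Vec<usize> = vec![0];

    for record in records {
        for &tag in record {
            let idx = match lookup.get(tag) {
                Some(&i) => i,
                None => {
                    // Assumption: the tag vocabulary is small enough for u8 indices.
                    assert!(dictionary.len() < 256, "too many distinct tags for u8");
                    let i = dictionary.len() as u8;
                    dictionary.push(tag.to_string());
                    lookup.insert(tag.to_string(), i);
                    i
                }
            };
            indices.push(idx);
        }
        // Close the current record's slice of `indices`.
        offsets.push(indices.len());
    }

    TagColumn { dictionary, indices, offsets }
}

/// Recover the tags of record `i` by looking its indices up in the dictionary.
fn decode_record(col: &TagColumn, i: usize) -> Vec<&str> {
    col.indices[col.offsets[i]..col.offsets[i + 1]]
        .iter()
        .map(|&k| col.dictionary[k as usize].as_str())
        .collect()
}

fn main() {
    // The five example records from the issue description.
    let records = vec![
        vec!["pe", "keylogger", "cryptojack"],
        vec!["pe", "packed"],
        vec!["pe", "cryptojack", "c2"],
        vec!["pe", "keylogger", "c2"],
        vec!["pe"],
    ];
    let col = encode_tags(&records);
    println!("{:?}", col);
    println!("{:?}", decode_record(&col, 2));
}
```

With this layout each tag occurrence costs one byte plus a share of the offsets buffer, and the tag strings themselves are stored only once regardless of the number of records.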



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
