[jira] [Commented] (ARROW-8679) [Python] supporting pandas sparse series in pyarrow

Michael Novitsky (Jira) Mon, 04 May 2020 03:26:23 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098835#comment-17098835
 ]


Michael Novitsky commented on ARROW-8679:
-----------------------------------------

[~jorisvandenbossche] Hi Joris, we are dealing with data that is sparse by 
nature (it contains many NaNs), and we currently run into memory problems when 
working with a large DataFrame. We can't use scipy sparse matrices since they 
compress zeros only, not NaNs, and we want the data to stay sparse through the 
whole flow: DataFrame -> pyarrow -> Plasma store. 
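
To illustrate the memory point, here is a minimal sketch using pandas' sparse 
dtype with NaN as the fill value. The array size and the 1% density are made-up 
numbers for illustration, not figures from our real data, and the sketch stops 
at the pandas step (it does not touch pyarrow or Plasma).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million floats, of which 1% are non-NaN.
rng = np.random.default_rng(0)
dense = np.full(1_000_000, np.nan)
filled = rng.choice(dense.size, size=10_000, replace=False)
dense[filled] = rng.random(filled.size)

dense_series = pd.Series(dense)

# A SparseDtype with fill_value=NaN stores only the non-NaN entries
# (plus their integer positions), so the NaNs cost no memory.
sparse_series = dense_series.astype(pd.SparseDtype("float64", np.nan))

dense_bytes = dense_series.memory_usage(index=False)
sparse_bytes = sparse_series.memory_usage(index=False)
density = sparse_series.sparse.density  # fraction of explicitly stored values

print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes (density {density:.2%})")

# Converting back to dense recovers the original values, NaNs included.
round_trip = sparse_series.sparse.to_dense()
```

The saving only holds while the data stays in the sparse representation; any 
step that densifies it brings the full memory cost back.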

Support for conversion to one of the sparse tensors in pyarrow could indeed be 
added. Can you please point me to the part of the code where this conversion 
happens? 

> [Python] supporting pandas sparse series in pyarrow
> ---------------------------------------------------
>
>                 Key: ARROW-8679
>                 URL: https://issues.apache.org/jira/browse/ARROW-8679
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>         Environment: ubuntu 16/18
>            Reporter: Michael Novitsky
>            Priority: Major
>             Fix For: 0.17.0
>
>
> I've seen that pandas sparse series were not supported in pyarrow because 
> they were planned to be deprecated. In pandas 1.0.1 a stable version of the 
> sparse array was released, and as far as I know it is no longer planned to be 
> deprecated. Are you planning to support sparse series in upcoming versions 
> of pyarrow?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
