[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

Leif Walsh (JIRA) Mon, 24 Jul 2017 07:53:17 -0700

    [ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098522#comment-16098522 ]


Leif Walsh commented on SPARK-21187:
------------------------------------

[~rxin] [~bryanc], pandas does support array and map columns: it represents 
each value as a Python {{list}} or {{dict}} (with {{object}} dtype):

{code}
>>> import pandas as pd
>>> pd.DataFrame({'x': [[1,2,3], [4,5]],
...               'y': [{'hello': 1}, {'world': 2, ('fizz', 'buzz'): 3}]})
           x                                  y
0  [1, 2, 3]                       {'hello': 1}
1     [4, 5]  {'world': 2, ('fizz', 'buzz'): 3}
{code}
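
To make the dtype claim concrete, here is a minimal sketch that builds the same frame and inspects it. The names {{dtypes}} and {{cell_types}} are just illustrative helpers, not anything from pandas or Spark:

```python
import pandas as pd

# Same data as the example above: one list-valued and one dict-valued column.
df = pd.DataFrame({
    'x': [[1, 2, 3], [4, 5]],
    'y': [{'hello': 1}, {'world': 2, ('fizz', 'buzz'): 3}],
})

# Both columns fall back to the generic object dtype.
dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}

# Each cell still holds the original Python container.
cell_types = {name: type(df[name].iloc[0]).__name__ for name in df.columns}

print(dtypes)
print(cell_types)
```

Because the cells are plain Python objects, pandas has no element type information for them, so any converter would presumably have to inspect the values themselves or rely on the Spark schema.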

You could also model structs as namedtuples:

{code}
>>> import collections
>>> person = collections.namedtuple('person', ['first', 'last'])
>>> pd.DataFrame({'participants': [person('Reynold', 'Xin'),
...                                person('Bryan', 'Cutler')]})
      participants
0   (Reynold, Xin)
1  (Bryan, Cutler)
{code}

This would also have {{object}} dtype.
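
As a rough sketch of what working with such a column looks like (the variable names {{column_dtype}} and {{last_names}} are mine, for illustration only), the struct fields stay reachable by attribute on each cell:

```python
import collections

import pandas as pd

person = collections.namedtuple('person', ['first', 'last'])
df = pd.DataFrame({'participants': [person('Reynold', 'Xin'),
                                    person('Bryan', 'Cutler')]})

# The column is stored with object dtype, one namedtuple per row.
column_dtype = str(df['participants'].dtype)

# A single struct field can be pulled out by mapping over the cells.
last_names = df['participants'].map(lambda p: p.last).tolist()

print(column_dtype)
print(last_names)
```

The trade-off is that field access goes through Python for every row, which is slower than a native column.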

Another option, at least for structs, is to model them as a hierarchical 
index on the columns:

{code}
>>> pd.DataFrame(data=[['Reynold', 'Xin'], ['Bryan', 'Cutler']],
...              columns=pd.MultiIndex(levels=[['participant'], ['first', 'last']],
...                                    labels=[[0, 0], [0, 1]]))
  participant        
        first    last
0     Reynold     Xin
1       Bryan  Cutler
{code}
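
For what it's worth, the same layout can be built with {{pd.MultiIndex.from_product}}, which avoids spelling out levels and labels by hand (newer pandas releases renamed the {{labels}} argument to {{codes}}). This is only a sketch of the idea; {{fields}}, {{field_names}} and {{first_names}} are illustrative names:

```python
import pandas as pd

# Two-level column index: the struct name on top, its fields underneath.
columns = pd.MultiIndex.from_product([['participant'], ['first', 'last']])
df = pd.DataFrame([['Reynold', 'Xin'], ['Bryan', 'Cutler']], columns=columns)

# Selecting the top-level label returns the struct's fields as a sub-frame.
fields = df['participant']
field_names = list(fields.columns)

# A single field is addressed with a (struct, field) tuple.
first_names = df[('participant', 'first')].tolist()

print(field_names)
print(first_names)
```

Unlike the namedtuple approach, each field here is an ordinary column, so the fields can be kept as separate pandas columns instead of one {{object}} column of tuples.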

Let me know if this is unclear and I should elaborate.

> Complete support for remaining Spark data types in Arrow Converters
> -------------------------------------------------------------------
>
>                 Key: SPARK-21187
>                 URL: https://issues.apache.org/jira/browse/SPARK-21187
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>
> This is to track adding the remaining type support in Arrow Converters.  
> Currently, only primitive data types are supported.
> Remaining types:
> * *Date*
> * *Timestamp*
> * *Complex*: Struct, Array, Map
> * *Decimal*



