[jira] [Created] (SPARK-36707) Support to specify index type and name in pandas API on Spark

Hyukjin Kwon (Jira) Thu, 09 Sep 2021 23:35:05 -0700

Hyukjin Kwon created SPARK-36707:
------------------------------------

             Summary: Support to specify index type and name in pandas API on 
Spark
                 Key: SPARK-36707
                 URL: https://issues.apache.org/jira/browse/SPARK-36707
             Project: Spark
          Issue Type: Umbrella
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: Hyukjin Kwon



See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.

In pandas API on Spark, there is currently no way to specify the index type and name 
of the output when you apply an arbitrary function, which forces the default index 
to be created:

{code}
>>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
...     pdf['A'] = pdf.id + 1
...     return pdf
...
>>> ps.range(5).koalas.apply_batch(transform)
{code}

{code}
   id   A
0   0   1
1   1   2
2   2   3
3   3   4
4   4   5
{code}

We should have a way to specify the index type and name of the output.
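
The return annotation in the example relies on Python's slice syntax inside a subscript to pair each column name with a type. The following is a minimal sketch of that mechanism, using a hypothetical `ToyFrame` class rather than the actual pyspark.pandas code. It suggests why the hint, as written, only describes data columns and leaves no slot for the index:

```python
from typing import Any, List, Tuple


class ToyFrame:
    """Hypothetical stand-in for a DataFrame class (an assumption for
    illustration; this is not the pyspark.pandas implementation)."""

    def __class_getitem__(cls, params: Any) -> List[Tuple[str, type]]:
        # ToyFrame["id": int, "A": int] arrives here as a tuple of slices:
        # (slice("id", int, None), slice("A", int, None)).
        if not isinstance(params, tuple):
            params = (params,)
        fields = []
        for p in params:
            if not isinstance(p, slice):
                raise TypeError(f"expected 'name: type', got {p!r}")
            # slice.start holds the column name, slice.stop holds its type.
            fields.append((p.start, p.stop))
        return fields


schema = ToyFrame["id": int, "A": int]
print(schema)  # [('id', <class 'int'>), ('A', <class 'int'>)]
```

Every entry parsed this way is treated as a data column, so an index name or type would need some additional convention in the hint.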



