Hyukjin Kwon created SPARK-36707:
------------------------------------
Summary: Support to specify index type and name in pandas API on
Spark
Key: SPARK-36707
URL: https://issues.apache.org/jira/browse/SPARK-36707
Project: Spark
Issue Type: Umbrella
Components: PySpark
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon
See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.
In pandas API on Spark, there is currently no way to specify the index type and
name in the output when you apply an arbitrary function with return type hints,
which forces the default index to be created:
{code}
>>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
...     pdf['A'] = pdf.id + 1
...     return pdf
...
>>> ps.range(5).koalas.apply_batch(transform)
{code}
{code}
   id  A
0   0  1
1   1  2
2   2  3
3   3  4
4   4  5
{code}
We should have a way to specify the index.
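For contrast, here is a minimal plain-pandas sketch (not pandas API on Spark) of what gets lost. The index values and the index name `key` are hypothetical, chosen only for illustration, and `reset_index(drop=True)` is used as a rough stand-in for the effect of attaching the default index; it is not how pandas API on Spark does it internally.

```python
import pandas as pd


def transform(pdf: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the function in the report, applied to a copy
    # so the caller's DataFrame is left untouched.
    pdf = pdf.copy()
    pdf["A"] = pdf["id"] + 1
    return pdf


# Hypothetical input with a named, non-default integer index.
pdf = pd.DataFrame(
    {"id": range(5)},
    index=pd.Index([10, 20, 30, 40, 50], name="key"),
)

# Plain pandas keeps the index name and values through the function.
result = transform(pdf)

# Rough approximation of the reported behavior: the original index is
# dropped and replaced with a fresh 0..n-1 index that has no name.
default_indexed = result.reset_index(drop=True)
```

In the first result the index is still named `key`, while in the second both the name and the original values are gone. That is the information a return type hint currently cannot carry.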