[
https://issues.apache.org/jira/browse/SPARK-38627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511763#comment-17511763
]
Prakhar Sandhu commented on SPARK-38627:
----------------------------------------
Hi [~hyukjin.kwon], nice ^^
# Did it work on Spark 3.3?
# What environment are you using?
I have set up a conda environment on my local system with Spark 3.2.
I specified the NumPy conversion explicitly, but it fails with the error below:
{code:java}
df = pd.DataFrame({ 'Date1': rng.to_numpy, 'Date2': rng.to_numpy})
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pyspark\pandas\frame.py", line 519, in __init__
    pdf = pd.DataFrame(data=data, index=index, columns=columns, dtype=dtype, copy=copy)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 254, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 64, in arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Users\abc\Anaconda3\envs\env2\lib\site-packages\pandas\core\internals\construction.py", line 355, in extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
{code}
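A possible reading of the traceback above, offered as a sketch rather than a confirmed diagnosis: {{rng.to_numpy}} is written without parentheses, so each dictionary value is a bound method rather than an array, and pandas treats it as a scalar. The example below uses plain pandas instead of pyspark.pandas, and {{rng}} is an assumed {{pd.date_range}}, since its original definition is not shown.

```python
import pandas as pd

# Assumption: rng is a DatetimeIndex; the original definition is not shown.
rng = pd.date_range("2022-01-01", periods=3, freq="D")

# Without parentheses, rng.to_numpy is a bound method, not an array.
# pandas sees only non-list-like values in the dict and raises ValueError.
try:
    pd.DataFrame({"Date1": rng.to_numpy, "Date2": rng.to_numpy})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Calling the method returns datetime64 arrays that pandas accepts as columns.
df = pd.DataFrame({"Date1": rng.to_numpy(), "Date2": rng.to_numpy()})
```

Whether the same change resolves the pyspark.pandas case on Spark 3.2 is not verified here.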
> TypeError: Datetime subtraction can only be applied to datetime series
> ----------------------------------------------------------------------
>
> Key: SPARK-38627
> URL: https://issues.apache.org/jira/browse/SPARK-38627
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.1
> Reporter: Prakhar Sandhu
> Priority: Major
>
> I am trying to replace pandas with the pyspark.pandas library. I tried the following,
> where pdf is a pyspark.pandas DataFrame:
> {code:java}
> pdf["date_diff"] = (pdf["date1"] - pdf["date2"])/pdf.Timedelta(days=30){code}
> I got the error below:
> {code:java}
>   File "C:\Users\abc\Anaconda3\envs\test\lib\site-packages\pyspark\pandas\data_type_ops\datetime_ops.py", line 75, in sub
>     raise TypeError("Datetime subtraction can only be applied to datetime series.")
> {code}
>