[
https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677198#comment-17677198
]
Pralabh Kumar commented on SPARK-36728:
---------------------------------------
[~gurwls223] I think this can be closed, as it was fixed as part of SPARK-36742.
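For anyone hitting this before picking up the fix, a minimal workaround sketch. It uses plain pandas, whose to_datetime column-name requirement pyspark.pandas mirrors: datetimes are only assembled from columns literally named 'year', 'month', 'day' (optionally 'hour', 'minute', 'second'), so non-standard names must be renamed first. The frame and column names below are taken from the report; the rename-based workaround is an assumption, not the fix that landed in SPARK-36742.

```python
# Workaround sketch: rename the reporter's custom column names onto the
# exact names that to_datetime requires before assembling the datetime.
import pandas as pd

df_test = pd.DataFrame({'testyear': [2015, 2016], 'testmonth': [2, 3],
                        'testday': [4, 5], 'hour': [2, 3],
                        'minute': [10, 30], 'second': [21, 25]})

# Map the custom names onto the names to_datetime understands.
renamed = df_test.rename(columns={'testyear': 'year',
                                  'testmonth': 'month',
                                  'testday': 'day'})
df_test['date'] = pd.to_datetime(renamed[['year', 'month', 'day']])
print(df_test['date'])
```

With the standard names in place, to_datetime builds one timestamp per row from the year/month/day values, exactly as in the working 'year'/'month'/'day' example quoted below.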
> Can't create datetime object from anything other than year column Pyspark -
> koalas
> ----------------------------------------------------------------------------------
>
> Key: SPARK-36728
> URL: https://issues.apache.org/jira/browse/SPARK-36728
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: pyspark_date.txt, pyspark_date2.txt
>
>
> If I create a datetime object, the source columns must have the exact names 'year', 'month', 'day', and so on.
>
> df = ps.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5],
>                    'hour': [2, 3], 'minute': [10, 30], 'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
> df_test = ps.DataFrame({'testyear': [2015, 2016], 'testmonth': [2, 3],
>                         'testday': [4, 5], 'hour': [2, 3],
>                         'minute': [10, 30], 'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476             returns_series,
>     477             series_name,
> --> 478         ) = self._select_cols(cols_sel)
>     479
>     480         if cond is None and limit is None and returns_series:
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]