[
https://issues.apache.org/jira/browse/SPARK-37491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472056#comment-17472056
]
pralabhkumar commented on SPARK-37491:
--------------------------------------
Let's take an example:
pser = pd.Series([2, 1, np.nan, 4], index=[10, 20, 30, 40], name="Koalas")
pser.asof([5, 20]) gives the output [NaN, 1.0],
while
ps.from_pandas(pser).asof([5, 20]) gives the output [NaN, 2.0].
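The pandas side of the discrepancy can be reproduced directly (the pandas-on-Spark half needs a running Spark session, so only the pandas half is shown here):

```python
import numpy as np
import pandas as pd

pser = pd.Series([2, 1, np.nan, 4], index=[10, 20, 30, 40], name="Koalas")

# pandas returns the last non-NaN value at or before each requested label:
# no label <= 5 exists, so the first result is NaN; for 20 it is 1.0
print(pser.asof([5, 20]))
```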
*Explanation*
The DataFrame below is what we get after applying the condition
index_scol <= SF.lit(index).cast(index_type) via F.when, before the max
aggregation is applied:
+-----+------+-----------------+
|col_5|col_20|__index_level_0__|
+-----+------+-----------------+
| null|   2.0|               10|
| null|   1.0|               20|
| null|  null|               30|
| null|  null|               40|
+-----+------+-----------------+
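The conditioned columns above, and the effect of the max aggregation, can be checked in plain Python (the column names are illustrative):

```python
index = [10, 20, 30, 40]
values = [2.0, 1.0, None, 4.0]

# simulate F.when(index_scol <= lit(q), value) for each asof query value q
conditioned = {
    f"col_{q}": [v if i <= q else None for i, v in zip(index, values)]
    for q in (5, 20)
}
print(conditioned["col_5"])   # [None, None, None, None]
print(conditioned["col_20"])  # [2.0, 1.0, None, None]

# taking max over col_20 picks 2.0, not the last non-null value 1.0
print(max(v for v in conditioned["col_20"] if v is not None))  # 2.0
```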
Since we are taking the max, the output comes out as 2. What we actually need
is the last non-null value of each column in increasing order of
__index_level_0__.
To implement this logic, what I am planning to do is derive the DataFrame
below from the one above, using explode, partition, and row_number:
__index_level_0__  Identifier  value  row_number
40                 col_5       null   1
30                 col_5       null   2
20                 col_5       null   3
10                 col_5       null   4
40                 col_20      2      1
30                 col_20      1      2
20                 col_20      null   3
10                 col_20      null   4
Then filter on row_number = 1. There are other details to take care of, but
this is the majority of the logic.
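The approach can be sketched in plain Python, assuming the exploded rows carry (__index_level_0__, identifier, conditioned value): order each identifier's rows by descending index (the row_number window ordering), drop nulls, and keep the first remaining row:

```python
# exploded rows: (__index_level_0__, identifier, conditioned value)
rows = [
    (10, "col_5", None), (20, "col_5", None), (30, "col_5", None), (40, "col_5", None),
    (10, "col_20", 2.0), (20, "col_20", 1.0), (30, "col_20", None), (40, "col_20", None),
]

result = {}
for ident in ("col_5", "col_20"):
    # descending __index_level_0__ mimics the row_number window ordering
    ordered = sorted((r for r in rows if r[1] == ident), key=lambda r: -r[0])
    non_null = [v for _, _, v in ordered if v is not None]
    result[ident] = non_null[0] if non_null else None

print(result)  # {'col_5': None, 'col_20': 1.0}
```

Dropping nulls before taking row 1 is one way to land on the last non-null value; here it yields 1.0 for col_20, matching the pandas result.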
Please let me know whether this is in the correct direction. (This approach is
passing all the asof test cases, including the case described in the Jira.)
[~itholic]
> Fix Series.asof when values of the series is not sorted
> -------------------------------------------------------
>
> Key: SPARK-37491
> URL: https://issues.apache.org/jira/browse/SPARK-37491
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Priority: Major
>
> https://github.com/apache/spark/pull/34737#discussion_r758223279
--
This message was sent by Atlassian Jira
(v8.20.1#820001)