timsaucer commented on issue #16821:
URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3161406074
> I have a problem I'd love to solve but I'm not exactly sure how to go
> about it. My issue is I need to do a join across a time axis, where an event in
> the past has a corresponding event between two dates in the future, and where
> field A is identical between the two events and field B is within some set of
> values in the past and within another set of values in the future. I believe if
> I partitioned on field A and ordered by date then I could do the self-join
> manually with far more efficiency than a more generic self-join.

This sounds very interesting. Can we make it a concrete example? I think I'm
missing part of what the output would look like.
Suppose I had this data frame:
```
+------------+------+-------+---------+
| event      | time | price | acct_nr |
+------------+------+-------+---------+
| purchase-1 | 1    | 90.0  | 429     |
| sale-2     | 2    | 135.0 | 184     |
| sale-3     | 3    | 150.0 | 129     |
| purchase-1 | 4    | 100.0 | 584     |
| sale-2     | 5    | 125.0 | 231     |
+------------+------+-------+---------+
```
Suppose I then did the self-join you're talking about, where `event` is the
common field A you describe and I only want cases where the price goes up
from the early time to the late time. This would yield:
```
+------------+------------+-------------+---------------+-----------+------------+--------------+
| event      | early_time | early_price | early_acct_nr | late_time | late_price | late_acct_nr |
+------------+------------+-------------+---------------+-----------+------------+--------------+
| purchase-1 | 1          | 90.0        | 429           | 4         | 100.0      | 584          |
+------------+------------+-------------+---------------+-----------+------------+--------------+
```
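To pin down what I mean by the generic self-join, here is a minimal sketch using Python's built-in `sqlite3` rather than DataFusion itself. The table name `events` and the aliases `e` and `l` are placeholders I made up for illustration. The join condition (same `event`, strictly later `time`, strictly higher `price`) is my reading of the example above, not a confirmed requirement.

```python
import sqlite3

# Sample rows copied from the data frame above: (event, time, price, acct_nr).
rows = [
    ("purchase-1", 1, 90.0, 429),
    ("sale-2", 2, 135.0, 184),
    ("sale-3", 3, 150.0, 129),
    ("purchase-1", 4, 100.0, 584),
    ("sale-2", 5, 125.0, 231),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event TEXT, time INTEGER, price REAL, acct_nr INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Generic self-join: pair each early row with every later row that shares
# the same event and has a higher price.
query = """
SELECT e.event,
       e.time    AS early_time,
       e.price   AS early_price,
       e.acct_nr AS early_acct_nr,
       l.time    AS late_time,
       l.price   AS late_price,
       l.acct_nr AS late_acct_nr
FROM events AS e
JOIN events AS l
  ON e.event = l.event
 AND e.time  < l.time
 AND e.price < l.price
ORDER BY e.event, e.time, l.time
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this returns only the `purchase-1` pair, since `sale-2` drops from 135.0 to 125.0 and `sale-3` has no later counterpart.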
I added an extra piece of data because I didn't know what all the self-join
would entail. Do you want something that ends up emitting only a subset of
the data? If you have a real-world use case that is more compelling, that
would be helpful.
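For comparison, the "partition on field A and order by date" idea from the quoted question could be sketched in plain Python as below. This is only my guess at the intended shape: the function name `partitioned_self_join` is hypothetical, and the price-goes-up condition stands in for whatever the real field B predicate is.

```python
from collections import defaultdict

# Sample rows copied from the data frame above: (event, time, price, acct_nr).
rows = [
    ("purchase-1", 1, 90.0, 429),
    ("sale-2", 2, 135.0, 184),
    ("sale-3", 3, 150.0, 129),
    ("purchase-1", 4, 100.0, 584),
    ("sale-2", 5, 125.0, 231),
]


def partitioned_self_join(rows):
    """Pair early and late rows within each event partition.

    Hypothetical helper: partitions by event, sorts each partition by time,
    then only compares rows inside the same partition.
    """
    # Partition on the shared key (field A, here `event`).
    partitions = defaultdict(list)
    for event, time, price, acct_nr in rows:
        partitions[event].append((time, price, acct_nr))

    matches = []
    for event, part in partitions.items():
        # Order by time so later rows always follow earlier ones.
        part.sort(key=lambda r: r[0])
        for i, early in enumerate(part):
            for late in part[i + 1:]:
                # Require a strictly later time and a higher price.
                if late[0] > early[0] and late[1] > early[1]:
                    matches.append((event, *early, *late))
    return matches


pairs = partitioned_self_join(rows)
for pair in pairs:
    print(pair)
```

Each partition is still scanned pairwise here, so any saving over the generic join comes from never comparing rows across different `event` values. A bounded time window on the sorted partition would be the natural next refinement if the real use case has one.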
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]