RalfJL commented on issue #12949:
URL: https://github.com/apache/pinot/issues/12949#issuecomment-2082309994
Thanks for the query.
I can confirm that it returns the same amount of records as the query
```
select a.acctuniquesessionid, a.acctstatustype, a.ts, b.acctstatustype,
b.ts, (cast(b.ts as BIGINT)-cast(a.ts as BIGINT))/1000 as Zeit from
radius_start a
join radius_stop b
on a.acctuniquesessionid = b.acctuniquesessionid
order by Zeit desc
```
But it is not the same as the second query with the "not exists" clause.
Please correct me if I am wrong.
And by the way, my "not exists" clause is wrong, see explanation below.
Never the less it is astonishing, that adding a filter produces more output
than the same query without the filter.
The overall goal is to find matching Start-Stop Records though they have the
same "acctuniquesessionid". The key is the timestamp column "ts".
Think about the following example:
acctuniquesssionid ts-Start
ts-Stop
099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:05:05.0 **2024-04-28
13:15:08.0**
099d8dd5581b5e511f895c7c736913d4 **2024-04-28 13:15:09.0** 2024-04-28
14:24:48.0
Simply joining both tables will return 4 records.
ts acctstatustype ts acctstatustype Zeit
2024-04-28 13:15:09.0 Start 2024-04-28 13:15:08.0 Stop -1
2024-04-28 13:15:09.0 Start 2024-04-28 14:24:48.0 Stop 4179
2024-04-28 13:05:05.0 Start 2024-04-28 13:15:08.0 Stop 603
2024-04-28 13:05:05.0 Start 2024-04-28 14:24:48.0 Stop 4783
And obviously the first and the last records are wrong
Next step is to filter out the record, where Stop is before Start (kick out
the 1. Record)
```
select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT)
- cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a
inner join radius_stop b
on a.acctuniquesessionid = b.acctuniquesessionid
where a.ts < b.ts and a.acctuniquesessionid =
'099d8dd5581b5e511f895c7c736913d4'
```
And the next step is to kick out the pair where Start from the first session
is combined with the Stop of the second session.
In other words there can not be any other Start record between our
Start-Stop pair
```
select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT)
- cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a
inner join radius_stop b
on a.acctuniquesessionid = b.acctuniquesessionid and a.ts < b.ts
where a.acctuniquesessionid = '099d8dd5581b5e511f895c7c736913d4'
and not exists ( select 1 from radius_start c where c.acctuniquesessionid =
a.acctuniquesessionid and cast(c.ts as BIGINT) between cast(a.ts as BIGINT) +1
and cast(b.ts as BIGINT) -1)
```
Which returns the desired pairs:
acctuniquesessionid ts acctstatustype ts acctstatustype Zeit
099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:15:09.0 Start
2024-04-28 14:24:48.0 Stop 4179
099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:05:05.0 Start
2024-04-28 13:15:08.0 Stop 603
Removing the filter for acctuniquessionid still shows the right records but
the number of records returned differ by the factor 2.5 depending on the where
clause
Query with "not exists" returns 467702 records
Query without any where clause returns 185501 records
actually it should be the other way round, right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]