Sorry again, Daniel, fourth query required (I forgot that things may vary
depending on time, this is something I've never done before):
with tmp(NODE, ROUTE_UPDATED) as
(select distinct rh.NODE, rh.ROUTE_UPDATED
from route_history rh
join routes r on rh.NODE = r.NODE), //*This join is only for performance
reasons in case ROUTE_HISTORY contains a lot more records than ROUTES and
may be removed*/
new_routes(NODE) as
(select distinct t.NODE
from tmp t
where not exists(select *
from routes r
left join route_history rh on r.NODE = rh.NODE
and r.ROUTE_INDEX =
rh.ROUTE_INDEX
and r.LINK_NODE = rh.LINK_NODE
and r.QUALITY is not distinct
from rh.QUALITY
and t.ROUTE_UPDATED =
rh.ROUTE_UPDATED
where t.NODE = r.NODE
and rh.NODE is null)
and not exists(select *
from route_history rh
left join routes r on r.NODE = rh.NODE
and r.ROUTE_INDEX =
rh.ROUTE_INDEX
and r.LINK_NODE = rh.LINK_NODE
and r.QUALITY is not distinct
from rh.QUALITY
where t.NODE = rh.NODE
and t.ROUTE_UPDATED = rh.ROUTE_UPDATED
and r.NODE is null))
select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from new_routes nr
join routes r on nr.NODE = r.NODE
Hope I finally got it right,
Set
2017-09-28 10:01 GMT+02:00 Svein Erling Tysvær <[email protected]>:
> Sorry, I forgot that should probably use 'IS NOT DISTINCT' and not '=' for
> QUALITY.
>
> Note that this checks only checks if there has ever been anything
> identical in ROUTE_HISTORY, it doesn't restrain itself to the latest entry
> only (i.e. the query I've written is slightly similar to the second query I
> wrote, it has to be extended if you want something slightly similar to the
> first query). Also, my third query doesn't consider duplicate rows (since
> the fields seem to be the primary key in ROUTE, it isn't theoretically
> possible in your example).
>
> Set
>
> 2017-09-28 9:52 GMT+02:00 Svein Erling Tysvær <[email protected]>:
>
>> OK, that's completely different, but I've done something similar once
>> before. It is not possible to directly compare sets for equality, but it is
>> possible to check that set A doesn't contain anything that isn't in set B
>> and that set B doesn't contain anything that isn't in set A. I assume NODE
>> is the one common denominator:
>>
>> with tmp (NODE) as
>> (select distinct NODE from routes),
>> new_routes(NODE) as
>> (select t.NODE
>> from tmp t
>> where not exists(select *
>> from routes r
>> left join route_history rh on r.NODE = rh.NODE
>> and r.ROUTE_INDEX =
>> rh.ROUTE_INDEX
>> and r.LINK_NODE =
>> rh.LINK_NODE
>> and r.QUALITY = rh.QUALITY
>> where t.NODE = r.NODE
>> and rh.NODE is null)
>> and not exists(select *
>> from route_history rh
>> left join routes r on r.NODE = rh.NODE
>> and r.ROUTE_INDEX =
>> rh.ROUTE_INDEX
>> and r.LINK_NODE =
>> rh.LINK_NODE
>> and r.QUALITY = rh.QUALITY
>> where t.NODE = rh.NODE
>> and r.NODE is null))
>> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
>> from new_routes nr
>> join routes r on nr.NODE = r.NODE
>>
>> Does this get you the result you want?
>> Set
>>
>> 2017-09-28 9:19 GMT+02:00 'Daniel Miller' [email protected]
>> [firebird-support] <[email protected]>:
>>
>>>
>>>
>>> Thanks - but that still doesn't work. I get a huge return set with a
>>> ton of duplicates. I think something's being lost in translation. A
>>> simple match of one row isn't sufficient - I need to match the "set".
>>>
>>> With the following six records:
>>> 5557 1 1116 0
>>> 5557 2 1115 0
>>> 5557 3 5104 2
>>> 5557 4 5073 2
>>> 5557 5 5222 2
>>> 5557 6 5110 2
>>>
>>> That is the list of routes a given node has at a specific point in
>>> time. So it's perfectly possible at another time to have fewer, or more,
>>> or different routes. And I need to consider ALL the routes in play for a
>>> given timestamp to be a single "set". So if today I have the above list of
>>> routes, and yesterday route index 4 was looking at a different node - that
>>> would constitute a different and unique set even though the other 5 records
>>> match.
>>>
>>> 5557 1 1116 0
>>> 5557 2 1115 0
>>> 5557 3 5104 2
>>> 5557 4 9999 2
>>> 5557 5 5222 2
>>> 5557 6 5110 2
>>>
>>> So it's quite possible I will have lots of duplicates for an individual
>>> row in ROUTES when compared in ROUTE_HISTORY - what I'm trying to filter is
>>> the pattern of rows. Only if ALL the rows of a current entries in ROUTES
>>> exist in ROUTE_HISTORY should it be considered a duplicate condition.
>>>
>>> --
>>> Daniel
>>>
>>> On 9/27/2017 11:53:48 PM, "Svein Erling Tysvær [email protected]
>>> [firebird-support]" <[email protected]> wrote:
>>>
>>> Sure it is possible to write such a query:
>>>
>>> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
>>> from routes r
>>> left join route_history rh1 on r.node = rh1.node
>>> and r.route_index = rh1.route_index
>>> and r.link_node = rh1.link_node
>>> and r.quality is not distinct from rh1.quality
>>> left join route_history rh2 on rh1.node = rh2.node
>>> and rh1.route_index = rh2.route_index
>>> and rh1.route_updated < rh2.route_updated
>>> where rh1.node is null
>>> and rh2.node is null
>>>
>>> I use two LEFT JOINS because I assume you want things to be returned if
>>> things are changed back, e.g. if you have
>>>
>>> 5557 1 1116 0
>>> 5557 1 1116 0
>>>
>>> then you only want one row in route_history, but if you have
>>>
>>> 5557 1 1116 0
>>> 5557 1 1116 1
>>> 5557 1 1116 0
>>>
>>> then you don't want the last record to be considered a duplicate, but
>>> know when it changed back. If ROUTE_INDEX is just a running number, you
>>> consider
>>>
>>> 5557 1 1116 0
>>> 5557 2 1116 0
>>>
>>> to be duplicates, and there are no gaps between the ROUTE_INDEX for each
>>> NODE, then the query can be simplified:
>>>
>>> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
>>> from routes r
>>> left join route_history rh1 on r.node = rh1.node
>>> and r.route_index+1 = rh1.route_index
>>> and r.link_node = rh1.link_node
>>> and r.quality is not distinct from rh1.quality
>>> where rh1.node is null
>>>
>>> Note that I use IS NOT DISTINCT so that NULLs are considered equal. If
>>> you prefer, the left joins can be replaced by (nested) NOT EXISTS.
>>>
>>> HTH,
>>> Set
>>>
>>> 2017-09-28 7:22 GMT+02:00 'Daniel Miller' [email protected]
>>> [firebird-support] <[email protected]>:
>>>
>>>>
>>>>
>>>> On 9/27/2017 9:20:54 PM, "liviuslivius [email protected]
>>>> [firebird-support]" <[email protected]> wrote:
>>>>
>>>>
>>>> group by + having
>>>> is your friend.
>>>> E.g.
>>>> Select field1, field2, count(*)
>>>> From tablex
>>>> Group by field1, field2
>>>> Having count(*)>1
>>>>
>>>> But in your scenario i do not know if this is the solution. Question is
>>>> if record was changed and in next update "restored" and once again changed
>>>> is this duplicate or no?
>>>> If yes above query is ok if no then procedure is your friend.
>>>>
>>>> And for the future change your audit trigger and check before if there
>>>> was any change
>>>>
>>>> Thank you, but I this isn't quite what I need. I need to match against
>>>> all the records of a set (up to 8 rows per set) - not just individual rows.
>>>>
>>>> --
>>>> Daniel
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>