Re: Re[4]: [firebird-support] Find duplicate multi-row entries

Svein Erling Tysvær [email protected] [firebird-support] Thu, 28 Sep 2017 01:02:24 -0700

Sorry, I forgot that should probably use 'IS NOT DISTINCT' and not '=' for
QUALITY.


Note that this checks only checks if there has ever been anything identical
in ROUTE_HISTORY, it doesn't restrain itself to the latest entry only (i.e.
the query I've written is slightly similar to the second query I wrote, it
has to be extended if you want something slightly similar to the first
query). Also, my third query doesn't consider duplicate rows (since the
fields seem to be the primary key in ROUTE, it isn't theoretically possible
in your example).

Set

2017-09-28 9:52 GMT+02:00 Svein Erling Tysvær <[email protected]>:

> OK, that's completely different, but I've done something similar once
> before. It is not possible to directly compare sets for equality, but it is
> possible to check that set A doesn't contain anything that isn't in set B
> and that set B doesn't contain anything that isn't in set A. I assume NODE
> is the one common denominator:
>
> with tmp (NODE) as
> (select distinct NODE from routes),
> new_routes(NODE) as
> (select t.NODE
>  from tmp t
>  where not exists(select *
>                   from routes r
>                   left join route_history rh on r.NODE = rh.NODE
>                                             and r.ROUTE_INDEX =
> rh.ROUTE_INDEX
>                                             and r.LINK_NODE = rh.LINK_NODE
>                                             and r.QUALITY = rh.QUALITY
>                   where t.NODE = r.NODE
>                     and rh.NODE is null)
>   and not exists(select *
>                   from route_history rh
>                   left join routes r on r.NODE = rh.NODE
>                                             and r.ROUTE_INDEX =
> rh.ROUTE_INDEX
>                                             and r.LINK_NODE = rh.LINK_NODE
>                                             and r.QUALITY = rh.QUALITY
>                   where t.NODE = rh.NODE
>                     and r.NODE is null))
> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
> from new_routes nr
> join routes r on nr.NODE = r.NODE
>
> Does this get you the result you want?
> Set
>
> 2017-09-28 9:19 GMT+02:00 'Daniel Miller' [email protected]
> [firebird-support] <[email protected]>:
>
>>
>>
>> Thanks - but that still doesn't work.  I get a huge return set with a ton
>> of duplicates.  I think something's being lost in translation.  A simple
>> match of one row isn't sufficient - I need to match the "set".
>>
>> With the following six records:
>> 5557    1    1116    0
>> 5557    2    1115    0
>> 5557    3    5104    2
>> 5557    4    5073    2
>> 5557    5    5222    2
>> 5557    6    5110    2
>>
>> That is the list of routes a given node has at a specific point in time.
>> So it's perfectly possible at another time to have fewer, or more, or
>> different routes.  And I need to consider ALL the routes in play for a
>> given timestamp to be a single "set".  So if today I have the above list of
>> routes, and yesterday route index 4 was looking at a different node - that
>> would constitute a different and unique set even though the other 5 records
>> match.
>>
>> 5557    1    1116    0
>> 5557    2    1115    0
>> 5557    3    5104    2
>> 5557    4    9999    2
>> 5557    5    5222    2
>> 5557    6    5110    2
>>
>> So it's quite possible I will have lots of duplicates for an individual
>> row in ROUTES when compared in ROUTE_HISTORY - what I'm trying to filter is
>> the pattern of rows.  Only if ALL the rows of a current entries in ROUTES
>> exist in ROUTE_HISTORY should it be considered a duplicate condition.
>>
>> --
>> Daniel
>>
>> On 9/27/2017 11:53:48 PM, "Svein Erling Tysvær [email protected]
>> [firebird-support]" <[email protected]> wrote:
>>
>> Sure it is possible to write such a query:
>>
>> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
>> from routes r
>> left join route_history rh1 on r.node = rh1.node
>>                           and r.route_index = rh1.route_index
>>                           and r.link_node = rh1.link_node
>>                           and r.quality is not distinct from rh1.quality
>> left join route_history rh2 on rh1.node = rh2.node
>>                           and rh1.route_index = rh2.route_index
>>                           and rh1.route_updated < rh2.route_updated
>> where rh1.node is null
>>   and rh2.node is null
>>
>> I use two LEFT JOINS because I assume you want things to be returned if
>> things are changed back, e.g. if you have
>>
>> 5557    1    1116    0
>> 5557    1    1116    0
>>
>> then you only want one row in route_history, but if you have
>>
>> 5557    1    1116    0
>> 5557    1    1116    1
>> 5557    1    1116    0
>>
>> then you don't want the last record to be considered a duplicate, but
>> know when it changed back. If ROUTE_INDEX is just a running number, you
>> consider
>>
>> 5557    1    1116    0
>> 5557    2    1116    0
>>
>> to be duplicates, and there are no gaps between the ROUTE_INDEX for each
>> NODE, then the query can be simplified:
>>
>> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
>> from routes r
>> left join route_history rh1 on r.node = rh1.node
>>                           and r.route_index+1 = rh1.route_index
>>                           and r.link_node = rh1.link_node
>>                           and r.quality is not distinct from rh1.quality
>> where rh1.node is null
>>
>> Note that I use IS NOT DISTINCT so that NULLs are considered equal. If
>> you prefer, the left joins can be replaced by (nested) NOT EXISTS.
>>
>> HTH,
>> Set
>>
>> 2017-09-28 7:22 GMT+02:00 'Daniel Miller' [email protected]
>> [firebird-support] <[email protected]>:
>>
>>>
>>>
>>> On 9/27/2017 9:20:54 PM, "liviuslivius [email protected]
>>> [firebird-support]" <[email protected]> wrote:
>>>
>>>
>>> group by + having
>>> is your friend.
>>> E.g.
>>> Select field1, field2, count(*)
>>> From tablex
>>> Group by field1, field2
>>> Having count(*)>1
>>>
>>> But in your scenario i do not know if this is the solution. Question is
>>> if record was changed and in next update "restored" and once again changed
>>> is this duplicate or no?
>>> If yes above query is ok if no then procedure is your friend.
>>>
>>> And for the future change your audit trigger and check before if there
>>> was any change
>>>
>>> Thank you, but I this isn't quite what I need.  I need to match against
>>> all the records of a set (up to 8 rows per set) - not just individual rows.
>>>
>>> --
>>> Daniel
>>>
>>>
>>>
>>
>>
>> 
>>
>
>

Re: Re[4]: [firebird-support] Find duplicate multi-row entries

Reply via email to