Re: Re[4]: [firebird-support] Find duplicate multi-row entries

Svein Erling Tysvær [email protected] [firebird-support] Thu, 28 Sep 2017 02:49:27 -0700

OK, that's completely different, but I've done something similar once
before. It is not possible to directly compare sets for equality, but it is
possible to check that set A doesn't contain anything that isn't in set B
and that set B doesn't contain anything that isn't in set A. I assume NODE
is the one common denominator:


with tmp (NODE) as
(select distinct NODE from routes),
new_routes(NODE) as
(select t.NODE
 from tmp t
 where not exists(select *
                  from routes r
                  left join route_history rh on r.NODE = rh.NODE
                                            and r.ROUTE_INDEX =
rh.ROUTE_INDEX
                                            and r.LINK_NODE = rh.LINK_NODE
                                            and r.QUALITY = rh.QUALITY
                  where t.NODE = r.NODE
                    and rh.NODE is null)
  and not exists(select *
                  from route_history rh
                  left join routes r on r.NODE = rh.NODE
                                            and r.ROUTE_INDEX =
rh.ROUTE_INDEX
                                            and r.LINK_NODE = rh.LINK_NODE
                                            and r.QUALITY = rh.QUALITY
                  where t.NODE = rh.NODE
                    and r.NODE is null))
select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from new_routes nr
join routes r on nr.NODE = r.NODE

Does this get you the result you want?
Set

2017-09-28 9:19 GMT+02:00 'Daniel Miller' [email protected]
[firebird-support] <[email protected]>:

>
>
> Thanks - but that still doesn't work.  I get a huge return set with a ton
> of duplicates.  I think something's being lost in translation.  A simple
> match of one row isn't sufficient - I need to match the "set".
>
> With the following six records:
> 5557    1    1116    0
> 5557    2    1115    0
> 5557    3    5104    2
> 5557    4    5073    2
> 5557    5    5222    2
> 5557    6    5110    2
>
> That is the list of routes a given node has at a specific point in time.
> So it's perfectly possible at another time to have fewer, or more, or
> different routes.  And I need to consider ALL the routes in play for a
> given timestamp to be a single "set".  So if today I have the above list of
> routes, and yesterday route index 4 was looking at a different node - that
> would constitute a different and unique set even though the other 5 records
> match.
>
> 5557    1    1116    0
> 5557    2    1115    0
> 5557    3    5104    2
> 5557    4    9999    2
> 5557    5    5222    2
> 5557    6    5110    2
>
> So it's quite possible I will have lots of duplicates for an individual
> row in ROUTES when compared in ROUTE_HISTORY - what I'm trying to filter is
> the pattern of rows.  Only if ALL the rows of a current entries in ROUTES
> exist in ROUTE_HISTORY should it be considered a duplicate condition.
>
> --
> Daniel
>
> On 9/27/2017 11:53:48 PM, "Svein Erling Tysvær [email protected]
> [firebird-support]" <[email protected]> wrote:
>
> Sure it is possible to write such a query:
>
> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
> from routes r
> left join route_history rh1 on r.node = rh1.node
>                           and r.route_index = rh1.route_index
>                           and r.link_node = rh1.link_node
>                           and r.quality is not distinct from rh1.quality
> left join route_history rh2 on rh1.node = rh2.node
>                           and rh1.route_index = rh2.route_index
>                           and rh1.route_updated < rh2.route_updated
> where rh1.node is null
>   and rh2.node is null
>
> I use two LEFT JOINS because I assume you want things to be returned if
> things are changed back, e.g. if you have
>
> 5557    1    1116    0
> 5557    1    1116    0
>
> then you only want one row in route_history, but if you have
>
> 5557    1    1116    0
> 5557    1    1116    1
> 5557    1    1116    0
>
> then you don't want the last record to be considered a duplicate, but know
> when it changed back. If ROUTE_INDEX is just a running number, you consider
>
> 5557    1    1116    0
> 5557    2    1116    0
>
> to be duplicates, and there are no gaps between the ROUTE_INDEX for each
> NODE, then the query can be simplified:
>
> select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
> from routes r
> left join route_history rh1 on r.node = rh1.node
>                           and r.route_index+1 = rh1.route_index
>                           and r.link_node = rh1.link_node
>                           and r.quality is not distinct from rh1.quality
> where rh1.node is null
>
> Note that I use IS NOT DISTINCT so that NULLs are considered equal. If you
> prefer, the left joins can be replaced by (nested) NOT EXISTS.
>
> HTH,
> Set
>
> 2017-09-28 7:22 GMT+02:00 'Daniel Miller' [email protected]
> [firebird-support] <[email protected]>:
>
>>
>>
>> On 9/27/2017 9:20:54 PM, "liviuslivius [email protected]
>> [firebird-support]" <[email protected]> wrote:
>>
>>
>> group by + having
>> is your friend.
>> E.g.
>> Select field1, field2, count(*)
>> From tablex
>> Group by field1, field2
>> Having count(*)>1
>>
>> But in your scenario i do not know if this is the solution. Question is
>> if record was changed and in next update "restored" and once again changed
>> is this duplicate or no?
>> If yes above query is ok if no then procedure is your friend.
>>
>> And for the future change your audit trigger and check before if there
>> was any change
>>
>> Thank you, but I this isn't quite what I need.  I need to match against
>> all the records of a set (up to 8 rows per set) - not just individual rows.
>>
>> --
>> Daniel
>>
>>
>>
>
>
> 
>

Re: Re[4]: [firebird-support] Find duplicate multi-row entries

Reply via email to