Re[6]: [firebird-support] Find duplicate multi-row entries

'Daniel Miller' [email protected] [firebird-support] Thu, 28 Sep 2017 02:00:58 -0700

This appears no different than a simple:

    select * from routes

So...not there yet. I think what I'm fighting is the lack of pivot orcrosstab functionality - and I may have no choice (though it offends medeeply) but to setup my archive table for this comparison. Either viathe hated structure of column names ROUTE1, ROUTE2, ROUTE8...or the onlyslightly less offensive VARCHAR concatenation.

A concatenated LIST seems like it would be a elegant solution - andwhile operating over the whole table is time-consuming a single nodeisn't too bad. And if I can get the history filtered and old duplicaterecords deleted it would be fine. But the inconsistency is killing me.

There MUST be a Firebird-friendly, normalized, compact storage structurethat will work. I'm just not seeing it.

--
Daniel

On 9/28/2017 1:19:18 AM, "Svein Erling Tysvær [email protected][firebird-support]" <[email protected]> wrote:

Sorry again, Daniel, fourth query required (I forgot that things mayvary depending on time, this is something I've never done before):
with tmp(NODE, ROUTE_UPDATED) as
(select distinct rh.NODE, rh.ROUTE_UPDATED
 from route_history rh
join routes r on rh.NODE = r.NODE), //*This join is only forperformance reasons in case ROUTE_HISTORY contains a lot more recordsthan ROUTES and may be removed*/
new_routes(NODE) as
(select distinct t.NODE
 from tmp t
 where not exists(select *
                  from routes r
                  left join route_history rh on r.NODE = rh.NODE
and r.ROUTE_INDEX =rh.ROUTE_INDEXand r.LINK_NODE =rh.LINK_NODEand r.QUALITY is notdistinct from rh.QUALITYand t.ROUTE_UPDATED =rh.ROUTE_UPDATED
                  where t.NODE = r.NODE
                    and rh.NODE is null)
  and not exists(select *
                  from route_history rh
                  left join routes r on r.NODE = rh.NODE
and r.ROUTE_INDEX =rh.ROUTE_INDEXand r.LINK_NODE =rh.LINK_NODEand r.QUALITY is notdistinct from rh.QUALITY
                  where t.NODE = rh.NODE
                    and t.ROUTE_UPDATED = rh.ROUTE_UPDATED
                    and r.NODE is null))
select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from new_routes nr
join routes r on nr.NODE = r.NODE

Hope I finally got it right,
Set

2017-09-28 10:01 GMT+02:00 Svein Erling Tysvær <[email protected]>:
Sorry, I forgot that should probably use 'IS NOT DISTINCT' and not '='for QUALITY.
Note that this checks only checks if there has ever been anythingidentical in ROUTE_HISTORY, it doesn't restrain itself to the latestentry only (i.e. the query I've written is slightly similar to thesecond query I wrote, it has to be extended if you want somethingslightly similar to the first query). Also, my third query doesn'tconsider duplicate rows (since the fields seem to be the primary keyin ROUTE, it isn't theoretically possible in your example).
Set

2017-09-28 9:52 GMT+02:00 Svein Erling Tysvær <[email protected]>:
OK, that's completely different, but I've done something similar oncebefore. It is not possible to directly compare sets for equality, butit is possible to check that set A doesn't contain anything thatisn't in set B and that set B doesn't contain anything that isn't inset A. I assume NODE is the one common denominator:
with tmp (NODE) as
(select distinct NODE from routes),
new_routes(NODE) as
(select t.NODE
 from tmp t
 where not exists(select *
                  from routes r
                  left join route_history rh on r.NODE = rh.NODE
and r.ROUTE_INDEX =rh.ROUTE_INDEXand r.LINK_NODE =rh.LINK_NODEand r.QUALITY =rh.QUALITY
                  where t.NODE = r.NODE
                    and rh.NODE is null)
  and not exists(select *
                  from route_history rh
                  left join routes r on r.NODE = rh.NODE
and r.ROUTE_INDEX =rh.ROUTE_INDEXand r.LINK_NODE =rh.LINK_NODEand r.QUALITY =rh.QUALITY
                  where t.NODE = rh.NODE
                    and r.NODE is null))
select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from new_routes nr
join routes r on nr.NODE = r.NODE

Does this get you the result you want?
Set
2017-09-28 9:19 GMT+02:00 'Daniel Miller' [email protected][firebird-support] <[email protected]<mailto:[email protected]>>:
Thanks - but that still doesn't work. I get a huge return set witha ton of duplicates. I think something's being lost in translation.A simple match of one row isn't sufficient - I need to match the"set".
With the following six records:
5557    1    1116    0
5557    2    1115    0
5557    3    5104    2
5557    4    5073    2
5557    5    5222    2
5557    6    5110    2
That is the list of routes a given node has at a specific point intime. So it's perfectly possible at another time to have fewer, ormore, or different routes. And I need to consider ALL the routes inplay for a given timestamp to be a single "set". So if today I havethe above list of routes, and yesterday route index 4 was looking ata different node - that would constitute a different and unique seteven though the other 5 records match.
5557    1    1116    0
5557    2    1115    0
5557    3    5104    2
5557    4    9999    2
5557    5    5222    2
5557    6    5110    2
So it's quite possible I will have lots of duplicates for anindividual row in ROUTES when compared in ROUTE_HISTORY - what I'mtrying to filter is the pattern of rows. Only if ALL the rows of acurrent entries in ROUTES exist in ROUTE_HISTORY should it beconsidered a duplicate condition.
--
Daniel
On 9/27/2017 11:53:48 PM, "Svein Erling Tysvær [email protected][firebird-support]" <[email protected]<mailto:[email protected]>> wrote:
Sure it is possible to write such a query:

select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from routes r
left join route_history rh1 on r.node = rh1.node
                          and r.route_index = rh1.route_index
                          and r.link_node = rh1.link_node
and r.quality is not distinct fromrh1.quality
left join route_history rh2 on rh1.node = rh2.node
                          and rh1.route_index = rh2.route_index
                          and rh1.route_updated < rh2.route_updated
where rh1.node is null
  and rh2.node is null
I use two LEFT JOINS because I assume you want things to bereturned if things are changed back, e.g. if you have
5557    1    1116    0
5557    1    1116    0

then you only want one row in route_history, but if you have

5557    1    1116    0
5557    1    1116    1
5557    1    1116    0
then you don't want the last record to be considered a duplicate,but know when it changed back. If ROUTE_INDEX is just a runningnumber, you consider
5557    1    1116    0
5557    2    1116    0
to be duplicates, and there are no gaps between the ROUTE_INDEX foreach NODE, then the query can be simplified:
select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from routes r
left join route_history rh1 on r.node = rh1.node
                          and r.route_index+1 = rh1.route_index
                          and r.link_node = rh1.link_node
and r.quality is not distinct fromrh1.quality
where rh1.node is null
Note that I use IS NOT DISTINCT so that NULLs are considered equal.If you prefer, the left joins can be replaced by (nested) NOTEXISTS.
HTH,
Set
2017-09-28 7:22 GMT+02:00 'Daniel Miller' [email protected][firebird-support] <[email protected]<mailto:[email protected]>>:
On 9/27/2017 9:20:54 PM, "liviuslivius [email protected][firebird-support]" <[email protected]<mailto:[email protected]>> wrote:
group by + having
is your friend.
E.g.
Select field1, field2, count(*)
From tablex
Group by field1, field2
Having count(*)>1
But in your scenario i do not know if this is the solution.Question is if record was changed and in next update "restored"and once again changed is this duplicate or no?
If yes above query is ok if no then procedure is your friend.
And for the future change your audit trigger and check before ifthere was any change
Thank you, but I this isn't quite what I need. I need to matchagainst all the records of a set (up to 8 rows per set) - not justindividual rows.
--
Daniel

Re[6]: [firebird-support] Find duplicate multi-row entries

Reply via email to