Thanks to everyone who posted suggestions for our duplicate-records problem. 
In the end, we simply deleted the few duplicate resource records using the 
web user interface: there weren't as many duplicates as I first thought.

That left the duplicate resource-to-resource link records, of which we had 
over 15,000. Once I found that these are contained in the 
resource_x_resource database table, I checked them using the following SQL:

SELECT resourceinstanceidfrom, resourceinstanceidto, COUNT(*)
FROM resource_x_resource
GROUP BY resourceinstanceidfrom, resourceinstanceidto
ORDER BY resourceinstanceidfrom, resourceinstanceidto;

and deleted the duplicates, keeping the row with the lowest resourcexid for 
each from/to pair, using:

DELETE FROM resource_x_resource
WHERE resourcexid IN (
    SELECT A.resourcexid
      FROM resource_x_resource A
     WHERE EXISTS (SELECT B.resourcexid
                     FROM resource_x_resource B
                    WHERE B.resourceinstanceidfrom = A.resourceinstanceidfrom
                      AND B.resourceinstanceidto = A.resourceinstanceidto
                      AND B.resourcexid < A.resourcexid));
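
For anyone who wants to try the check and the delete before touching a live database, here is a minimal sketch that runs them against a throwaway table. It rests on several assumptions that are not part of the steps above: an in-memory SQLite database stands in for the real PostgreSQL one, integer ids stand in for the uuids, the sample rows are invented, and the check query gains a HAVING COUNT(*) > 1 clause so that it lists only the pairs that occur more than once.

```python
import sqlite3

# In-memory SQLite stand-in for the real PostgreSQL table (assumption).
# Integer ids replace the real uuids; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE resource_x_resource (
           resourcexid INTEGER PRIMARY KEY,
           resourceinstanceidfrom TEXT,
           resourceinstanceidto TEXT)"""
)
conn.executemany(
    "INSERT INTO resource_x_resource VALUES (?, ?, ?)",
    [(1, "a", "b"), (2, "a", "b"), (3, "a", "b"),
     (4, "a", "c"), (5, "b", "c"), (6, "b", "c")],
)

# The check query, with HAVING added so only duplicated pairs are listed.
DUPLICATE_PAIRS = """
    SELECT resourceinstanceidfrom, resourceinstanceidto, COUNT(*)
      FROM resource_x_resource
     GROUP BY resourceinstanceidfrom, resourceinstanceidto
    HAVING COUNT(*) > 1
     ORDER BY resourceinstanceidfrom, resourceinstanceidto
"""

# The delete: removes every row for which an earlier row (lower
# resourcexid) with the same from/to pair exists.
DELETE_DUPLICATES = """
    DELETE FROM resource_x_resource
     WHERE resourcexid IN (
           SELECT A.resourcexid
             FROM resource_x_resource A
            WHERE EXISTS (SELECT B.resourcexid
                            FROM resource_x_resource B
                           WHERE B.resourceinstanceidfrom = A.resourceinstanceidfrom
                             AND B.resourceinstanceidto = A.resourceinstanceidto
                             AND B.resourcexid < A.resourcexid))
"""

before = conn.execute(DUPLICATE_PAIRS).fetchall()
deleted = conn.execute(DELETE_DUPLICATES).rowcount
after = conn.execute(DUPLICATE_PAIRS).fetchall()
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT resourcexid FROM resource_x_resource ORDER BY resourcexid"
    )
]
print("duplicated pairs before:", before)
print("rows deleted:", deleted)
print("duplicated pairs after:", after)
print("surviving ids:", remaining)
```

If the second listing of duplicated pairs comes back empty and one row per pair survives, the delete has done what was intended. On the real database it would be prudent to run the delete inside a transaction and take a backup first.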

Having noticed that the relationshiptype field is simply text, I replaced 
all the incorrect 00000000-0000-0000-0000-000000000007 values with an 
appropriate CIDOC CRM property:

UPDATE resource_x_resource
SET relationshiptype = 'http://www.cidoc-crm.org/cidoc-crm/P67_refers_to'
WHERE relationshiptype = '00000000-0000-0000-0000-000000000007';

Finally, a run of
python manage.py es index_resource_relations
made those changes visible in the resource reports in the user interface.

David

On Tuesday, 11 February 2020 23:12:54 UTC, David Osborne wrote:
>
> It seems I've inadvertently duplicated some resource records by 
> accidentally assigning them new uuids before loading from CSVs, so instead 
> of overwriting existing records, duplicates were created.
>
> I was wondering if anyone had developed any 'grep'-like command-line 
> utilities that perhaps returned uuids for resources where, say, a node 
> contains a particular value? If there were also a command to delete 
> resources given their uuid, that would help to remove the duplicate 
> records. If nothing like this exists, is there existing Python code in the 
> codebase which would make a good starting point? We're still on Arches 
> 4.4.2.
>
> A related question is whether it's possible to delete resource-to-resource 
> relationship records? They can obviously be loaded from CSV but there's 
> nothing documented which can remove them. What happens to these links when 
> one or more of the associated resource records are deleted?
>
> Finally, I've always been puzzled that in our resource reports, related 
> resources are followed by the uuid for the relationship 
> (00000000-0000-0000-0000-000000000007), rather than the CIDOC-CRM property 
> phrase ("is-related-to" or "refers-to", I forget which), although I see that 
> Lincoln's Arcade has the almost equally obscure URL for CRM property P67.
>
> Cheers
> David
>
