Thanks to everyone who posted suggestions to help with our problem of
duplications. In the end, we simply deleted the few duplicate resource
records using the web user interface: there weren't as many duplicates as I
first thought.

That left the duplicate resource-to-resource link records, of which we had
over 15,000. Once I found that these are contained in the
resource_x_resource database table, I checked them using the following SQL:

    SELECT resourceinstanceidfrom, resourceinstanceidto, COUNT(*)
    FROM resource_x_resource
    GROUP BY resourceinstanceidfrom, resourceinstanceidto
    ORDER BY resourceinstanceidfrom, resourceinstanceidto;

and deleted the duplicates using:

    DELETE FROM resource_x_resource
    WHERE resourcexid IN (
        SELECT A.resourcexid
        FROM resource_x_resource A
        WHERE EXISTS (SELECT B.resourcexid
                      FROM resource_x_resource B
                      WHERE B.resourceinstanceidfrom = A.resourceinstanceidfrom
                        AND B.resourceinstanceidto = A.resourceinstanceidto
                        AND B.resourcexid < A.resourcexid));
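
For anyone who wants to try the logic on a scratch copy before touching a live database, here is a minimal sketch in Python using the standard-library sqlite3 module. It is only an illustration: the toy table has just the three columns used above, integer ids stand in for the uuids, SQLite stands in for PostgreSQL, and the function names are my own. The check adds HAVING COUNT(*) > 1 so that only duplicated pairs are listed, and the delete keeps the row with the smallest resourcexid in each group, as in the statement above.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory stand-in for resource_x_resource (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_x_resource ("
        "resourcexid INTEGER PRIMARY KEY, "
        "resourceinstanceidfrom TEXT, "
        "resourceinstanceidto TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource_x_resource VALUES (?, ?, ?)",
        [
            (1, "a", "b"), (2, "a", "b"), (3, "a", "b"),  # tripled link
            (4, "a", "c"),                                # unique link
            (5, "b", "c"), (6, "b", "c"),                 # doubled link
        ],
    )
    return conn


def find_duplicate_links(conn):
    """Return (from, to, count) for every pair that occurs more than once."""
    return conn.execute(
        """
        SELECT resourceinstanceidfrom, resourceinstanceidto, COUNT(*)
        FROM resource_x_resource
        GROUP BY resourceinstanceidfrom, resourceinstanceidto
        HAVING COUNT(*) > 1
        ORDER BY resourceinstanceidfrom, resourceinstanceidto
        """
    ).fetchall()


def delete_duplicate_links(conn):
    """Delete each link that has a twin with a smaller id; return rows removed."""
    cur = conn.execute(
        """
        DELETE FROM resource_x_resource
        WHERE resourcexid IN (
            SELECT A.resourcexid
            FROM resource_x_resource A
            WHERE EXISTS (
                SELECT 1
                FROM resource_x_resource B
                WHERE B.resourceinstanceidfrom = A.resourceinstanceidfrom
                  AND B.resourceinstanceidto = A.resourceinstanceidto
                  AND B.resourcexid < A.resourcexid))
        """
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_toy_db()
    print(find_duplicate_links(conn))    # duplicated pairs before the clean-up
    print(delete_duplicate_links(conn))  # number of rows removed
    print(find_duplicate_links(conn))    # duplicated pairs afterwards
```

Running the check again after the delete is a cheap way to confirm that no duplicated pairs remain.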

Having noticed that the relationshiptype field is simply text, I replaced
all the incorrect 00000000-0000-0000-0000-000000000007 values with an
appropriate CIDOC CRM property:

    UPDATE resource_x_resource
    SET relationshiptype = 'http://www.cidoc-crm.org/cidoc-crm/P67_refers_to'
    WHERE relationshiptype = '00000000-0000-0000-0000-000000000007';
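
A cautious way to rehearse this kind of bulk replacement is to count the affected rows first and run the update inside a transaction. The sketch below does that in Python with the standard-library sqlite3 module. As before it is only an illustration under assumptions: a two-column toy table stands in for the real one, SQLite stands in for PostgreSQL, and replace_relationship_type is a hypothetical helper name, not part of Arches.

```python
import sqlite3

OLD = "00000000-0000-0000-0000-000000000007"
NEW = "http://www.cidoc-crm.org/cidoc-crm/P67_refers_to"


def make_toy_db():
    """Build an in-memory stand-in table with two old values and one new one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource_x_resource ("
        "resourcexid INTEGER PRIMARY KEY, relationshiptype TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource_x_resource VALUES (?, ?)",
        [(1, OLD), (2, OLD), (3, NEW)],
    )
    conn.commit()
    return conn


def count_relationship_type(conn, value):
    """Count the link rows whose relationshiptype equals the given text."""
    return conn.execute(
        "SELECT COUNT(*) FROM resource_x_resource WHERE relationshiptype = ?",
        (value,),
    ).fetchone()[0]


def replace_relationship_type(conn, old, new):
    """Swap one relationshiptype value for another; return rows changed."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE resource_x_resource SET relationshiptype = ? "
            "WHERE relationshiptype = ?",
            (new, old),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_toy_db()
    print(count_relationship_type(conn, OLD))         # rows due to change
    print(replace_relationship_type(conn, OLD, NEW))  # rows actually changed
    print(count_relationship_type(conn, OLD))         # rows still carrying the old value
```

If the count before the update and the number of rows changed disagree, that is a sign something else is writing to the table at the same time.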

Finally, a run of

    python manage.py es index_resource_relations

made those changes visible in the resource reports in the user interface.

David
On Tuesday, 11 February 2020 23:12:54 UTC, David Osborne wrote:
>
> It seems I've inadvertently duplicated some resource records by
> accidentally assigning them new uuids before loading from CSVs, so instead
> of overwriting existing records, duplicates were created.
>
> I was wondering if anyone had developed any 'grep'-like command-line
> utilities that perhaps returned uuids for resources where, say, a node
> contains a particular value? If there were also a command to delete
> resources given their uuid, that would help to remove the duplicate
> records. If nothing like this exists, is there existing Python code in the
> codebase which would make a good starting point? We're still on Arches
> 4.4.2.
>
> A related question is whether it's possible to delete resource-to-resource
> relationship records? They can obviously be loaded from CSV but there's
> nothing documented which can remove them. What happens to these links when
> one or more of the associated resource records are deleted?
>
> Finally, I've always been puzzled that in our resource reports, related
> resources are followed by the uuid for the relationship
> (00000000-0000-0000-0000-000000000007), rather than the CIDOC-CRM property
> phrase ("is-related-to" or "refers-to", I forget which), although I see that
> Lincoln's Arcade has the almost equally obscure URL for CRM property P67.
>
> Cheers
> David
>