Just delete one of each 3 duplicates manually.

This happens to me regularly on a 5.x instance (I'm using LDAP auth)
and I haven't been able to figure out why. I started checking for this
condition each night, because once every couple of days a large number
(low thousands) of such duplicate entries appears overnight. The number
then seems to grow exponentially until it completely overwhelms DSpace,
which then stops responding.

As a workaround, I'm running this script nightly from cron:

cleanup_eperson.sh:

#!/bin/sh

total=`psql -A -t -c "SELECT count(1) FROM epersongroup2eperson;"`
distinct=`psql -A -t -c "SELECT count(DISTINCT (eperson_group_id, eperson_id)) FROM epersongroup2eperson;"`

# for testing only:
#total=$(($total+1))

echo $total
echo $distinct

if [ "$total" -gt "$distinct" ] ; then

        echo "Cleaning up..." | mail -s "digilib eperson table cleanup ($total/$distinct)" [email protected]

        report=`psql 2>&1 <<THE_END
                      SELECT MIN(ctid) as ctid, eperson_group_id, eperson_id
                        FROM epersongroup2eperson
                        GROUP BY eperson_group_id, eperson_id
                        HAVING COUNT(*) > 1;
THE_END`

        # perform the cleanup
        # capture the output and running time
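        # keeps the row with the lowest ctid in each duplicate group;
        # the braces capture the timing output even where "time" is a
        # shell keyword (e.g. bash) that reports on the shell's stderr
        cleanuptime=`{ time psql 2>&1 <<THE_END
                DELETE FROM epersongroup2eperson a USING (
                      SELECT MIN(ctid) as ctid, eperson_group_id, eperson_id
                        FROM epersongroup2eperson
                        GROUP BY eperson_group_id, eperson_id
                        HAVING COUNT(*) > 1
                      ) b
                WHERE a.eperson_group_id = b.eperson_group_id
                AND a.eperson_id = b.eperson_id
                AND a.ctid <> b.ctid;
THE_END
} 2>&1`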

        # check again and report the status by email
        out=`psql <<THE_END
                SELECT count(1) FROM epersongroup2eperson;
                SELECT count(DISTINCT (eperson_group_id, eperson_id)) FROM epersongroup2eperson;
THE_END`
        # printf instead of echo: not every sh expands \n inside echo
        printf '%s\n\n\n\n%s\n\n%s\n' "$report" "$cleanuptime" "$out" | mail -s "digilib eperson table cleanup complete" [email protected]

        echo "email sent: $out"

else
        echo "" | mail -s "digilib eperson table clean" [email protected]
fi
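
One caveat with the comparison in the script: if psql fails (database down, wrong credentials), $total and $distinct come back empty, the [ ... -gt ... ] test errors out, and the else branch mails "table clean". A minimal sketch of a stricter check follows. The function name needs_cleanup and its return codes are assumptions made for illustration; they are not part of the original script.

```sh
#!/bin/sh
# Hypothetical helper (not in the original script).
# Usage: needs_cleanup TOTAL DISTINCT
# Returns 0 if TOTAL > DISTINCT (duplicates present),
#         1 if the table is clean,
#         2 if either argument is empty or not a plain non-negative integer.
needs_cleanup() {
    for n in "$1" "$2"; do
        case $n in
            ''|*[!0-9]*) return 2 ;;   # empty or contains a non-digit
        esac
    done
    [ "$1" -gt "$2" ]
}
```

In the script it could replace the plain test, for example by calling needs_cleanup "$total" "$distinct" and branching on $? with a case statement, so that a return code of 2 sends a "could not read counts" mail instead of a "table clean" one.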




Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
