Hi Koushik,

I usually like to separate queries of inspecting data from queries of deleting data, for the sake of playing safe.

Try this.

select distinct *
from metadatavalue m1, metadatavalue m2
where m1.item_id=m2.item_id
and m1.metadata_field_id=m2.metadata_field_id
and m1.text_value=m2.text_value
and m1.metadata_value_id!=m2.metadata_value_id
order by m1.metadata_field_id


I would run this query in a program.
for each result row,
compare the m1.place and m2.place values,
then write down, to file, the metadata_value_id that carries a bigger place value

There shall be duplication of IDs in the list.

In a separate operation,
read IDs from the file,
put IDs in a Set (or similar hash structure) to filter away duplicates.
Finally, for each unique to-delete ID,
delete from metadatavalue where metadata_value_id=[xxx]

I coined out the above query in a coffee time. It is not tested in the field. Someone can improve it to prevent duplicate outputs.

You may like to fully check the list before executing delete, to assure you are not killing something that deserve to stay.

Best,
Allen Lam.
HKU Scholars Hub Administrator, http://hub.hku.hk



On 2010-08-04 1:14 AM, Koushik Banerjee wrote:
Hello Allen,
Thanks a lot for the guide. Here is what further investigation has unearthed -> 1> This duplicate metadata issue doesn't occur with new submissions. For this installation, the DB was copied from an earlier version. It is only with these old items that the duplicate metadata entries show up. 2> The problem is in the DB itself. In the metadatavalue table, for each item_id there are duplicate entries for each metadata. It looks something like this -

>> select * from metadatavalue where item_id=1000;

metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence 988 | 1000 | 15 | 2001-12 | | 1 | | -1 989 | 1000 | 26 | Rescission\r | en | 1 | | -1
....
....
5204 | 1000 | 15 | 2001-12 | | 1 | | -1 7654 | 1000 | 26 | Rescission\r | en | 1 | | -1
...
...

Do you have a suggestion exactly how this can be solved? I am not very comfortable with DBs and do not want to try something arbitrary and completely mess it up. Or may be if you can refer a particular resource on the net, going through which will help me write my own sql queries to get rid of this problem.

Thanks again,
Koushik

On Thu, Jul 29, 2010 at 11:40 PM, Allen Lam <allen.dsp...@gmail.com <mailto:allen.dsp...@gmail.com>> wrote:

    Hi Koushik,

    The immediate task is to find out why there are duplicate entries
    in the db, and stop creating duplication again.

    I don't know your installation or customization history, so this
    is a wild guess again. There could be some errors made in your
    item submission form. Check the item submission xml under config.
    If it is not the cause of the problem..., I don't know. Please
    look back what changes you've made to the system recently.

    OK. The next step is to remove the duplicate entries in the
    database. Either you delete it manually in the edit item page
    one-by-one, or you do it in the database level, do some SQL
    queries and write some programs to automate the de-dup and delete
    process. Luckily you only have to deal with one db table named
    'metadatavalue'.

    Hope this rough guide could kick start your rescue process.


    Best,
    Allen Lam.
    HKU Scholars Hub Administrator, http://hub.hku.hk



    On 2010-07-30 3:46 AM, Koushik Banerjee wrote:
    Thanks Allen.
    I checked the "edit item" page. That also has duplicate entries
    i.e. two dc.contributor.author, two dc.date.issued fields etc
    which can be edited. Does it mean that the database itself is
    wrong? In that case, how do I get rid of this problem?

    Thanks

    On Thu, Jul 29, 2010 at 1:30 PM, Allen Lam
    <allen.dsp...@gmail.com <mailto:allen.dsp...@gmail.com>> wrote:

        Hi Koushik,

        First try to make sure there is no duplication in the database.
        Go into the Edit item page to check.

        If the problem is not with the db, it may be caused by
        incorrect indexing.
        There could be some errors in your config file. Without extra
        information it is only a wild guess.
        But remember that any changes to indexing config need a
        re-indexing afterward to see the effect.


        Best,
        Allen Lam.
        HKU Scholars Hub Administrator, http://hub.hku.hk



        On 2010-07-30 1:19 AM, Koushik Banerjee wrote:
        The same thing happens in both JSPUI and XMLUI. I guess that
        means the problem is not at the UI layer but something
        deeper. The Browse or Handle servlets?

        This dual-display happens while viewing a particular item
        i.e. url like [host]/[dspace or xmlui]/handle/number1/number2
        also the author names are displayed twice when i browse
        using titles with an url [host]/[dspace or
        xmlui]/browse?type=title

        Any help is much appreciated.

        On Tue, Jul 27, 2010 at 5:47 PM, Koushik Banerjee
        <koushik.baner...@gmail.com
        <mailto:koushik.baner...@gmail.com>> wrote:

            Hello, I am trying out a DSpace 1.6 installation
            on postgresql. When I view an item, all metadata is
            displayed twice. It looks something like this
            Title: This is the sample title
                    This is the sample title
            Authors: Author 1
                         Author 1
                         Author 2
                         Author 2
            Issue Date: 21-May-2008
                              21-May-2008

            etc etc. Anyone any idea. At what level can this problem be?

            Thanks.



        
------------------------------------------------------------------------------
        The Palm PDK Hot Apps Program offers developers who use the
        Plug-In Development Kit to bring their C/C++ apps to Palm for a share
        of $1 Million in cash or HP Products. Visit us here for more details:
        http://p.sf.net/sfu/dev2dev-palm

        _______________________________________________
        DSpace-tech mailing list
        DSpace-tech@lists.sourceforge.net  
<mailto:DSpace-tech@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/dspace-tech

        
------------------------------------------------------------------------------
        The Palm PDK Hot Apps Program offers developers who use the
        Plug-In Development Kit to bring their C/C++ apps to Palm for
        a share
        of $1 Million in cash or HP Products. Visit us here for more
        details:
        http://p.sf.net/sfu/dev2dev-palm
        _______________________________________________
        DSpace-tech mailing list
        DSpace-tech@lists.sourceforge.net
        <mailto:DSpace-tech@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/dspace-tech



    
------------------------------------------------------------------------------
    The Palm PDK Hot Apps Program offers developers who use the
    Plug-In Development Kit to bring their C/C++ apps to Palm for a share
    of $1 Million in cash or HP Products. Visit us here for more details:
    http://p.sf.net/sfu/dev2dev-palm
    _______________________________________________
    DSpace-tech mailing list
    DSpace-tech@lists.sourceforge.net
    <mailto:DSpace-tech@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/dspace-tech


------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to