Hi Allen, Thanks for all the help. I implemented your logic and fixed the issue.
I wrote a Java program that first executes this query:

myquery="select distinct m1.metadata_value_id,m2.metadata_value_id,m1.item_id,m2.item_id from metadatavalue m1, metadatavalue m2 where m1.item_id=m2.item_id and m1.metadata_field_id=m2.metadata_field_id and m1.text_value=m2.text_value and m1.place=m2.place and m1.metadata_value_id!=m2.metadata_value_id and m1.item_id="+itemindex+";";

and stores the results in a text file. [Getting only the metadata_value_ids would do, but the other values help in manual verification.]

Next I read each line and selected the higher of the two metadata_value_ids. I put these values in a TreeSet to get rid of duplicates and to sort them. Sorting is not required, but it helped me examine the results manually.

Finally I read these unique values from the TreeSet and deleted the corresponding entries in the metadatavalue table. [The results from the previous steps should be checked and re-checked to make sure the right entries are deleted.]

Thanks again,
Koushik

On Wed, Aug 4, 2010 at 3:47 AM, Allen Lam <[email protected]> wrote:
> Hi Koushik,
>
> I usually like to separate queries of inspecting data from queries of
> deleting data, for the sake of playing safe.
>
> Try this.
>
> select distinct *
> from metadatavalue m1, metadatavalue m2
> where m1.item_id=m2.item_id
> and m1.metadata_field_id=m2.metadata_field_id
> and m1.text_value=m2.text_value
> and m1.metadata_value_id!=m2.metadata_value_id
> order by m1.metadata_field_id
>
> I would run this query in a program.
> for each result row,
> compare the m1.place and m2.place values,
> then write down, to file, the metadata_value_id that carries a bigger place
> value
>
> There shall be duplication of IDs in the list.
>
> In a separate operation,
> read IDs from the file,
> put IDs in a Set (or similar hash structure) to filter away duplicates.
> Finally, for each unique to-delete ID,
> delete from metadatavalue where metadata_value_id=[xxx]
>
> I coined out the above query in a coffee time.
> It is not tested in the field. Someone can improve it to prevent duplicate outputs.
>
> You may like to fully check the list before executing delete, to assure you
> are not killing something that deserve to stay.
>
> Best,
> Allen Lam.
> HKU Scholars Hub Administrator, http://hub.hku.hk
>
> On 2010-08-04 1:14 AM, Koushik Banerjee wrote:
>
> Hello Allen,
> Thanks a lot for the guide. Here is what further investigation has
> unearthed ->
> 1> This duplicate metadata issue doesn't occur with new submissions. For
> this installation, the DB was copied from an earlier version. It is only
> with these old items that the duplicate metadata entries show up.
> 2> The problem is in the DB itself. In the metadatavalue table, for each
> item_id there are duplicate entries for each metadata. It looks something
> like this -
>
> >> select * from metadatavalue where item_id=1000;
>
> metadata_value_id | item_id | metadata_field_id | text_value   | text_lang | place | authority | confidence
>               988 |    1000 |                15 | 2001-12      |           |     1 |           |         -1
>               989 |    1000 |                26 | Rescission\r | en        |     1 |           |         -1
> ....
> ....
>              5204 |    1000 |                15 | 2001-12      |           |     1 |           |         -1
>              7654 |    1000 |                26 | Rescission\r | en        |     1 |           |         -1
> ...
> ...
>
> Do you have a suggestion exactly how this can be solved? I am not very
> comfortable with DBs and do not want to try something arbitrary and
> completely mess it up. Or may be if you can refer a particular resource on
> the net, going through which will help me write my own sql queries to get
> rid of this problem.
>
> Thanks again,
> Koushik
>
> On Thu, Jul 29, 2010 at 11:40 PM, Allen Lam <[email protected]> wrote:
>
> Hi Koushik,
>
> The immediate task is to find out why there are duplicate entries in the
> db, and stop creating duplication again.
>
> I don't know your installation or customization history, so this is a wild
> guess again. There could be some errors made in your item submission form.
> Check the item submission xml under config.
> If it is not the cause of the problem..., I don't know. Please look back
> what changes you've made to the system recently.
>
> OK. The next step is to remove the duplicate entries in the database.
> Either you delete it manually in the edit item page one-by-one, or you do it
> in the database level, do some SQL queries and write some programs to
> automate the de-dup and delete process. Luckily you only have to deal with
> one db table named 'metadatavalue'.
>
> Hope this rough guide could kick start your rescue process.
>
> Best,
> Allen Lam.
> HKU Scholars Hub Administrator, http://hub.hku.hk
>
> On 2010-07-30 3:46 AM, Koushik Banerjee wrote:
>
> Thanks Allen.
> I checked the "edit item" page. That also has duplicate entries i.e. two
> dc.contributor.author, two dc.date.issued fields etc which can be edited.
> Does it mean that the database itself is wrong? In that case, how do I get
> rid of this problem?
>
> Thanks
>
> On Thu, Jul 29, 2010 at 1:30 PM, Allen Lam <[email protected]> wrote:
>
> Hi Koushik,
>
> First try to make sure there is no duplication in the database.
> Go into the Edit item page to check.
>
> If the problem is not with the db, it may be caused by incorrect indexing.
> There could be some errors in your config file. Without extra information
> it is only a wild guess.
> But remember that any changes to indexing config need a re-indexing
> afterward to see the effect.
>
> Best,
> Allen Lam.
> HKU Scholars Hub Administrator, http://hub.hku.hk
>
> On 2010-07-30 1:19 AM, Koushik Banerjee wrote:
>
> The same thing happens in both JSPUI and XMLUI. I guess that means the
> problem is not at the UI layer but something deeper. The Browse or Handle
> servlets?
>
> This dual-display happens while viewing a particular item i.e.
> url like
> [host]/[dspace or xmlui]/handle/number1/number2
> also the author names are displayed twice when i browse using titles with
> an url [host]/[dspace or xmlui]/browse?type=title
>
> Any help is much appreciated.
>
> On Tue, Jul 27, 2010 at 5:47 PM, Koushik Banerjee <[email protected]> wrote:
>
> Hello, I am trying out a DSpace 1.6 installation on postgresql. When I view
> an item, all metadata is displayed twice. It looks something like this
> Title: This is the sample title
> This is the sample title
> Authors: Author 1
> Author 1
> Author 2
> Author 2
> Issue Date: 21-May-2008
> 21-May-2008
>
> etc etc. Anyone any idea. At what level can this problem be?
>
> Thanks.
_______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech
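
For anyone repeating this clean-up, the middle steps described at the top of the thread (take the higher metadata_value_id of each pair, de-duplicate with a TreeSet, then produce one delete per ID) could be sketched roughly as below. This is only a sketch, not the program that was actually used: the class and method names are hypothetical, and it assumes each saved line holds the four query columns separated by whitespace or '|'. The sample rows reuse the IDs shown for item_id=1000 earlier in the thread. It only builds the delete statements as strings so that the list can be reviewed before anything is run against the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: names and file layout are assumptions for illustration.
class DuplicateMetadataIds {

    // Each line is assumed to contain, in order:
    // m1.metadata_value_id, m2.metadata_value_id, m1.item_id, m2.item_id
    public static TreeSet<Integer> idsToDelete(List<String> lines) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the saved results
            }
            String[] cols = trimmed.split("[\\s|]+");
            int first = Integer.parseInt(cols[0]);
            int second = Integer.parseInt(cols[1]);
            // Keep the lower ID in the table, mark the higher one for deletion.
            // The TreeSet drops repeats, since the self-join reports each pair twice.
            ids.add(Math.max(first, second));
        }
        return ids;
    }

    // Build the statements as text so they can be checked by hand first.
    public static List<String> deleteStatements(TreeSet<Integer> ids) {
        List<String> statements = new ArrayList<>();
        for (int id : ids) {
            statements.add("delete from metadatavalue where metadata_value_id=" + id + ";");
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "988 5204 1000 1000",
                "5204 988 1000 1000",
                "989 7654 1000 1000",
                "7654 989 1000 1000");
        for (String statement : deleteStatements(idsToDelete(sample))) {
            System.out.println(statement);
        }
    }
}
```

With the four sample rows, only 5204 and 7654 end up in the set, so 988 and 989 stay in the table. As said above, check the list carefully before executing any delete.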

