Hi Koushik,
I usually like to separate queries of inspecting data from queries of
deleting data, for the sake of playing safe.
Try this.
select distinct *
from metadatavalue m1, metadatavalue m2
where m1.item_id=m2.item_id
and m1.metadata_field_id=m2.metadata_field_id
and m1.text_value=m2.text_value
and m1.metadata_value_id!=m2.metadata_value_id
order by m1.metadata_field_id
I would run this query in a program.
for each result row,
compare the m1.place and m2.place values,
then write down, to file, the metadata_value_id that carries a bigger
place value
There shall be duplication of IDs in the list.
In a separate operation,
read IDs from the file,
put IDs in a Set (or similar hash structure) to filter away duplicates.
Finally, for each unique to-delete ID,
delete from metadatavalue where metadata_value_id=[xxx]
I coined out the above query in a coffee time. It is not tested in the
field. Someone can improve it to prevent duplicate outputs.
You may like to fully check the list before executing delete, to assure
you are not killing something that deserve to stay.
Best,
Allen Lam.
HKU Scholars Hub Administrator, http://hub.hku.hk
On 2010-08-04 1:14 AM, Koushik Banerjee wrote:
Hello Allen,
Thanks a lot for the guide. Here is what further investigation has
unearthed ->
1> This duplicate metadata issue doesn't occur with new submissions.
For this installation, the DB was copied from an earlier version. It
is only with these old items that the duplicate metadata entries show up.
2> The problem is in the DB itself. In the metadatavalue table, for
each item_id there are duplicate entries for each metadata. It looks
something like this -
>> select * from metadatavalue where item_id=1000;
metadata_value_id | item_id | metadata_field_id |
text_value |
text_lang | place | authority | confidence
988 | 1000 | 15 | 2001-12
| | 1 | | -1
989 | 1000 | 26 |
Rescission\r
| en | 1 | | -1
....
....
5204 | 1000 | 15 | 2001-12
| | 1 | | -1
7654 | 1000 | 26 |
Rescission\r
| en | 1 | | -1
...
...
Do you have a suggestion exactly how this can be solved? I am not very
comfortable with DBs and do not want to try something arbitrary and
completely mess it up. Or may be if you can refer a particular
resource on the net, going through which will help me write my own sql
queries to get rid of this problem.
Thanks again,
Koushik
On Thu, Jul 29, 2010 at 11:40 PM, Allen Lam <allen.dsp...@gmail.com
<mailto:allen.dsp...@gmail.com>> wrote:
Hi Koushik,
The immediate task is to find out why there are duplicate entries
in the db, and stop creating duplication again.
I don't know your installation or customization history, so this
is a wild guess again. There could be some errors made in your
item submission form. Check the item submission xml under config.
If it is not the cause of the problem..., I don't know. Please
look back what changes you've made to the system recently.
OK. The next step is to remove the duplicate entries in the
database. Either you delete it manually in the edit item page
one-by-one, or you do it in the database level, do some SQL
queries and write some programs to automate the de-dup and delete
process. Luckily you only have to deal with one db table named
'metadatavalue'.
Hope this rough guide could kick start your rescue process.
Best,
Allen Lam.
HKU Scholars Hub Administrator, http://hub.hku.hk
On 2010-07-30 3:46 AM, Koushik Banerjee wrote:
Thanks Allen.
I checked the "edit item" page. That also has duplicate entries
i.e. two dc.contributor.author, two dc.date.issued fields etc
which can be edited. Does it mean that the database itself is
wrong? In that case, how do I get rid of this problem?
Thanks
On Thu, Jul 29, 2010 at 1:30 PM, Allen Lam
<allen.dsp...@gmail.com <mailto:allen.dsp...@gmail.com>> wrote:
Hi Koushik,
First try to make sure there is no duplication in the database.
Go into the Edit item page to check.
If the problem is not with the db, it may be caused by
incorrect indexing.
There could be some errors in your config file. Without extra
information it is only a wild guess.
But remember that any changes to indexing config need a
re-indexing afterward to see the effect.
Best,
Allen Lam.
HKU Scholars Hub Administrator, http://hub.hku.hk
On 2010-07-30 1:19 AM, Koushik Banerjee wrote:
The same thing happens in both JSPUI and XMLUI. I guess that
means the problem is not at the UI layer but something
deeper. The Browse or Handle servlets?
This dual-display happens while viewing a particular item
i.e. url like [host]/[dspace or xmlui]/handle/number1/number2
also the author names are displayed twice when i browse
using titles with an url [host]/[dspace or
xmlui]/browse?type=title
Any help is much appreciated.
On Tue, Jul 27, 2010 at 5:47 PM, Koushik Banerjee
<koushik.baner...@gmail.com
<mailto:koushik.baner...@gmail.com>> wrote:
Hello, I am trying out a DSpace 1.6 installation
on postgresql. When I view an item, all metadata is
displayed twice. It looks something like this
Title: This is the sample title
This is the sample title
Authors: Author 1
Author 1
Author 2
Author 2
Issue Date: 21-May-2008
21-May-2008
etc etc. Anyone any idea. At what level can this problem be?
Thanks.
------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
<mailto:DSpace-tech@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/dspace-tech
------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for
a share
of $1 Million in cash or HP Products. Visit us here for more
details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
<mailto:DSpace-tech@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/dspace-tech
------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
<mailto:DSpace-tech@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/dspace-tech
------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech