Problem resolved, thanks helix84.

Running the query SELECT * FROM metadatavalue,metadatafieldregistry WHERE 
metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND 
element = 'identifier' AND qualifier = 'uri' AND text_lang = ''; resulted in 
zero hits, but

Running the query SELECT * FROM metadatavalue,metadatafieldregistry WHERE 
metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND 
element = 'identifier' AND qualifier = 'uri' AND text_lang is NULL; came back 
with four hits.

I looked up those item numbers in the UI and indeed there were fields where the 
value of the field AND the text lang were both NULL. I deleted the fields 
through the UI, re-exported the metadata, and the extra fields were gone.

David


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of helix84
Sent: Monday, June 25, 2012 12:47 PM
To: Isaak, David C
Cc: [email protected]
Subject: Re: [Dspace-tech] Duplicate metadata fields on export

On Mon, Jun 25, 2012 at 8:42 PM, Isaak, David C <[email protected]> wrote:
> The collection has about 3000 items in it, so I cannot use the UI BME 
> on the whole collection,

You can raise the limit (for UI) in [dspace]/config/modules/bulkedit.cfg
But I would recommend you to use the command line version of BME 
([dspace]/bin/dspace metadata-import) to which this limitation doesn't apply.

> but if I use Item Update through the CLI interface, won't the [en_US] be 
> added automatically?

I don't know, I don't use that tool much.

> SELECT text_lang FROM metadatavalue GROUP BY text_lang; ""
> ""
> "en"
> "en_US"

This looks a bit weird because it shows the empty string twice, but judging 
from the following query I assume one of them is only how your SQL client 
displays NULL.

> SELECT count(*) FROM metadatavalue WHERE text_lang IS NULL;
> 25569

> SELECT count(*) FROM metadatavalue WHERE text_lang = '';
> 574

So you have both types. The latter is probably the "xyz" case.

> Are these numbers counting metadata fields that do not usually have language 
> codes in text_lang such as dc.identifier.uri, dc.date.accessioned, and 
> dc.date.available?

That's correct. Here's how you would find only the values of a certain field 
(here dc.identifier.uri):

SELECT * FROM metadatavalue,metadatafieldregistry WHERE 
metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND 
element = 'identifier' AND qualifier = 'uri' AND text_lang = '';

Regards,
~~helix84
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to