Problem resolved, thanks helix84. Running the query SELECT * FROM metadatavalue,metadatafieldregistry WHERE metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND element = 'identifier' AND qualifier = 'uri' AND text_lang = ''; resulted in zero hits, but
Running the query SELECT * FROM metadatavalue,metadatafieldregistry WHERE metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND element = 'identifier' AND qualifier = 'uri' AND text_lang is NULL; came back with four hits. I looked up those item numbers in the UI and indeed there were fields where the value of the field AND the text lang were both NULL. I deleted the fields through the UI, re-exported the metadata, and the extra fields were gone. David -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of helix84 Sent: Monday, June 25, 2012 12:47 PM To: Isaak, David C Cc: [email protected] Subject: Re: [Dspace-tech] Duplicate metadata fields on export On Mon, Jun 25, 2012 at 8:42 PM, Isaak, David C <[email protected]> wrote: > The collection has about 3000 items in it, so I cannot use the UI BME > on the whole collection, You can raise the limit (for UI) in [dspace]/config/modules/bulkedit.cfg But I would recommend you to use the command line version of BME ([dspace]/bin/dspace metadata-import) to which this limitation doesn't apply. > but if I use Item Update through the CLI interface, won't the [en_US] be > added automatically? I don't know, I don't use that tool much. > SELECT text_lang FROM metadatavalue GROUP BY text_lang; "" > "" > "en" > "en_US" This looks a bit weird because it shows the empty string twice, but judging from the following query I assume one of them is only how your SQL client displays NULL. > SELECT count(*) FROM metadatavalue WHERE text_lang IS NULL; > 25569 > SELECT count(*) FROM metadatavalue WHERE text_lang = ''; > 574 So you have both types. The latter is probably the "xyz" case. > Are these numbers counting metadata fields that do not usually have language > codes in text_lang such as dc.identifier.uri, dc.date.accessioned, and > dc.date.available? That's correct. Here's how you would find only the values of a certain field (here dc.identifier.uri): SELECT * FROM metadatavalue,metadatafieldregistry WHERE metadatavalue.metadata_field_id = metadatafieldregistry.metadata_field_id AND element = 'identifier' AND qualifier = 'uri' AND text_lang = ''; Regards, ~~helix84 ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech

