Re: [Dspace-tech] data entry errors
Hi Bram,

Cleaning up the current errors will have to be done, but I was more concerned with the prevention of future errors, as you deduced. (I do appreciate the information on the tools available for clean-up, though. Thanks!)

I had the same idea that you mentioned below: hit the database for near matches and display a list to the user, allowing them to simply select the data from the list if they see what they are looking for. The reason that I bring it to the community is two-fold:

Firstly, I highly doubt that I am the first person to come across this issue. I had hoped that someone had already developed a solution. There are so many different ideas, implementations, configurations, and patches out there that I would be a fool not to ask.

Secondly, I am fairly new to DSpace, having only been working with it for a few weeks now (and most of my time has been spent doing high-level changes), so I don't know the code intimately yet. While I know how to code this solution on its own, I am concerned about the possibility of side effects if I simply start adding code/logic to the JSP without fully understanding the supporting code. At present, querying the database, displaying the result set, and [possibly] updating an input field does not seem like it would cause an issue, but I have been surprised in the past by making assumptions.

In any case, I thank all of you for taking the time to consider my question and respond to it. Of the projects that I have worked on, this one definitely has the most helpful community I have ever seen.

Good-day and be well.

Darren Arsenault

From: bluy...@gmail.com [bluy...@gmail.com] On Behalf Of Bram Luyten [b...@mire.be]
Sent: August-30-12 3:06 AM
To: DSpace @ Lyncode
Cc: Darren Arsenault; dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] data entry errors

Hi Darren,

to be very clear: are you looking for a way to clean up the current errors, or just interested in prevention for new ones?
In terms of prevention, it might help if you develop an auto-complete feature that tries to match anything a user is entering in a particular metadata field against the values that are already stored for that field in archived items. Referring back to your example, this would mean that if someone starts typing "AB...", he or she would get suggestions showing how others have already entered values starting with "AB" for that specific metadata field.

To deal with errors that already made it into your metadata, here are two suggestions, a free one and a commercial add-on module from @mire:

- Since DSpace 1.6 you can export metadata into spreadsheets on a per-collection basis. So download the metadata in a spreadsheet, clean it up, and re-upload it to see the changes take effect. For the clean-up part, you can go at it with your spreadsheet editor, but you might want to look at Google Refine (http://code.google.com/p/google-refine/). It's really awesome at detecting similar values and grouping them together.
- Our Metadata Quality Module (http://atmire.com/website/?q=modules/mqm) has functionality for performing batch edits straight from the DSpace web UI and merging duplicates.

cheers,
Bram

--
Bram Luyten
@mire
2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
http://www.atmire.com/

On Wed, Aug 29, 2012 at 8:32 PM, DSpace @ Lyncode dsp...@lyncode.com wrote:

Hi, I can only think of implementing Authority Control for that. In any case, the deposit workflow is meant to accomplish that task (validating and correcting metadata values).
On 29 August 2012 16:22, Darren Arsenault arse...@algonquincollege.com wrote:

I posted this a week ago and no one has responded yet, so I'm trying again:

For input fields where it is not possible (or practical) to implement controlled vocabularies or drop-down lists, is there a less labour-intensive way of preventing data entry errors?

For example: The author of several documents is ABC Statistics Inc., but each document is added by a different ePerson, and each of these people makes a spelling error when filling out the AUTHOR field, so these items appear to have different authors. (ABC Statisitcs, Inc., ABC Statistics, Inc, ABC Statistics, etc.).

Originally I thought that this would be a minor issue, easily correctable through raw SQL queries to update the offending fields. Unfortunately, my estimate of the number of mistakes that would be made has proven to be extremely conservative. I do not want to be responsible for correcting so many entries myself, nor do I want to reject so many entries asking users to match the AUTHOR name that already exists.

Does anyone have any ideas
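
As a rough illustration of the near-match lookup discussed in this message (match what the user types against values already stored for the field, then offer a pick list), here is a minimal sketch in Python rather than JSP. It is not DSpace code: the function name `suggest`, the `cutoff` of 0.8, and the idea that the stored values have already been fetched into a list are all assumptions made for the example. The fuzzy part relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib


def suggest(typed, existing_values, limit=5, cutoff=0.8):
    """Return stored values that start with, or closely resemble, `typed`.

    `existing_values` stands in for whatever the database returned for
    the metadata field (hypothetical; no DSpace API is used here).
    """
    needle = typed.strip().casefold()
    if not needle:
        return []

    # Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(existing_values))

    # 1. Auto-complete: values that begin with what was typed.
    prefix_hits = [v for v in unique if v.casefold().startswith(needle)]

    # 2. Near matches: catches transposed or missing letters.
    folded = {}
    for v in unique:
        folded.setdefault(v.casefold(), v)  # first spelling wins
    close = difflib.get_close_matches(needle, list(folded), n=limit, cutoff=cutoff)
    fuzzy_hits = [folded[c] for c in close if folded[c] not in prefix_hits]

    return (prefix_hits + fuzzy_hits)[:limit]
```

For instance, `suggest("ABC Statisitcs Inc.", ["ABC Statistics Inc.", "XYZ Corp"])` offers the correctly spelled stored value, so the submitter can select it instead of saving a new variant. The cutoff is a judgment call: lower values suggest more, at the cost of noisier lists.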
Re: [Dspace-tech] data entry errors
On Thu, Aug 30, 2012 at 3:20 PM, Darren Arsenault arse...@algonquincollege.com wrote:

Firstly, I highly doubt that I am the first person to come across this issue. I had hoped that someone had already developed a solution. There are so many different ideas, implementations, configurations, and patches out there that I would be a fool not to ask.

You're right to ask first. It seems like a logical extension of existing functionality, a feature many people would be interested in. When you implement it, please make sure to submit your patch to our Jira [1].

I also wanted to draw your attention to the current development of Discovery for JSPUI. It's planned to be in DSpace 3.0, which is due before the end of this year. If I were you, I'd prefer talking to Solr instead of the database: it's faster and built for search (so you may forget LIKE). You may want to develop your improvements for the upcoming version and deploy them when it comes out. You can find the JSPUI Discovery branch here [2] and watch when it's merged into the master Git branch here [3]. The corresponding Jira ticket is here [4].

Another option is to use the XMLUI interface, where this functionality already exists, for submission (available only to your users internally) and the JSPUI interface for the public-facing repository (if you prefer). XMLUI and JSPUI can be deployed just fine in parallel on one DSpace instance, just on different URLs.

[1] https://jira.duraspace.org/browse/
[2] https://github.com/abollini/DSpace/tree/DS-1217
[3] https://github.com/DSpace/DSpace/pull/60
[4] https://jira.duraspace.org/browse/DS-1217

Regards,
~~helix84

-- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
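
To make the "talk to Solr instead of the database" suggestion above a little more concrete, here is a hedged sketch that only builds a prefix-lookup URL using Solr's TermsComponent parameters (`terms.fl`, `terms.prefix`, `terms.limit`). The base URL, the field name `author`, and the presence of a `/terms` request handler on your core are assumptions; check your own Solr configuration before relying on any of them. Nothing here is taken from the Discovery branch itself.

```python
from urllib.parse import urlencode


def terms_prefix_url(solr_base, field, prefix, limit=10):
    """Build a TermsComponent request for indexed terms starting with `prefix`.

    `solr_base` and `field` are hypothetical values for illustration;
    the request is only constructed here, not sent.
    """
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # only terms that begin with this string
        "terms.limit": limit,     # cap the number of suggestions
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/terms?{urlencode(params)}"
```

Calling `terms_prefix_url("http://localhost:8080/solr/search", "author", "AB")` yields a URL whose response would list indexed terms beginning with "AB". Note that the prefix is compared against terms as indexed, so a tokenized or lowercased field would behave differently from an untokenized one.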
Re: [Dspace-tech] data entry errors
I posted this a week ago and no one has responded yet, so I'm trying again:

For input fields where it is not possible (or practical) to implement controlled vocabularies or drop-down lists, is there a less labour-intensive way of preventing data entry errors?

For example: The author of several documents is ABC Statistics Inc., but each document is added by a different ePerson, and each of these people makes a spelling error when filling out the AUTHOR field, so these items appear to have different authors. (ABC Statisitcs, Inc., ABC Statistics, Inc, ABC Statistics, etc.).

Originally I thought that this would be a minor issue, easily correctable through raw SQL queries to update the offending fields. Unfortunately, my estimate of the number of mistakes that would be made has proven to be extremely conservative. I do not want to be responsible for correcting so many entries myself, nor do I want to reject so many entries asking users to match the AUTHOR name that already exists.

Does anyone have any ideas?
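
For anyone wanting to gauge how many such variants exist before deciding on a fix, here is a small sketch that groups values differing only in case, punctuation, or word order. It is a simplified take on the "fingerprint" style of key-collision clustering, not the exact method of any particular tool, and the function names are made up for the example. It assumes the AUTHOR values have already been exported into a plain list.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalise a value: lowercase, strip punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", value.casefold())
    return " ".join(sorted(set(cleaned.split())))


def group_variants(values):
    """Map each fingerprint to the distinct spellings that share it.

    Only fingerprints with more than one spelling are returned, since
    those are the candidates for merging.
    """
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}
```

On the example above, "ABC Statistics Inc.", "ABC Statistics, Inc" and "ABC Statistics, Inc." land in one group. The transposed "ABC Statisitcs, Inc." and the shortened "ABC Statistics" do not, because this key only absorbs punctuation and case differences; genuine typos need a similarity measure on top.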
Re: [Dspace-tech] data entry errors
Hi,

I can only think of implementing Authority Control for that. In any case, the deposit workflow is meant to accomplish that task (validating and correcting metadata values).

--
Thanks,
DSpace Department
Lyncode