Re: [Dspace-tech] data entry errors

2012-08-30 Thread Darren Arsenault
Hi Bram,

Cleaning up the current errors will have to be done, but I was more concerned 
with the prevention of future errors, as you deduced. (I do appreciate the 
information on the tools available for clean up though—Thanks!)

I had the same idea that you mentioned below—hit the database for near matches 
and display a list to the user, allowing them to simply select the data from 
the list if they see what they are looking for. The reason that I bring it to 
the community is two-fold:

Firstly, I highly doubt that I am the first person to come across this issue. I 
had hoped that someone had already developed a solution. There are so many 
different ideas, implementations, configurations, and patches out there that I 
would be a fool not to ask.
Secondly, I am fairly new to DSpace, having only been working with it for a few 
weeks now (and most of my time has been spent doing high-level changes), so I 
don't know the code intimately yet. While I know how to code this solution on 
its own, I am concerned about the possibility of side-effects if I simply 
start adding code/logic to the JSP without fully understanding the supporting 
code. At present, querying the database, displaying the result set, and 
[possibly] updating an input field does not seem like it would cause an issue, 
but I have been surprised in the past by making assumptions.
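
For illustration only, the near-match lookup described above could be sketched roughly as follows. This is standalone Python rather than DSpace/JSP code; the function name near_matches, the sample values, and the 0.8 similarity cutoff are all assumptions, and the list of stored values stands in for whatever the database query would return.

```python
import difflib

def near_matches(entered, existing_values, limit=5, cutoff=0.8):
    """Return stored values that closely resemble what the user typed."""
    # Compare case-insensitively, but hand back the values as stored.
    by_lower = {}
    for value in existing_values:
        by_lower.setdefault(value.lower(), value)
    hits = difflib.get_close_matches(
        entered.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[hit] for hit in hits]

# Hypothetical values already stored for the AUTHOR field.
stored = ["ABC Statistics Inc.", "Algonquin College", "XYZ Research Ltd."]
print(near_matches("ABC Statisitcs, Inc.", stored))
```

The user would then pick from the returned list instead of saving a new variant. The cutoff would likely need tuning against real data.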

In any case, I thank all of you for taking the time to consider my question and 
respond to it. Of the projects that I have worked on, this one definitely has 
the most helpful community I have ever seen.

Good-day and be well.

Darren Arsenault


From: bluy...@gmail.com [bluy...@gmail.com] On Behalf Of Bram Luyten 
[b...@mire.be]
Sent: August-30-12 3:06 AM
To: DSpace @ Lyncode
Cc: Darren Arsenault; dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] data entry errors

Hi Darren,

To be very clear: are you looking for a way to clean up the current errors, or 
just interested in prevention for new ones? In terms of prevention, it might 
help if you develop an auto-complete feature that tries to match anything a 
user is entering in a particular metadata field, with those values that are 
already stored for that field in archived items.

Referring back to your example, this would mean that if someone starts typing 
AB... he or she would get suggestions for ways in which someone else has 
already entered values starting with AB for that specific metadata field.
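
A minimal sketch of that prefix matching, assuming the values already stored for the field have been loaded into a list. This is plain Python for illustration, not existing DSpace code; suggest and the sample values are hypothetical names.

```python
from collections import Counter

def suggest(prefix, stored_values, limit=10):
    """Suggest previously entered values starting with `prefix`, most used first."""
    wanted = prefix.strip().lower()
    if not wanted:
        return []
    counts = Counter(v for v in stored_values if v.lower().startswith(wanted))
    # Rank by how often a value was used, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [value for value, _ in ranked[:limit]]

# Hypothetical values from archived items for one metadata field.
values = [
    "ABC Statistics Inc.",
    "ABC Statistics Inc.",
    "ABC Statistics",
    "Algonquin College",
]
print(suggest("AB", values))
```

Ranking by frequency nudges users toward the most common spelling, which is usually (though not always) the correct one.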

To deal with errors that already made it into your metadata, here are two 
suggestions, a free one, and a commercial add-on module from @mire:

- Since DSpace 1.6 you can export metadata into spreadsheets on a 
per-collection basis. So download the metadata in a spreadsheet, clean it up, 
and re-upload it for the changes to take effect. For the clean-up part, you 
can use your spreadsheet editor, but you might want to look at Google 
Refine (http://code.google.com/p/google-refine/). It's really awesome at 
detecting similar values and grouping them together.

- Our Metadata quality module (http://atmire.com/website/?q=modules/mqm) has 
functionality for performing batch edits straight from the DSpace web UI and 
merging duplicates.
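
To give a feel for what "grouping similar values" means in the clean-up step, here is a simplified key-collision sketch in Python. It is not the implementation used by Google Refine or by the module above; fingerprint and cluster are hypothetical helper names, and the input is assumed to be one column of values taken from the exported spreadsheet.

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Normalise a value: lowercase, drop punctuation, sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values whose fingerprints collide; return groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

authors = [
    "ABC Statistics, Inc.",
    "ABC Statistics Inc",
    "ABC Statistics",
    "ABC Statisitcs, Inc.",
]
print(cluster(authors))
```

This only catches differences in case, punctuation and word order. A misspelling such as "Statisitcs" produces a different fingerprint, so it would need an additional fuzzy comparison.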

cheers,

Bram

--

Bram Luyten @mire
2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
 http://www.atmire.com/ 



On Wed, Aug 29, 2012 at 8:32 PM, DSpace @ Lyncode 
dsp...@lyncode.com wrote:
Hi,

I can only think of implementing authority control for that.
Anyway, deposit workflow is meant to accomplish that task (validate/correct 
metadata values).

On 29 August 2012 16:22, Darren Arsenault 
arse...@algonquincollege.com wrote:
I posted this a week ago and no one has responded yet, so I'm trying again:

For input fields where it is not possible (or practical) to implement 
controlled vocabularies or drop down lists, is there a less labour-intensive 
way of preventing data entry errors? For example: The author of several 
documents is ABC Statistics Inc., but each document is added by a different 
ePerson, and each of these people makes a spelling error when filling out the 
AUTHOR field, so these items appear to have different authors. (ABC 
Statisitcs, Inc., ABC Statistics, Inc, ABC Statistics, etc.).

Originally I thought that this would be a minor issue, easily correctable 
through raw SQL queries to update the offending fields. Unfortunately, my 
estimate as to the number of mistakes that would be made has proven to be 
extremely conservative. I do not want to be responsible for correcting so many 
entries myself, nor do I want to reject so many entries asking users to match 
the AUTHOR name that already exists.



Does anyone have any ideas?

Re: [Dspace-tech] data entry errors

2012-08-30 Thread helix84
On Thu, Aug 30, 2012 at 3:20 PM, Darren Arsenault
arse...@algonquincollege.com wrote:
 Firstly, I highly doubt that I am the first person to come across this issue. 
 I had hoped that someone had already developed a solution. There are so many 
 different ideas, implementations, configurations, and patches out there that 
 I would be a fool not to ask.

You're right to ask first. It seems like a logical extension of
existing functionality, a feature many people would be interested in.
When you implement it, please make sure to submit your patch to our
Jira [1].

I also wanted to draw your attention to the current development of
Discovery for JSPUI. It's planned to be in DSpace 3.0, which is due
before the end of this year. If I were you, I'd prefer talking to Solr
instead of the database: it's faster and built for search (so you can
forget LIKE). You may want to develop your improvements for the
upcoming version and deploy them when it comes out. You can find the
JSPUI Discovery branch here [2] and watch when it's merged into the
master Git branch here [3]. The corresponding Jira ticket is here [4].
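
As a rough sketch of what asking Solr for suggestions could look like, the snippet below only builds a request URL that uses Solr's standard faceting parameters (facet.field, facet.prefix, facet.limit). The base URL, the field name "author" and the helper suggestion_url are assumptions for illustration; the actual core and field names depend on how Discovery is configured.

```python
from urllib.parse import urlencode

def suggestion_url(solr_base, field, prefix, limit=10):
    """Build a Solr select URL listing indexed values of `field` that start with `prefix`."""
    params = {
        "q": "*:*",             # match all documents; only the facet counts matter
        "rows": 0,              # no documents needed in the response
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
    }
    return solr_base.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical core URL and field name.
url = suggestion_url("http://localhost:8080/solr/search", "author", "AB")
print(url)
```

Note that facet.prefix is matched against the indexed terms and is case-sensitive, so this works best on a non-tokenized string field.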

Another option is to use the XMLUI interface where this functionality
already exists for submission (available only for your users
internally) and the JSPUI interface for the public-facing repository
(if you prefer). XMLUI and JSPUI can be deployed just fine in parallel
on one DSpace instance, just on different URLs.

[1] https://jira.duraspace.org/browse/
[2] https://github.com/abollini/DSpace/tree/DS-1217
[3] https://github.com/DSpace/DSpace/pull/60
[4] https://jira.duraspace.org/browse/DS-1217

Regards,
~~helix84

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] data entry errors

2012-08-29 Thread Darren Arsenault
I posted this a week ago and no one has responded yet, so I'm trying again:

For input fields where it is not possible (or practical) to implement 
controlled vocabularies or drop down lists, is there a less labour-intensive 
way of preventing data entry errors? For example: The author of several 
documents is ABC Statistics Inc., but each document is added by a different 
ePerson, and each of these people makes a spelling error when filling out the 
AUTHOR field, so these items appear to have different authors. (ABC 
Statisitcs, Inc., ABC Statistics, Inc, ABC Statistics, etc.).

Originally I thought that this would be a minor issue, easily correctable 
through raw SQL queries to update the offending fields. Unfortunately, my 
estimate as to the number of mistakes that would be made has proven to be 
extremely conservative. I do not want to be responsible for correcting so many 
entries myself, nor do I want to reject so many entries asking users to match 
the AUTHOR name that already exists.



Does anyone have any ideas?





Re: [Dspace-tech] data entry errors

2012-08-29 Thread DSpace @ Lyncode
Hi,

I can only think of implementing authority control for that.
Anyway, deposit workflow is meant to accomplish that task (validate/correct
metadata values).

On 29 August 2012 16:22, Darren Arsenault arse...@algonquincollege.com wrote:

 I posted this a week ago and no one has responded yet, so I'm trying again:

 For input fields where it is not possible (or practical) to implement
 controlled vocabularies or drop down lists, is there a less
 labour-intensive way of preventing data entry errors? For example: The
 author of several documents is ABC Statistics Inc., but each document is
 added by a different ePerson, and each of these people makes a spelling
 error when filling out the AUTHOR field, so these items appear to have
 different authors. (ABC Statisitcs, Inc., ABC Statistics, Inc, ABC
 Statistics, etc.).

 Originally I thought that this would be a minor issue, easily correctable
 through raw SQL queries to update the offending fields. Unfortunately, my
 estimate as to the number of mistakes that would be made has proven to be
 extremely conservative. I do not want to be responsible for correcting so
 many entries myself, nor do I want to reject so many entries asking users
 to match the AUTHOR name that already exists.



 Does anyone have any ideas?








-- 
Thanks,
DSpace Department