Kuang Chen (http://www.eecs.berkeley.edu/~kuangc/) is a PhD student at
Berkeley. Since finishing his Bachelor's at UW-CSE, he has focused on
data management systems that help low-resource organizations and
people in the developing world. He works on improving local practices
in data collection and data quality, and his USHER system (paper and
slides linked below) is a good example of the practical tools he
builds.

Kuang explains in the video below that "Data quality is a critical
problem in modern databases. Data entry forms present the first and
arguably best opportunity for detecting and mitigating errors, but
there has been little research into automatic methods for improving
data quality at entry time. In this paper, we propose USHER, an
end-to-end system for form design, entry, and data quality assurance.
Using previous form submissions, USHER learns a probabilistic model
over the questions of the form. USHER then applies this model at every
step of the data entry process to improve data quality. Before entry,
it induces a form layout that captures the most important data values
of a form instance as quickly as possible. During entry, it
dynamically adapts the form to the values being entered, and enables
real-time feedback to guide the data enterer toward their intended
values. After entry, it re-asks questions that it deems likely to have
been entered incorrectly. We evaluate all three components of USHER
using two real-world data sets. Our results demonstrate that each
component has the potential to improve data quality considerably, at a
reduced cost when compared to current practice."
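
To make the "before entry" and "after entry" ideas in the abstract a
little more concrete, here is a minimal sketch in Python. It is not
USHER's method: the abstract describes a probabilistic model over the
questions of the form, whereas this sketch treats every question
independently and only counts answer frequencies in previous
submissions. The function names, field names, and the 0.05 threshold
are all hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a list of answers."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def order_questions(submissions):
    """Order questions so the least predictable ones are asked first.

    Simplifying assumption (not from the paper): questions are treated
    as independent, so each is scored by its own entropy only.
    """
    questions = list(submissions[0].keys())
    scores = {q: entropy([s[q] for s in submissions]) for q in questions}
    return sorted(questions, key=lambda q: scores[q], reverse=True)


def flag_for_reask(submissions, new_entry, threshold=0.05):
    """Return the questions whose entered value was rare in past data.

    The threshold is a hypothetical cutoff on relative frequency.
    """
    flagged = []
    for question, value in new_entry.items():
        counts = Counter(s[question] for s in submissions)
        if counts[value] / len(submissions) < threshold:
            flagged.append(question)
    return flagged


# Hypothetical previous submissions for a two-question form.
past = [
    {"consent": "yes", "district": "north"},
    {"consent": "yes", "district": "south"},
    {"consent": "yes", "district": "east"},
    {"consent": "yes", "district": "north"},
]

print(order_questions(past))
print(flag_for_reask(past, {"consent": "no", "district": "north"}))
```

A model that captures dependencies between questions, as the abstract
describes, could adapt the ordering to the values already entered and
judge a value in light of the other answers; the independent counts
above cannot do either.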

Video: http://www.youtube.com/watch?v=jdl5ECWtHcU
Paper: http://www.eecs.berkeley.edu/~kuangc/publications/icde10-usher.pdf
Slides: http://www.eecs.berkeley.edu/~kuangc/publications/icde10-usher.slides.pdf
