Raphael, you raise very pertinent issues.

We as a community love open data, and in this country there is a lot that
can be done to free all kinds of data so that it can be put to good use
(election data in aggregated form is one example). At the same time,
certain kinds of data are not open (I mean not open in a machine-readable
format) for good reason. I believe voter-roll data is one such type. In
the past, voter lists have been used to pinpoint members of specific
communities, who were then targeted with gruesome effect. I shudder to
think what would happen if that were automated: a 'riot app'?

As Raphael points out, this is not just about privacy; it could be much
worse.

This group is a fantastic initiative, and as it evolves it would be great
for us to involve more social scientists and policy experts. That way, even
as we advocate vociferously to free more data and make it open, we can also
bring the technical expertise here to bear on recommending where data needs
to be better protected, and how.

cs


On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind <
li...@raphael-susewind.de> wrote:

> Hi Devdatta and Avinash,
>
> Yes, I, too, am frankly surprised at the ease with which one can access
> sensitive data in bulk: not only PDF rolls and voter details, but also
> things such as land records, BPL lists, and much more. I think we are in
> an exciting as well as dangerous phase of fairly uncontrolled, nascent
> e-Governance practices. But I think the ethical issues here are a little
> more complex than mere privacy concerns.
>
> Upfront, I must admit that I use all the above sources for academic
> research (in UP and across India). What Avinash described in principle,
> using the example of Delhi, can indeed be done on an all-India scale,
> and I am sure I am not the only one doing it.
>
> But then, the social sciences have long dealt with sensitive data and
> developed protocols to protect it. Even though the data is publicly
> available, I, for instance, keep my own copy on a secure workstation with
> full disk encryption and two-factor authentication. Whenever possible, I
> also work on anonymized subsets of the data. Yet there are other potential
> uses, some of the more worrisome of which you pointed out, that are not
> bound by such data protection standards.
>
> To me, this once more highlights the nascent stage of ethical standards
> around Big Data and e-Governance. On the plus side, I am happy to have
> that kind of access to conduct research that will ultimately be
> ethically beneficial, leading to a better understanding of social issues
> and potentially to better policy advice. There is also a point to be
> made that transparency is an important asset, in elections in particular:
> not only in terms of individual electoral search functions, but also in
> terms of publicly accessible (and cross-checkable, publicly verifiable)
> PDF rolls. Finally, a lot of this data had been available in the past as
> well, only in distributed and/or commercial form, which means there was
> a hierarchy of access: small-time crooks could not use it, but big-time
> crooks always could. Likewise, scholars at large (often foreign)
> universities were able to use it, but those at smaller ones were not
> (this is still true for some data, geodata in particular, which I can
> only access because of Ivy League contacts and only process because of
> an association with Oxford University).
>
> The ethical challenge as I see it thus comes not from data availability
> per se, but from the bulk accessibility and processability of data, as
> well as from the potential to link otherwise disconnected datasets with
> each other (for instance, a voter ID from the rolls to the online
> electoral search mechanism, to that voter's polling booth locality, to
> the ration card of a person with the same name registered at a ration
> shop in close spatial proximity, to the amount of rice that person
> obtained last week, all coupled, in the case of my own research, to that
> person's religious identity through a name-matching algorithm). And this
> IS an ethical challenge indeed, particularly once one leaves the ivory
> tower of academia, where ethical standards for such data are more
> ingrained and more closely adhered to. One need not go all the way to
> the various criminal uses of such data: are we all happy with commercial
> use, to start with?
>
> I have no easy answers here, because I think the ethical issue is fairly
> complex, balancing privacy and personal security against transparency in
> the political process and legitimate academic use of data (also because
> I think the answer must be found in India through political
> deliberation, and not in German academia). Still, in the end, I have to
> admit that I often leave my desk in the evening with considerable unease
> over the sheer wealth of private data that I work with...
>
> What do others think?
> Raphael
>
> On 11.04.2014 06:57, Avinash Celestine wrote:
> > Hi Devdatta
> >
> > Yes, though (and in the current context, I suppose that's a good thing)
> > it's not so easy for some other states such as UP, due to certain
> > problems with the way the PDFs are encoded. Raphael, who is on this
> > group, can testify to that...
> >
> > I had alluded to this some time back...
> >
> > https://storify.com/ac_soc/voter-profiling
> >
> > Avinash
> >
> >
> >
> >
> > On Fri, Apr 11, 2014 at 9:55 AM, Devdatta Tengshe <devda...@tengshe.in>
> > wrote:
> >
> >     Hi,
> >     I found this interesting article by a guy who downloaded and
> >     processed the voter list of Delhi: https://medium.com/p/1aff55526881
> >
> >     I found this via a discussion on Reddit:
> >
> >     http://www.reddit.com/r/programming/comments/22pn8u/i_wrote_a_few_simple_python_scripts_to_retrieve/
> >
> >     I'd like to quote his findings here:
> >
> >      1. It is possible to automate the retrieval of every single PDF
> >         roll all across India.
> >      2. These PDFs can then be processed in a matter of minutes to
> >         produce details like addresses, names, father's name, gender,
> >         age and voter ID number for every single registered voter of
> >         India.
> >      3. Nearly 25% of the voter IDs assigned in Delhi alone fail to
> >         conform to the government format, and fail the Luhn checksum
> >         test used to validate them. It is likely that other states are
> >         in a similar, if not worse, condition.
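The Luhn (mod 10) check named in point 3 doubles every second digit counting from the right, subtracts 9 from any doubled value above 9, and accepts the number if the total is divisible by 10. Below is a minimal Python sketch of that check. The thread does not say which characters of a voter ID the check covers, so this sketch simply takes a plain digit string; `luhn_valid` and `share_invalid` are hypothetical names used for illustration only.

```python
from typing import Iterable

DIGITS = "0123456789"


def luhn_valid(number: str) -> bool:
    """Return True if a non-empty string of ASCII digits passes the Luhn check."""
    if not number or any(ch not in DIGITS for ch in number):
        return False
    total = 0
    # Walk from the rightmost (check) digit; double every second digit.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def share_invalid(ids: Iterable[str]) -> float:
    """Fraction of IDs failing the Luhn check (0.0 for an empty input)."""
    ids = list(ids)
    if not ids:
        return 0.0
    return sum(not luhn_valid(i) for i in ids) / len(ids)
```

For example, `luhn_valid("79927398713")` is True (a standard Luhn test number), while `share_invalid(["79927398713", "79927398710"])` is 0.5.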
> >
> >
> >     Regards,
> >
> >     Devdatta Tengshe
> >
> >
> >     --
> >     For more details about this list
> >     http://datameet.org/discussions/
> >     ---
> >     You received this message because you are subscribed to the Google
> >     Groups "datameet" group.
> >     To unsubscribe from this group and stop receiving emails from it,
> >     send an email to datameet+unsubscr...@googlegroups.com
> >     <mailto:datameet+unsubscr...@googlegroups.com>.
> >     For more options, visit https://groups.google.com/d/optout.
> >
> >
>
> --
> Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
>       Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
>    Papers & Blog | http://www.raphael-susewind.de
>
> Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)
>
