On 1 July 2013 09:28, Nagarajan M <[email protected]> wrote:
> Hi All,
>
> The Open Data movement is evolving rapidly with the publication of datasets
> on www.data.gov.in
>
> The problem with government data is that it is more statistics than actionable
> data. However, the open data movement will evolve to demand more granular
> and near-real-time data.
>
> The Government system will not be able to provide it because the systems of
> data collection and management are not standardized. There are not enough
> tools built to enable the Government machinery to integrate data
> collection into its workflows.
Actually, I think that it will *have* to be done through the
Government of India, and the people running data.gov.in.
It is just too difficult for third-party developers to do this
in a consistent manner that will keep working in the future.
Please see below.
> I am interested in creating a Data Repository that can act as an API where
> different workflows of government can link up and operate. They could use
> existing metadata and also add their own.
>
> I request your suggestions with regard to architecture, technologies and
> tools.
[...]
Architecture, tools, etc., are not that complicated. One
could start with a system that parses the current spreadsheets,
digests the data, and provides an API. However, there are
several issues with third parties doing this:
1. While several people have done specific case studies by
manually extracting the data from the spreadsheet files,
manual extraction obviously does not scale.
2. It is possible to build a system that enlists the help of people
interested in specific datasets. For example, a prototype of
such a system would parse the first few lines of any given
spreadsheet file, make an educated guess about data types,
and allow the user to modify the guesses.
3. Such crowd-sourcing might work at one level, but there are
two big problems:
(a) There is no consistent standard followed by the spreadsheets.
At the very least, names of states and date formats could
be standardised.
(b) Several of the spreadsheets that we looked at have
internally inconsistent data. Please see the issues
brought up by Supreet at
http://www.mail-archive.com/[email protected]/msg29943.html
4. Finally, there is the question of what incentive there is for
third-party developers to build such an API?
For sustainability, this either has to be done by a publicly-
funded institution or there has to be a commercial basis for it.
If it is done through public funds, why should data.gov.in not
handle it themselves?
There might well be a business case in selling access to such
an API, but the cost of developing the API and hosting the
services will be quite high. However, I cannot seem to find any
indication on the data.gov.in sites as to whether such third-
party, commercial access is allowed. In fact, I could not find
any information on the terms of usage of the data published
there.
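
To make point 2 above a little more concrete, here is a rough sketch
of the "educated guess" step, assuming the spreadsheet has already been
exported to CSV with a single header row. All names here
(guess_cell_type, guess_column_types, apply_overrides, DATE_FORMATS)
are made up for illustration, and the list of date formats is only a
guess at what the published files might contain.

```python
import csv
import io
from datetime import datetime

# Assumed date formats; real datasets will almost certainly need more.
DATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def guess_cell_type(value):
    """Return 'empty', 'int', 'float', 'date' or 'str' for one cell."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "str"


def guess_column_types(csv_text, sample_rows=5):
    """Guess one type per column from the first few data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seen = {name: set() for name in header}
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break
        for name, value in zip(header, row):
            seen[name].add(guess_cell_type(value))
    guesses = {}
    for name in header:
        kinds = seen[name] - {"empty"}
        if len(kinds) == 1:
            guesses[name] = next(iter(kinds))
        elif kinds == {"int", "float"}:
            # A mix of integers and decimals is treated as float.
            guesses[name] = "float"
        else:
            # No data, or conflicting types: fall back to plain text.
            guesses[name] = "str"
    return guesses


def apply_overrides(guesses, overrides):
    """Let a user correct the guesses; returns a new dict."""
    merged = dict(guesses)
    merged.update(overrides)
    return merged
```

A volunteer interested in a dataset would be shown the output of
guess_column_types and could correct it through apply_overrides. Note
that columns with conflicting types simply fall back to "str", which
is exactly where the inconsistencies in point 3 would show up.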
Regards,
Gora