Re: [datameet] Data Repository API for Government

2013-07-01 Thread satyaakam goswami
On Mon, Jul 1, 2013 at 12:05 PM, Venkata Pingali ping...@gmail.com wrote:

 I don't have well-organized thoughts on data collection and
 accessibility (and I don't blog on open data; I am planning to write on
 energy data), but it is something I woke up to again in the last few
 months in my business conversations. I haven't spent as much time in the
 governance/non-profit space. Take it FWIW:


This calls for a new thread, so I am starting one with your content as
the opener. Let's discuss this topic there.

-Satya

-- 
For more details about this list
http://datameet.org/discussions/
--- 
You received this message because you are subscribed to the Google Groups 
datameet group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [datameet] Data Repository API for Government

2013-06-30 Thread Gora Mohanty
On 1 July 2013 09:28, Nagarajan M mnagarajan...@gmail.com wrote:
 Hi All,

 The Open Data movement is evolving rapidly with the publication of datasets
 on www.data.gov.in.

 The problem with government data is that it is more statistics than
 actionable data. However, the open data movement will evolve to demand more
 granular and near-real-time data.

 The Government system will not be able to provide it because the systems of
 data collection and management are not standardized. There are not enough
 tools built to enable the Government machinery to integrate data
 collection through workflows.

Actually, I think that it will *have* to be done through the
Government of India, and the people running data.gov.in.
It is just too difficult for third-party developers to do this
in a consistent manner that will keep working in the future;
please see below.

 I am interested in creating a Data Repository that can act as an API where
 different workflows of government can link up and operate. They could use
 existing metadata and also add their own.

 I request your suggestions with regard to architecture, technologies and
 tools.
[...]

Architecture, tools, etc., are not that complicated. One
could start with a system that parses the current spreadsheets,
digests the data, and provides an API. However, there are
several issues with third parties doing this:
1. While several people have done specific case studies by
   manually extracting the data from the spreadsheet files,
   this is obviously not something that can scale.
2. It is possible to build a system that enlists the help of people
   interested in specific datasets. For example, a prototype of
   such a system would parse the first few lines of any given
   spreadsheet file, make an educated guess about data types,
   and allow the user to modify the guesses.
3. Such crowd-sourcing might work at one level, but there are
   two big problems:
   (a) There is no consistent standard followed by the spreadsheets.
       E.g., at least names of states, and date formats could
       be standardised.
   (b) Several of the spreadsheets that we looked at have
       internally inconsistent data. Please see the issues
       brought up by Supreet at
       http://www.mail-archive.com/ilugd@lists.linux-delhi.org/msg29943.html
4. Finally, there is the question of what incentive there is for
   third-party developers to build such an API?

   For sustainability, this either has to be done by a
   publicly-funded institution or there has to be a commercial
   basis for it. If it is done through public funds, why should
   data.gov.in not handle it themselves?

   There might well be a business case in selling access to such
   an API, but the cost of developing the API and hosting the
   services will be quite high. However, I cannot seem to find any
   indication on the data.gov.in sites as to whether such
   third-party, commercial access is allowed. In fact, I could not
   find any information on the terms of usage of the data
   published there.
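
To make points 2 and 3(a) concrete, here is a minimal sketch in Python
of what such a prototype might look like. It is only an illustration
under stated assumptions: every function name, the list of date
layouts, the table of state spellings, and the sample rows are made up
for this example, and it assumes the spreadsheet has already been
exported to CSV with a single header row.

```python
import csv
import io
from datetime import datetime
from itertools import islice

# Assumed date layouts and state spellings; real files would need longer lists.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")
STATE_NAMES = {
    "odisha": "Odisha",
    "orissa": "Odisha",
    "tamil nadu": "Tamil Nadu",
    "tamilnadu": "Tamil Nadu",
    "puducherry": "Puducherry",
    "pondicherry": "Puducherry",
}


def parse_date(text):
    """Return a date using the first layout that matches, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def _converts(caster, text):
    """True if caster (int or float) accepts text once commas are removed."""
    try:
        caster(text.replace(",", ""))
        return True
    except ValueError:
        return False


def guess_type(values):
    """Guess a column type from sample cells, ignoring empty ones."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "text"
    checks = (
        ("integer", lambda c: _converts(int, c)),
        ("number", lambda c: _converts(float, c)),
        ("date", lambda c: parse_date(c) is not None),
    )
    for name, check in checks:
        if all(check(c) for c in cells):
            return name
    return "text"


def guess_schema(csv_text, sample_rows=5, overrides=None):
    """Guess types from the first few data rows, then apply user corrections."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = list(islice(reader, sample_rows))
    schema = {
        col: guess_type([row[i] for row in sample if i < len(row)])
        for i, col in enumerate(header)
    }
    schema.update(overrides or {})
    return schema


def normalise_state(name):
    """Map known spelling variants to one canonical state name."""
    cleaned = " ".join(name.split())
    return STATE_NAMES.get(cleaned.lower(), cleaned)


def normalise_date(text):
    """Rewrite a recognised date as ISO 8601 (YYYY-MM-DD), else None."""
    parsed = parse_date(text)
    return parsed.isoformat() if parsed else None


# Made-up rows for illustration only; not taken from any real dataset.
SAMPLE = """State,Year,Value,Surveyed on
Orissa,2011,12.5,01/03/2011
 tamil  nadu,2011,7.25,2011-03-01
"""
```

Calling guess_schema(SAMPLE) gives the automatic guesses, and passing
something like overrides={"Year": "text"} stands in for a user
correcting a guess. The normalise_* helpers show the kind of clean-up
that would be unnecessary if the publisher standardised state names
and date formats at the source.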

Regards,
Gora





Re: [datameet] Data Repository API for Government

2013-06-30 Thread Venkata Pingali
I broadly agree. Technology is the easy part. I would think in terms
of architecture of coordination. Let me comment on one related
aspect.

 Finally, there is the question of what incentive there is for
 third-party developers to build such an API?

Surprisingly enough, it doesn't have to be very high. The process has to be
efficient. If it requires too much mental and process context switching, it
gets difficult. People like me are not in school anymore.

-Venkata













Re: [datameet] Data Repository API for Government

2013-06-30 Thread satyaakam goswami
On Mon, Jul 1, 2013 at 9:28 AM, Nagarajan M mnagarajan...@gmail.com wrote:

 Hi All,

 The Open Data movement is evolving rapidly with the publication of
 datasets on www.data.gov.in.

 The problem with government data is that it is more statistics than
 actionable data. However, the open data movement will evolve to demand
 more granular and near-real-time data.


Agreed.


 The Government system will not be able to provide it because the systems
 of data collection and management are not standardized. There are not
 enough tools built to enable the Government machinery to integrate
 data collection through workflows.


No idea what you mean by these sweeping statements.

 I am interested in creating a Data Repository that can act as an API where
 different workflows of government can link up and operate. They could use
 existing metadata and also add their own.

 I request your suggestions with regard to architecture, technologies and
 tools.


I would love to understand what your expectations are. Let's start
talking. To begin with, I have copied Mr Dp Misra from the Data Portal
team on this thread; he will be the person to initiate this conversation.

Thanks
-Satya





Re: [datameet] Data Repository API for Government

2013-06-30 Thread Gora Mohanty
On 1 July 2013 10:16, Venkata Pingali ping...@gmail.com wrote:
 I broadly agree. Technology is the easy part. I would think in terms
 of architecture of coordination. Let me comment on one related
 aspect.

 Finally, there is the question of what incentive there is for
 third-party developers to build such an API?

 Surprisingly enough, it doesn't have to be very high. The process has to be
 efficient. If it requires too much mental and process context switching, it
 gets difficult. People like me are not in school anymore.
[...]

Not sure what you mean by this response. The process
of what?

A third possibility exists besides the two that I mentioned
in my earlier message: an open-source group that does this,
where the incentive for doing the development for free is
recognition. However, the hosting costs will still be high.
I also think that, given the current strength and abilities of
open-source communities in India, even development in this
manner will be very difficult.

Regards,
Gora





Re: [datameet] Data Repository API for Government

2013-06-30 Thread Gora Mohanty
On 1 July 2013 10:28, Venkata Pingali ping...@gmail.com wrote:
 I meant the coordination process. Many tasks involving data (esp. collection)
 tend to be time-consuming. I did a bunch of that in my previous life
 (lots of calls, emails, travel, campaigns etc.). It is hard to do it now.

Agreed. As you mentioned, and as I also believe, the technology
is the relatively easy part.

 BTW, I didn't imply that there needs to be monetary incentive, even in
 industry. I believe that open data is a reward in itself.

Well, here we will have to agree to disagree.

IMHO, this particular problem is of a scale and complexity that
cannot easily be addressed in a voluntary context. One might be
able to build a prototype, or even a working system, using open-
source volunteers working free of cost (personally, I think that
even this will be very difficult given the current status in India).
However, sustaining it is not something that can be done without
significant funding. The hosting costs alone will run to at least a
couple of lakhs per year at any reasonable scale.

Regards,
Gora
