Re: [datameet] Data Repository API for Government
On Mon, Jul 1, 2013 at 12:05 PM, Venkata Pingali ping...@gmail.com wrote:
> I don't have well-organized thoughts around data collection and accessibility (and I don't blog on open data; I am planning to write on energy data), but it is something I woke up to again in the last few months in my business conversations. I haven't spent as much time in the governance/non-profit space. Take it FWIW:

This calls for a new thread, so I am starting one with your content as a start. Let's talk there on this topic.

-Satya

--
For more details about this list http://datameet.org/discussions/
---
You received this message because you are subscribed to the Google Groups datameet group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Re: [datameet] Data Repository API for Government
On 1 July 2013 09:28, Nagarajan M mnagarajan...@gmail.com wrote:
> Hi All,
> The Open Data movement is evolving rapidly with the publication of datasets on www.data.gov.in. The problem with government data is that it's more statistics than actionable data. However, the open data movement will evolve to demand more granular and near-real-time data. The Government system will not be able to provide it because the systems of data collection and management are not standardized. There are not enough tools built to enable the Government machinery to integrate data collection through workflow.

Actually, I think that it will *have* to be done through the Government of India, and the people running data.gov.in. It is just too difficult for third-party developers to do this in a consistent manner that will keep working in the future. Please see below.

> I am interested in creating a Data Repository that can act as an API where different workflows of government can link up and operate. Use existing metadata and also add their own. I request your suggestions with regard to architecture, technologies and tools.
[...]

Architecture, tools, etc., are not that complicated. One could start with a system that parses the current spreadsheets, digests the data, and provides an API. However, there are several issues with third parties doing this:

1. While several people have done specific case studies by manually extracting the data from the spreadsheet files, this is obviously not something that can scale.
2. It is possible to build a system that enlists the help of people interested in specific datasets. For example, a prototype of such a system would parse the first few lines of any given spreadsheet file, make an educated guess about data types, and allow the user to modify the guesses.
3. Such crowd-sourcing might work at one level, but a big problem is that:
   (a) There is no consistent standard followed by the spreadsheets. E.g., at least names of states, and date formats could be standardised.
   (b) Several of the spreadsheets that we looked at have internally inconsistent data. Please see the issues brought up by Supreet at http://www.mail-archive.com/ilugd@lists.linux-delhi.org/msg29943.html
4. Finally, there is the question of what incentive there is for third-party developers to build such an API. For sustainability, this either has to be done by a publicly-funded institution or there has to be a commercial basis for it. If it is done through public funds, why should data.gov.in not handle it themselves? There might well be a business case in selling access to such an API, but the cost of developing the API and hosting the services will be quite high. However, I cannot seem to find any indication on the data.gov.in sites as to whether such third-party, commercial access is allowed. In fact, I could not find any information on the terms of usage of the data published there.

Regards,
Gora
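The prototype described in item 2 (read the first few lines of a file, guess the data types, let a user correct the guesses) could be sketched roughly as follows. This is only a minimal Python illustration, not something proposed in the thread: it assumes CSV input rather than spreadsheet files, and the names `guess_type`, `guess_schema`, `DATE_FORMATS`, the `overrides` parameter and the sample rows are all made up for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical list of date layouts to try; real files would need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

# Made-up sample rows, purely for illustration.
SAMPLE = """state,report_date,pin_code,rainfall_mm
Kerala,01/07/2013,682001,12.5
Bihar,02/07/2013,800001,3.0
"""


def guess_type(values):
    """Return 'integer', 'float', 'date' or 'text' for a sample of cells."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "text"

    def all_parse(parser):
        # True only if every sampled value is accepted by the parser.
        for v in values:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    for fmt in DATE_FORMATS:
        if all_parse(lambda v, fmt=fmt: datetime.strptime(v, fmt)):
            return "date"
    return "text"


def guess_schema(csv_text, sample_rows=5, overrides=None):
    """Guess column types from the first few rows, then apply user corrections."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Read at most sample_rows data rows after the header.
    sample = [row for _, row in zip(range(sample_rows), reader)]
    schema = {}
    for i, name in enumerate(header):
        column = [row[i] for row in sample if i < len(row)]
        schema[name] = guess_type(column)
    # User-supplied corrections win over the automatic guesses.
    schema.update(overrides or {})
    return schema


if __name__ == "__main__":
    print(guess_schema(SAMPLE, overrides={"pin_code": "text"}))
```

A numeric-looking identifier such as the `pin_code` column is guessed as an integer, which is the kind of guess a person familiar with the dataset would want to override to text. The inconsistencies in item 3 (state names, date formats) are not addressed by a sketch like this.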
Re: [datameet] Data Repository API for Government
I broadly agree. Technology is the easy part. I would think in terms of an architecture of coordination. Let me comment on one related aspect.

> Finally, there is the question of what incentive there is for third-party developers to build such an API?

Surprisingly enough, it doesn't have to be very high. The process has to be efficient. If it requires too much of a mental and process context switch, it gets difficult. People like me are not in school anymore.

-Venkata

On Mon, Jul 1, 2013 at 9:50 AM, Gora Mohanty g...@mimirtech.com wrote:
[...]
Re: [datameet] Data Repository API for Government
On Mon, Jul 1, 2013 at 9:28 AM, Nagarajan M mnagarajan...@gmail.com wrote:
> Hi All,
> The Open Data movement is evolving rapidly with the publication of datasets on www.data.gov.in. The problem with government data is that it's more statistics than actionable data. However, the open data movement will evolve to demand more granular and near-real-time data.

Agreed.

> The Government system will not be able to provide it because the systems of data collection and management are not standardized. There are not enough tools built to enable the Government machinery to integrate data collection through workflow.

No idea what you meant by these sweeping statements.

> I am interested in creating a Data Repository that can act as an API where different workflows of government can link up and operate. Use existing metadata and also add their own. I request your suggestions with regard to architecture, technologies and tools.

I would love to understand what your expectation is. Let's start talking. To begin with, I have marked Mr Dp Misra from the Data Portal team to this thread; he will be the person to initiate this conversation.

Thanks,
-Satya
Re: [datameet] Data Repository API for Government
On 1 July 2013 10:16, Venkata Pingali ping...@gmail.com wrote:
> I broadly agree. Technology is the easy part. I would think in terms of an architecture of coordination. Let me comment on one related aspect.
>
>> Finally, there is the question of what incentive there is for third-party developers to build such an API?
>
> Surprisingly enough, it doesn't have to be very high. The process has to be efficient. If it requires too much of a mental and process context switch, it gets difficult. People like me are not in school anymore.
[...]

Not sure what you mean by this response. The process of what?

A third possibility exists besides the two that I mentioned in my earlier message: that of having an open-source group that does this, where the incentive for doing the development for free is recognition. However, the hosting costs will still be high. My opinion also is that, given the current strength and abilities of open-source communities in India, even development in this manner will be very difficult.

Regards,
Gora
Re: [datameet] Data Repository API for Government
On 1 July 2013 10:28, Venkata Pingali ping...@gmail.com wrote:
> I meant the coordination process. Many tasks involving data (esp. collection) tend to be time-consuming. I did a bunch of that in my previous life (lots of calls, emails, travel, campaigns, etc.). It is hard to do it now.

Agreed. As you mentioned, and as I also believe, the technology is the relatively easy part.

> BTW, I didn't imply that there needs to be a monetary incentive, even in industry. I believe that open data is a reward in itself.

Well, here we will have to agree to disagree. IMHO, this particular problem is of a scale and complexity that cannot easily be addressed in a voluntary context. One might be able to build a prototype, or even a working system, using open-source volunteers working free of cost (personally, I think that even this will be very difficult given the current status in India). However, sustaining it is not something that can be done without significant funding. Just the hosting costs will run into at least a couple of lakhs/year at any reasonable scale.

Regards,
Gora