Re: [datameet] Phonetic Similarity

2018-08-13 Thread Venkata Pingali
Soundex is not enough. We went through metaphone and double-metaphone as well. The last showed the best performance when combined with simple ways to reduce the search space (e.g., only comparing names that start with the same letter). But it still had too many false positives and negatives. We ended up using
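A minimal sketch of the idea in the message above: compute a phonetic key per name and only compare names within the same first-letter block. Soundex is used here purely because it fits in a few lines of standard-library Python; it is not the approach the author settled on. The function names `soundex` and `candidate_pairs` and the sample names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code, or "" if name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return ("".join(out) + "000")[:4]


def candidate_pairs(names, key=soundex):
    """Block names by first letter, then pair names whose phonetic keys match."""
    blocks = defaultdict(list)
    for name in names:
        if key(name):  # skip names that yield no key
            blocks[name[0].upper()].append(name)
    pairs = []
    for block in blocks.values():
        # Comparisons happen only inside a block, which shrinks the search space.
        for a, b in combinations(block, 2):
            if key(a) == key(b):
                pairs.append((a, b))
    return pairs


names = ["Venkata", "Venkat", "Vinod", "Kishore", "Kishor"]
print(candidate_pairs(names))
```

A Soundex code already begins with the name's first letter, so the blocking step adds nothing for this particular key. It matters once `key` is swapped for an encoder that can change the leading sound, such as a metaphone implementation from a third-party library.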

Re: [datameet] FSSAI/Other food database

2017-08-31 Thread Venkata Pingali
look at : https://world.openfoodfacts.org/ . > > -Konar > > On Thu, Aug 31, 2017 at 5:16 AM, Venkata Pingali <ping...@gmail.com> > wrote: > >> Hi! >> >> Is anyone aware of any public datasets on groceries and what >> they contain? >> >> thanks! >> -Venka

[datameet] FSSAI/Other food database

2017-08-30 Thread Venkata Pingali
Hi! Is anyone aware of any public datasets on groceries and what they contain? thanks! -Venkata -- Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org --- You received this message because you are subscribed to the Google Groups

Re: [datameet] Re: Data to understand voting pattern

2017-02-15 Thread Venkata Pingali
estment | United Way Mumbai > > “Everyone You Meet has something Valuable to Teach You”. > > ------- > > On Wed, Feb 15, 2017 at 2:49 PM, Venkata Pingali <ping...@gmail.com> > wrote: > >> Usually

Re: [datameet] Re: Data to understand voting pattern

2017-02-15 Thread Venkata Pingali
Usually the state election commission's website has basic data (winner, loser, votes). Here is one for Maharashtra: https://mahasec.maharashtra.gov.in/Site/Home/Index.aspx On Wed, Feb 15, 2017 at 2:25 PM, Kedar Annam wrote: > Hey Pratap, > > Thank you very much for

Re: [datameet] Re: Assembly constituency KML and JSON, from the Election Commission

2017-01-21 Thread Venkata Pingali
The electoral list PDF usually has a hand-drawn map. But it is not worth the effort (I was involved in such an effort in the past). The boundaries of the booth change every six months, and in a random fashion at that. Further, the map is often out of sync with the actual content of the voter list.

Re: [datameet] Help with schools location data extracting

2016-05-21 Thread Venkata Pingali
I looked at this site some months back. Notes from my investigation then: 1. The content/JSONs are being served by a commercial GIS operated by BSNL 2. The content is coming back split across many JSONs (50-60) with encoded URLs 3. The API appeared stateful (URLs kept changing) This, I concluded,

[datameet] Re: dgit - git for datasets - alpha release

2016-04-13 Thread Venkata Pingali
, 2016 at 1:30 PM, Venkata Pingali <ping...@gmail.com> wrote: > Hi! > > I have been working on an opensource project to manage > datasets called dgit. It has reached alpha stage. See the text > below for details. > > dgit's goal is to enable more structured and predict

[datameet] dgit - git for datasets - alpha release

2016-04-05 Thread Venkata Pingali
Hi! I have been working on an open-source project to manage datasets called dgit. It has reached alpha stage. See the text below for details. dgit's goal is to enable a more structured and predictable data science process where you are able to answer questions like: (a) Lineage/Auditability: Where

Re: [datameet] Comparison of wards for 2001 and 2011 census

2015-05-14 Thread Venkata Pingali
At my firm (fourthlion) we did a bit of that - discovering the boundaries of the ward/other geography and tracing population growth - using the voter list for the 2009-2014 time period and for one constituency. It is a cumbersome process for multiple technical reasons including free text fields,

Re: [datameet] Any privacy issue in publishing names of voters?

2015-03-01 Thread Venkata Pingali
Couple of thoughts based on my experience with voter lists: 1. The value of the names (for profiling population clusters) increases with granularity. Would recommend sharing at state level. You could potentially annotate with variables (e.g., abstract region - north/south) to make it a little more

Re: [datameet] [Article] Limitations of the PDF

2014-05-31 Thread Venkata Pingali
, 2014 at 2:24 PM, Venkata Pingali ping...@gmail.com wrote: I have been working on PDF extraction. I find that PDF combines 'what' (text itself) with 'how' (transformations, presentation). The table that we see is often just a collection of lines and rectangles put together in an ad hoc fashion

Re: [datameet] [Article] Limitations of the PDF

2014-05-12 Thread Venkata Pingali
I have been working on PDF extraction. I find that PDF combines 'what' (text itself) with 'how' (transformations, presentation). The table that we see is often just a collection of lines and rectangles put together in an ad hoc fashion. It could be due to PDF generator libraries themselves. It

Re: [datameet] What kinds of things could we do with Daksh 2014 MP Constituency data?

2014-02-10 Thread Venkata Pingali
and will be completed in two weeks or so. So we can't change the questionnaire now, unfortunately. My intent here is to get help in figuring out how the data we collect can be used. Thanks! Kishore. --- www.dakshindia.org On Mon, Feb 10, 2014 at 12:35 PM, Venkata Pingali ping

Re: [datameet] What kinds of things could we do with Daksh 2014 MP Constituency data?

2014-02-09 Thread Venkata Pingali
Couple of thoughts: 1. We can ask about origin/hometown. This can help us understand urbanization process. 2. Will the data be public information? Who is the sponsor of this initiative? On Mon, Feb 10, 2014 at 11:06 AM, Kishore (Narasimhan) Mandyam kish...@dakshindia.org wrote: Daksh is

Re: [datameet] What kinds of things could we do with Daksh 2014 MP Constituency data?

2014-02-09 Thread Venkata Pingali
will be published beginning two weeks from now. On Monday, February 10, 2014 12:14:35 PM UTC+5:30, Venkata Pingali wrote: Couple of thoughts: 1. We can ask about origin/hometown. This can help us understand urbanization process. 2. Will the data be public information? Who is the sponsor

[datameet] newspaper archive

2013-12-22 Thread Venkata Pingali
Hi! I am looking for archives for the last 20 years of major urban newspapers such as Times of India. Are you aware of any archives (free or paid)? I browsed through online archives where available such as Hindu, ToI but need them in a more accessible form e.g., API. thanks! -Venkata -- For

Re: [datameet] Power Tariff Data

2013-11-28 Thread Venkata Pingali
Two sources: 1. Shunglu and Chaturvedi committee reports on the Planning Commission site. 2. State and central electricity regulatory commissions (CERC for central level, MERC for Maharashtra, etc.) -Venkata On Thu, Nov 28, 2013 at 4:59 PM, Naveen Gattu naveen.ga...@gramener.com wrote: Hi -

Re: [datameet] Data Repository API for Government

2013-06-30 Thread Venkata Pingali
I broadly agree. Technology is the easy part. I would think in terms of architecture of coordination. Let me comment on one related aspect. Finally, there is the question of what incentive there is for third-party developers to build such an API? Surprisingly enough, it doesn't have to be

[datameet] Minister for Water Resources Releases Atlas on Aquifer systems of India

2012-09-28 Thread Venkata Pingali
Today's press release. http://pib.nic.in/newsite/PrintRelease.aspx?relid=88037 Minister of Water Resources and Parliamentary Affairs, Shri Pawan Kumar Bansal released Atlas for six states viz Kerala, Tamil Nadu, Karnataka, Chhattisgarh, Himachal Pradesh and Meghalaya in New Delhi today.