Hello Group, I have written an article in 'The Ken' on the state of open 
data in the country and how the government is restricting access to the most 
important datasets of the country. Please give your feedback on this 
article. Also, are there other datasets that are very important for the 
economy but are behind restricted access? Please let me know of such 
datasets. We will try to highlight them in the media and to the government. 

Here is the link to my article: 
https://the-ken.com/choking-innovation-pipe/

Know Your Defaulter, or KYD, launched in mid-2016. The startup’s software 
spiders crawled through dozens and hundreds of data sources each day – 
court cases, company filings, credit ratings and defaulter lists put out by 
banks and financial institutions. It allowed anyone to instantly conduct an 
online due diligence on a company before entering into any sort of 
agreement with it. In less than a year, KYD had saved tens, possibly 
hundreds of millions of dollars for its users by helping them spot 
companies that were unscrupulous.


Best Trip, a trip planning app that showed real-time road traffic 
conditions in the top six Indian metro cities, was credited with saving 
millions of commuter hours and vehicle fuel, and with generating hundreds of 
millions of dollars in productivity and healthcare cost savings in just 
the 17 months since its launch in late 2015.

There’s also the story of Compliance Scanner which started in 2014 as Data 
Watch. After the first year of meandering, trying to find its business 
purpose, it pivoted into a useful tool that allowed anyone to instantly 
identify the statutory compliance status of Indian firms on various laws. 
Regulators, lawyers, consultants, civic-minded citizens and NGOs; all 
relied on it to spot variances between what a company claimed in public, 
and what it practiced in private. It was single-handedly responsible for 
dozens of well-known companies being found to have fallen short of their 
provident fund (EPFO) obligations.


This is where we must apologize and tell you that none of the above three 
examples are true.

Know Your Defaulter, Best Trip and Compliance Scanner do not exist.


In fact, these startups cannot exist (and neither can the efficiency they 
generate) because of the way various Indian governments and regulators have 
either ignored or actively stymied the “Open Data” initiative 
<https://community.data.gov.in/right-to-information-act-open-government-data/>.

What is open data?

“Open data and content can be freely used, modified, and shared by anyone 
for any purpose.”

Data = Innovation Fuel

A 2013 McKinsey research report showed how governments around the world 
could unlock an additional $3 trillion (yes, that’s right) in economic 
value merely by enabling open data across seven domains.


To quote from the report: “An estimated $3 trillion in annual economic 
potential could be unlocked across seven domains. These benefits include 
increased efficiency, development of new products and services, and 
consumer surplus (cost savings, convenience, better-quality products). We 
consider societal benefits, but these are not quantified. For example, we 
estimate the economic impact of improved education (higher wages), but not 
the benefits that society derives from having well-educated citizens. We 
estimate that the potential value would be divided roughly between the 
United States ($1.1 trillion), Europe ($900 billion) and the rest of the 
world ($1.7 trillion).”


The seven domains McKinsey identified were education, transportation, 
consumer products, electricity, oil & gas, healthcare and consumer finance.

Needless to say, these are some of the sectors where India needs innovation 
and transformation at scale. Today. If only.

Even though there were early indicators that the Indian government was 
interested in furthering open data 
<https://community.data.gov.in/right-to-information-act-open-government-data/> 
and transparency initiatives, the last few years have been a big let-down. 
After finding the Right to Information (RTI) Act “somewhat wanting”, the 
current government laid special emphasis 
<https://community.data.gov.in/right-to-information-act-open-government-data/> 
on the need to release raw data in a machine-readable format. The Open 
Government Data platform data.gov.in <http://data.gov.in/> was supposed to 
do exactly that. Even before the current government, back in 2011, India 
was also a formative member of the Open Government Partnership 
<http://www.opengovpartnership.org/> (which has since been joined by 75 
countries) but withdrew just before the kickoff 
<http://www.freedominfo.org/2011/07/india-withdraws-from-open-government-partnership/>. 
With the benefit of hindsight, these pronouncements seem to be 
pretentious at best.


Higher value data sets still remain behind access walls. In fact, in some 
cases, access has been made more difficult since the government started 
driving its open data movement. Much of this data was provided for public 
access earlier but increasingly, access has been restricted. The government 
is opening up less useful data sets while blocking access to richer ones. 
This is entirely contrary to the open data policy of the government which 
says that data collected by the government with the public money shall be 
in the open.

Data death by a thousand CAPTCHAS

Here, an important question must be asked. What does it mean for data to be 
truly open? In a way that it acts as an innovation multiplier for a 
country’s economy? Here’s how McKinsey described it:

[Figure: McKinsey, “How Government Can Promote Open Data And Unleash Over 
$3 Trillion In Economic Value”]

For any activity that needs to be done at scale, at speed and with few 
errors, machine readability is crucial. Scanned documents or badly formatted 
PDFs force diligence and other processes into manual mode, which is slow, 
error-prone and costly, and sometimes altogether impossible to do manually.

Diligence (both at individual level as well as sector level) is 
increasingly an on-going activity rather than a one-time, at-initiation 
activity. Without crawlability, on-going diligence might as well be written 
off.

Unfortunately in India, we’re on the “closed” end of the spectrum on almost 
all counts. CAPTCHAS, paywalls and secretiveness are increasingly what we 
encounter while collating most government data.


Case statuses and details in district courts were originally crawlable 
(meaning, could be read and indexed by software spiders) but have now been 
put behind CAPTCHAs. This means a human intervention is required for 
downloading every single unit of data. An argument is made that if you need 
to search for a case, you can directly put in a CAPTCHA on the government 
website and search there. However, unless service providers can 
pre-download all of the data, various matching and tracking algorithms 
cannot be run. Further, unless these databases can be continuously updated, 
early warning systems (of the type recommended by the RBI for instance) 
cannot possibly be built.


All courts in India put out “cause lists” – or lists of cases scheduled for 
hearings. These are important lists for a variety of stakeholders – from 
lawyers to lenders to parties actually involved in the suits. However, 
these lists are rarely machine readable or even searchable (often being 
scanned copies). The system seems to expect all interested stakeholders to 
manually download these lists daily and go through them to see if there is 
anything that affects them. The scale of this task seems to elude the 
authorities.

For instance, a bank like State Bank of India (SBI) has tens of thousands 
of corporate borrowers. What mechanism is it expected to have to 
manually look through these lists for potential interests?
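
To illustrate why machine readability matters here: if cause lists were published as structured records, checking them against a lender's borrower list would take a few lines of code rather than a daily manual read. The following is only a minimal sketch in Python. The record layout (`case_no`, `court`, `parties`), the sample names and the crude name normalization are all hypothetical, since no such official feed exists today.

```python
import re

# Common company suffixes ignored during matching (illustrative, not exhaustive)
SUFFIXES = {"pvt", "private", "ltd", "limited"}

def normalize(name):
    """Lowercase, strip punctuation and drop company suffixes for loose matching."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)

def find_hearings(cause_list, borrowers):
    """Return (borrower, case_no, court) for every cause-list party that matches a borrower."""
    wanted = {normalize(b): b for b in borrowers}
    hits = []
    for entry in cause_list:
        for party in entry["parties"]:
            key = normalize(party)
            if key in wanted:
                hits.append((wanted[key], entry["case_no"], entry["court"]))
    return hits

# Hypothetical data, for illustration only
cause_list = [
    {"case_no": "CS/101/2017", "court": "District Court A",
     "parties": ["Acme Traders Pvt. Ltd.", "XYZ Bank"]},
    {"case_no": "CS/102/2017", "court": "District Court B",
     "parties": ["Some Person", "Other Person"]},
]
borrowers = ["ACME TRADERS PRIVATE LIMITED", "Blue Widgets Ltd"]

matches = find_hearings(cause_list, borrowers)
print(matches)
```

Exact matching on normalized names is the simplest possible choice. A real service would need fuzzier matching and identifiers such as the CIN, but even this much is impossible when the lists are scanned images.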

It will be fair to say that today, there is no systematic mechanism for 
accessing this information on a large scale, and it is by design.


The state-run Provident Fund Organization (EPFO) used to have a system to 
search for all payers in the EPFO system. This was routinely used to 
estimate the size of an organization and to determine if it was complying 
with payment of EPFO dues etc. This too has now been put behind a CAPTCHA. 
If a bank or a manufacturer wants to keep track of its borrower/vendor for 
compliance on this front, it is close to impossible now.

A lot of private company information, like CIN (Corporate Identification 
Number) changes, is either behind CAPTCHAs or available only on paper 
(because the services are broken).


The same is the case with trademark information and the filings related to 
it. This means it’s impossible to create a comprehensive database of 
trademarks in India. In 2017.

Compare this to the US where the government proactively pushes out patents 
and trademarks data freely in association with private companies.

Information related to consignment-wise exports and imports used to be open 
for years, but was suddenly taken offline in December 2016. It was brought 
back online, but at a charge of Re 1 per record (that’s almost Rs 2 lakh 
per day if you want all the records).
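
As a back-of-the-envelope check, the Rs 2 lakh per day figure implies roughly 200,000 records a day at Re 1 each. The sketch below extends that to a year. The assumption that this volume holds on all 365 days is ours, for illustration, and not a published figure.

```python
PRICE_PER_RECORD_RS = 1      # from the article: Re 1 per record
RECORDS_PER_DAY = 200_000    # implied by "almost Rs 2 lakh per day"; approximate
DAYS_PER_YEAR = 365          # assumption: the same volume every day of the year

def in_crore(rupees):
    """Convert rupees to crore (1 crore = 10,000,000)."""
    return rupees / 10_000_000

daily_cost = PRICE_PER_RECORD_RS * RECORDS_PER_DAY
annual_cost = daily_cost * DAYS_PER_YEAR

print(f"Daily: Rs {daily_cost:,}; annual: Rs {in_crore(annual_cost):.1f} crore")
# prints "Daily: Rs 200,000; annual: Rs 7.3 crore"
```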


Then there’s information related to the Profit & Loss statements of 
companies. As per the Companies Act, 2013, P&L statements of companies are 
no longer private. And for a few months following that, old P&L statements 
had been made available on the MCA website after payment of a requisite fee. 
But now, all of a sudden, that too has been withdrawn. Compare this to 
Companies House in the UK where such documents are available free of charge.


Even publicly listed companies have been rendered partly opaque. In the US, 
the SEC actively (even aggressively) disseminates filings by public 
entities, relating not only to their financials but also all material 
events. But in India our securities regulator SEBI has abdicated this 
responsibility in favour of respective exchanges. The exchanges in turn 
hoard quarterly filings in the XBRL format (this is a structured format in 
which companies report quarterly data) and make corporate announcements 
available for retail use but for commercial use require a paid subscription.

Do note, these are “public filings”. It is incredible that SEBI has allowed 
private monopolies to exist and extort the public for what is rightfully 
theirs to begin with.

This list is endless.

Even papers laid on the table of Parliament, which are technically public 
documents and whose titles are made available on the Lok Sabha’s website, 
continue to be unavailable. Presumably, most if not all of these documents 
are now prepared digitally so there can’t be much friction in making them 
available online.

Given the daily news coverage around bank defaults, one would think at 
least the Reserve Bank of India (RBI) might have taken the lead in making 
this data available easily. But the RBI seems to have abdicated its 
responsibility with regard to such data and has asked the credit bureaus 
to disseminate it. However, the bureaus again, while making it available 
for general use, have conditions forbidding the commercial use of these 
lists. Again, this is public data and yet the government has created a 
private monopoly over it for commercial use.


[Figure: India's hidden or locked data sources]


Over and again, it’s the same story. Either the government hands off public 
data to private monopolies, or it itself prevents open use of it.

Choking Innovation

It’s tempting to think of open data as a “good to have”, something India 
isn’t ready for yet and thus isn’t a pressing need. But the reality is that 
data is the fuel that powers the global economy today and most innovative 
companies. Starved of it, Indian entrepreneurs and businesses will never be 
able to become world class.

   1. Banks could use open data to conduct due diligence on businesses and 
   reduce their non-performing assets (NPAs). In 2016 the RBI put down 
   guidelines for the early detection of fraud or stress in a lender’s loan 
   book. But for banks to be able to do this systematically across their 
   entire breadth of borrowers, they need external service providers to be 
   able to monitor borrowers’ health vitals in the form of litigations, 
   defaulter or watchlist statuses, VAT/TAN defaults, ongoing registration and 
   compliance statuses with various bodies. But this information is either 
   impossible to come by or is increasingly behind CAPTCHAs or formats that 
   cannot be processed.
   2. Conversely, open data could help well-run businesses to borrow 
   cheaply because otherwise banks are reluctant to provide loans sans credit 
   histories.
   3. According to the 2016-17 Global Fraud & Risk Report 
   <http://www.kroll.com/en-us/global-fraud-report> by Kroll, 27% of 
   surveyed Indian businesses said they have faced fraud issues from their 
   vendors or suppliers. Around 87% of the businesses say that they try (or 
   would like to try) some kind of due diligence on businesses they are 
   dealing with. But how do you do that with data hidden or locked up?
   4. Stockbrokers need open data to do due diligence on their clients 
   under the PMLA to ensure that their clients are not using them to 
   transact laundered money.
   5. Exporters and importers were heavy users of EXIM data to understand 
   emerging trade patterns, product lines, etc. The government’s own dashboard, 
   while good-looking, is no substitute for the raw data that allowed for far 
   richer analysis.
   6. Indian and international companies need open data to conduct due 
   diligence on potential customer tie-ups or joint ventures. We’re seeing 
   a few large players, particularly in the automotive industry where supply 
   chains are very lean, trying to do this in a structured manner. Other big 
   corporate groups we contacted said that they do not have any such process 
   currently but would like to incorporate one if such a service were 
   available.
   7. Fast moving consumer goods (FMCG) and market research companies use 
   Census data to understand their markets, particularly rural areas. But this 
   data is provided in very poor shape, spread across innumerable Excel 
   sheets, and is therefore very difficult to consolidate or analyze. These 
   Excel sheets do not even bear proper names, so it is difficult to tell what 
   data each of them contains.
   8. Regulatory authorities themselves use this data to identify companies 
   involved in fraudulent or suspicious activities, or to study managements of 
   companies. Interestingly, they find it easier to get this data from the 
   private information data providers than from the Government’s own 
   ministries. Fraud investigation authorities like the Serious Frauds 
   Investigation Office use this data to understand the management history of 
   the company and to understand how companies and directors are connected to 
   each other. Often this helps them understand a group or web of companies 
   that could be acting in tandem.
   9. Lastly, apart from institutions, even individuals use data for due 
   diligence of potential employers and third party job consultants who offer 
   them jobs. They also use this when they are dealing with businesses for 
   transactions involving real estate, lending, investments.

This list, too, is endless. Suffice it to say, a closed and tight-fisted 
approach to data is costing Indian businesses and citizens billions of 
dollars in lost productivity, lost profits or lost opportunities.

*Anchal Agarwal and Parijat Garg are the co-founders of Tofler, which 
works on business transparency and visibility into the Indian business 
ecosystem.*
