Hi all,

Nice work Vivek!

I was scraping to capture, by *timestamp*, how lead margins and vote counts
change over time, from statewise results pages like this
<https://results.eci.gov.in/PcResultGenJune2024/statewiseS011.htm> and PC-wise
results pages like this
<https://results.eci.gov.in/PcResultGenJune2024/ConstituencywiseS0124.htm>.

I've collated the data and posted it, along with the scraping and
collating (Python) scripts, on this GitHub repo:
https://github.com/answerquest/india-elections2024-results-timewise

*Flaws in this data:*
1. Didn't catch it all from the beginning: scraping of the lead-margin
tallies started around *1:50 pm*, and scraping of per-candidate vote
numbers started around *4:30 pm*.
2. Some time intervals are missing for some constituencies: sometimes
pages didn't load, or the script errored out on edge cases.
3. I bungled applying the "U" prefixes for union territories, so those
rows were scraped quite late.
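
For anyone re-running something similar, here is a rough sketch of how the
page URLs could be built so that the "U" prefix is not missed. The pattern is
only inferred from the two example URLs above (`statewiseS011.htm` read as
code `S01` + page `1`, `ConstituencywiseS0124.htm` read as code `S01` + PC
`24`), and the function names are made up for illustration. This is not taken
from the scripts in the repo.

```python
# Assumption: URL pattern inferred from the two example links in this mail.
BASE = "https://results.eci.gov.in/PcResultGenJune2024"


def region_code(number: int, is_union_territory: bool = False) -> str:
    """Return a code like 'S01' for a state or 'U05' for a union territory."""
    prefix = "U" if is_union_territory else "S"
    return f"{prefix}{number:02d}"


def statewise_url(code: str, page: int = 1) -> str:
    """Statewise results page, e.g. .../statewiseS011.htm"""
    return f"{BASE}/statewise{code}{page}.htm"


def constituency_url(code: str, pc_number: int) -> str:
    """PC-wise results page, e.g. .../ConstituencywiseS0124.htm"""
    return f"{BASE}/Constituencywise{code}{pc_number}.htm"
```

Keeping the S/U choice in one helper means the union territory pages go
through the same code path as the states instead of being handled later.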

But all in all I think it's a pretty good dataset for making time-series
visualizations, "auditing" tallies over time, detecting out-of-norm
additions, etc., for folks who are interested in settling some ongoing
debates using data.
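
As a starting point for that kind of audit, here is a minimal sketch that
flags steps where a candidate's cumulative count goes down, or jumps by much
more than the typical step. The input shape (a time-sorted list of
`(timestamp, votes)` pairs), the function name and the `factor` threshold are
all assumptions for illustration, not the layout of the files in the repo.

```python
from statistics import median


def flag_unusual_steps(snapshots, factor=5.0):
    """Flag odd changes in a cumulative vote count.

    snapshots: list of (timestamp, cumulative_votes), sorted by timestamp.
    Returns a list of (timestamp, delta, reason) tuples.
    """
    # Difference between each snapshot and the one before it.
    steps = [
        (t1, v1 - v0)
        for (_, v0), (t1, v1) in zip(snapshots, snapshots[1:])
    ]
    positive = [delta for _, delta in steps if delta > 0]
    typical = median(positive) if positive else 0

    flagged = []
    for ts, delta in steps:
        if delta < 0:
            # A cumulative tally should never go down.
            flagged.append((ts, delta, "decrease"))
        elif typical and delta > factor * typical:
            flagged.append((ts, delta, "large jump"))
    return flagged
```

A flagged row is only a prompt to go look at the official page: with the
missed intervals mentioned above, a "large jump" can simply be several
counting rounds landing between two scrapes.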

*Disclaimer:* I'm only sharing the data as it was at those timestamps. This
is secondary scraped data that is prone to flukes, like an HTML tag
mis-rendering and causing a bad number to come in. If you find something odd,
kindly look up the official sources, file RTIs etc., but leave me out of it
pls.


------------------------------------

*My compliments to the Election Commission of India, in case anyone from
there is reading:*
1. It was good to have absolute vote counts given as whole integers by ECI.
Hope to see this maintained. This was a lot better than the rounded-off
vote-share fractions we were getting during counting for the US 2020
elections, which made it impossible to calculate the actual numbers of
votes.

2. Good website work: consistent naming of each constituency's / state's
pages and consistent page structures.

3. Page-not-opening cases did occur but were rare, and the glitches
disappeared from around evening onwards, when the declarations were
happening and I'd have expected more site visitors. On my part, I ensured
my scripts were hitting 1 page at a time only, kept adequate time intervals
etc. so that I didn't bombard the server (to coders: this was intentional.
Don't suggest "fixing" it by parallel threading etc., that gets you 429'd).

4. Candidates' photos were properly organized and were instantly rendered
on all the PC-wise pages I was checking out. This means each and every
candidate was properly tracked in the DB, their files were properly
linked and small thumbnails were kept, as opposed to past elections when
there would only be scanned pages listing all the candidates' totals. One
suggestion: converting these to .webp format will shrink the sizes and your
egress loads by around 10x.

5. Even prior to the election, voter lists were quite well managed. The
voter roll PDFs were easy to download, and it was quite easy to find our
part + serial number provided we'd done our homework. (FYI, that was the
only info we needed in hand apart from photo ID on voting day: if you just
shared these with the officer when you entered the booth, they'd locate
your entry in 5 seconds and you would be done voting in under a minute.)

6. All in all, we've come a long way in digitization and making this data
accessible to all. Thank you for all the work done.

7. It would be great if you published some inside stories of the technical
infrastructure (server specs etc) used on 4th June for serving the website.
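
On the 1-at-a-time approach from point 3, here is a rough sketch of the idea:
one request at a time, a fixed pause between requests, and a failed page is
skipped instead of stopping the run. It uses only the standard library; the
function names and the 5-second default are assumptions for illustration, not
the actual scripts or intervals from the repo.

```python
import time
import urllib.request


def fetch(url, timeout=30):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_sequentially(urls, delay_seconds=5.0, fetch_page=fetch, sleep=time.sleep):
    """Fetch URLs strictly one after another, pausing between requests.

    Returns {url: html, or None if that page failed to load}.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # deliberate pause; no parallel requests
        try:
            results[url] = fetch_page(url)
        except OSError as exc:  # covers URLError, HTTPError and timeouts
            results[url] = None
            print(f"skipped {url}: {exc}")
    return results
```

The `fetch_page` and `sleep` parameters exist only so the loop can be tried
out with stand-ins, without touching the real server.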


--
Cheers,
Nikhil VJ
https://nikhilvj.co.in


On Wed, Jun 5, 2024 at 9:25 AM Vivek Matthew <[email protected]> wrote:

> Hi all,
>
> I have scraped the 2024 Lok Sabha election results from the
> results.eci.gov.in website. In case anyone is interested, you can find
> the CSV with the results attached.
>
> Once constituency-wise turnout numbers are released for phase 7, I will
> include additional columns for turnout and vote share numbers.
>
> Note that semicolon (;) is used as the column separator.
>
> Regards,
> Vivek
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/datameet/0c95b2a3-27d3-4146-8ce3-44a49ae72f6fn%40googlegroups.com
> <https://groups.google.com/d/msgid/datameet/0c95b2a3-27d3-4146-8ce3-44a49ae72f6fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
