Hi,

As Peter said, it looks difficult to scrape: there's a reCAPTCHA.

Regards,
Sanjay
On Tue, 10 Nov 2020 at 07:43 [email protected] <[email protected]> wrote:

> Hi Nikhil,
>
> Here's a PDF I put together a while ago which gives some indication of the deeply nested nature of the e-courts website (it's very similar to what Sanjay has already posted, but dives a bit deeper). If one opens a case in a new window, the original search is still available to return to; otherwise, each case entails a fresh search.
>
> And yes, I did mean KrutiDev. I found a very useful site which converts to Unicode: https://www.fontconverter.in/hindi.php?q=Krutidev-to-Unicode
>
> best wishes,
> Peter
>
> On Monday, November 9, 2020 at 6:50:14 PM UTC+10:30 [email protected] wrote:
>
>> Hi Peter,
>>
>> Can you share a sample instruction (click this -> click that) or a link showing how to reach a place on the website where we can see a listing under the IPC code?
>>
>> <digressing>
>> About KritiDev - do you mean KrutiDev?
>>
>> There are converters available now to convert from legacy ASCII fonts (where we would use a custom font to make A's glyph look like one akshar, B's look like another akshar, and so on) to Unicode (where different languages have their own character codes and can co-exist).
>>
>> I found various websites by searching online for "hindi to unicode converter", but there's also this open source collection of HTML files containing JavaScript that I have worked with earlier: https://sites.google.com/site/hindifontconverters/files (simple web page files with JavaScript to do the conversions).
>>
>> A budget document I was working with 5 yrs back had its own version of a legacy font - I hacked into one of the JavaScript files here, added new mappings and customised my own converter.
>>
>> Sorry to digress, but just sharing in case the legacy font thing was being a blocker to anyone.
>> Also, if someone wants to build a full solution out of this that takes, say, Word docs and converts them to Unicode without losing formatting, and can bring in some resources, let me know. I didn't have the skills to programmatically work with Office docs 5 yrs ago; I do now.
>>
>> And there was one surprise finding related to this: I've found that legacy fonts survive the journey through PDFs better than Unicode. So if an institution insists on sharing documents as PDF, I'd rather have them stick to their old legacy fonts and use one of these converter tools at my end to get the text out into Unicode.
>>
>> --
>> Cheers,
>> Nikhil VJ
>> https://nikhilvj.co.in
>>
>> On Mon, Nov 9, 2020 at 12:41 PM [email protected] <[email protected]> wrote:
>>
>>> Hi Lovish,
>>>
>>> My experience (only for district courts in MP) is that scraping is *not* possible. It *is* possible to look for all cases in which a specific IPC offence is involved (e.g. 376(D) Gang rape). But to find out what happened in each case, you must go to each *seriatim*, check what decision was made by the court and--if you're lucky--access the judgement made in the case. In MP, those judgements are in Hindi, rendered in KritiDev.
>>>
>>> I've written a paper looking at some rape cases. Feel free to contact me directly.
>>>
>>> best wishes,
>>> Peter Mayer
>>>
>>> On Monday, November 9, 2020 at 1:35:09 PM UTC+10:30 Lovish Sharma wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working as an associate for an NGO working in the field of crimes against women. Currently I am doing research on crimes against women in prominent cities. For that, I need to scrape the data from the e-courts website, https://districts.ecourts.gov.in/ .
>>>>
>>>> Kindly help me with that.
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India.
>>> Know more about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/c9029026-4bf0-464b-ada8-6c4964911afen%40googlegroups.com
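
For anyone following the legacy-font digression in the thread above, here is a minimal Python sketch of the mapping-table approach Nikhil describes (one legacy "ASCII" character per glyph, replaced by its Unicode code point). The table `TOY_LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names, and the five entries are illustrative only, not a verified KrutiDev table; a real converter needs the full mapping for the specific font. The sketch also assumes the short-i matra is typed before its consonant in the legacy text, so it is moved after the consonant for Unicode.

```python
import re

# Toy mapping for illustration only: NOT a verified KrutiDev table.
# Each legacy "ASCII" character stands in for one Devanagari glyph.
TOY_LEGACY_TO_UNICODE = {
    "d": "\u0915",  # ka
    "e": "\u092e",  # ma
    "j": "\u0930",  # ra
    "k": "\u093e",  # aa matra
    "f": "\u093f",  # short-i matra (assumed to be typed BEFORE its consonant)
}

I_MATRA = "\u093f"


def legacy_to_unicode(text, table=TOY_LEGACY_TO_UNICODE):
    """Convert legacy-font text to Unicode using a lookup table."""
    # Try longer keys first so multi-character keys win over single ones.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No mapping for this character: pass it through unchanged.
            out.append(text[i])
            i += 1
    converted = "".join(out)
    # Legacy text holds the short-i matra in visual order (before the
    # consonant); Unicode stores it after the consonant, so swap them.
    return re.sub(I_MATRA + r"([\u0915-\u0939])", r"\1" + I_MATRA, converted)


if __name__ == "__main__":
    print(legacy_to_unicode("jke"))
    print(legacy_to_unicode("fd"))
```

Adding support for a font with its own variant of the encoding, as with the budget document mentioned in the thread, would then mostly be a matter of extending the table with new entries.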
