Thanks a lot, Santosh and Yalu. Great work. I have been busy lately, but I will pick up from here now. I will also try to update the code to include the work you have shared and perhaps re-run the conversion. I just ran the summary script and 6611 out of 9692 have been corrected. I will go through the district files now.
Best regards
Anjesh

On 16 November 2013 03:23, sapradhan <[email protected]> wrote:
> hi
> Since it's almost election day, I pushed a bit harder to get most of the corrections done.
> For districts that had only a few incorrect entries, only the ones requiring corrections have a value in the corrected column.
> For districts that had several fixes, I used the attached sed script to fix things; all rows have an entry in the corrected column (same as the polling center for already correct ones).
>
> I have seen duplicate polling centers (removed some) and some erroneous centers. There were also cases where some part of the name of the polling center was missing. We are still not completely free of those, but we've covered a lot of ground.
>
> i can sleep now :D
>
> thanks
>
>
> On Friday, November 15, 2013 10:57:58 PM UTC+5:45, sapradhan wrote:
>>
>> Great find, Yalamber. I looked at the source; it's in JS, and I think it should be easy to modify it to eliminate the need for much of the manual correction. I can do व - wa, ी (dirgha ikar, long i) - i, remove the . for ं (anusvara), and lower-case everything. The modified version is attached.
>>
>> Please suggest what to put for ङ (currently ~N, would 'ng' do?) and ञ (~n, 'yn' is near enough).
>>
>> With case lowered and the . removed from anusvara, न ँ ं ण all map to 'n', which is okay I guess.
>>
>> And something off-topic (maybe it would be better to start a different thread for this): I was working on transliterating input for Nepali (basically the reverse of this). Some usage examples are here <https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>. I only know how to implement this for Linux, and an early implementation is here <http://nepalitankan.blogspot.com/>. Can you provide some feedback and suggestions on the usage patterns?
>>
>> thanks
>> santosh
>>
>> On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
>>>
>>> And... there you have it for Dhanusa...
>>> Polling Center Eng is in Roman (ASCII characters only), using the conversion-to-ITRANS tool I mentioned previously, plus a bit of manual editing. I did not bother changing the double a's ("aa") to single, as I find the distinction between अ (a) and आ (aa) necessary to make names unambiguous in many cases.
>>>
>>> Basically, the manual-editing part consisted of: replacing "ee" with "i" and "vaa" with "wa"; restoring अ (a) where it was suppressed on end-of-word consonants but was needed; removing the period the tool added in words with anuswara; lower-casing the capital letters ITRANS uses for some consonants; and small things like that. Finally, I title-cased the entire names. RegExp replace could perhaps be scripted for many of these rules, but a human eye may still be needed.
>>>
>>> All in all, not too bad for a combination of programmatic conversion and human editing, which otherwise would have taken quite a lot of time to do manually.
>>>
>>> yālu
>>>
>>>
>>> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]> wrote:
>>>
>>>> Hi Santosh,
>>>>
>>>> I agree IAST takes getting used to - it is too academic for the masses. While not entirely ideal, the ITRANS scheme may be better. It turns out there is a utility to convert Devanagari into ITRANS - it runs in the browser and can be downloaded here <https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>. With a little modification for rules specific to the Nepali language, it could work wonders transliterating Devanagari back to Roman.
>>>>
>>>> yālu
>>>>
>>>>
>>>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>>>
>>>>> yalu,
>>>>> I found IAST a bit difficult to read; maybe it takes some time to get used to. Besides, it uses characters not present on normal keyboards, so I would not vouch for it.
>>>>> ITRANS should be easier for most of us to understand and adopt. Perhaps there is something that translates Devanagari to ITRANS too?
>>>>>
>>>>> anjesh,
>>>>> I am waiting on your call on whether we are repeating the conversion/scrubbing process with the PCS mapping OR resuming manual correction of the entries.
>>>>>
>>>>> I have compiled the required PCS-to-Unicode mappings; somebody should be able to plug this into 2utf8 so that the conversion is done correctly.
>>>>>
>>>>>
>>>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Attempted Bhojpur and Dhanusa.
>>>>>>
>>>>>> Duplicate rows are identifiable as well.
>>>>>>
>>>>>> Polling Center Eng is a straightforward Devanagari to IAST <http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration> conversion - not sure if that's going to work out for this. If you are wondering about the Devanagari to IAST conversion tool, this <http://devtransliteration.appspot.com/translit> one is pretty accurate. IAST is a scheme to romanize Devanagari, popular among Sanskrit academics worldwide.
>>>>>>
>>>>>> Please ignore the ward nos. in Bhojpur - I did my own scraping off the pdf, and ward nos. seemed important.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> yālu
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]> wrote:
>>>>>>
>>>>>>> anjesh
>>>>>>> It is turning out to be a lot more work than originally thought.
>>>>>>> Most of the issues are due to PCS Nepali being used. I have created a sed script to automate PCS 2 Preeti, which is attached. This script is NOT foolproof and messes up any numerals if present (e.g. in Kathmandu there are ward nos. in poll center names). There are also conflicts with ञ and ङ (PCS has ङ at ~).
>>>>>>> I did Baglung-Morang with this script. Results are better but still need manual verification, please have a look.
>>>>>>>
>>>>>>> A few things that need to be corrected manually are:
>>>>>>> ज्ञ missing
>>>>>>> Bara/Salyan appearing in roman
>>>>>>> ह्ये as in गुह्येश्वरी
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>>>
>>>>>>>> Santosh,
>>>>>>>> That's correct. Those names must have been missed during the scraping process, and our eyes have also missed them. The number and name of booths are also maintained in a database. Currently we are concerned only with the incorrect polling-center names; we will fix those in the database (that's why the id is there in the first column) and share the corrected ones, along with the number of booths, on opendatanepal.org.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anjesh.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> anjan,
>>>>>>>>> I am assuming that you have maintained the no. of booths in a center somewhere.
>>>>>>>>>
>>>>>>>>> There are some issues like:
>>>>>>>>> In Ilam:
>>>>>>>>> only फिक्कल in place of सडक कार्यालय फिक्कल
>>>>>>>>> only पञ्चकन्या in place of भवनी प्रा वि पञ्चकन्या
>>>>>>>>> In Panchthar too, the same in a few places.
>>>>>>>>> I have added the full names there. Can you verify that I am doing things correctly?
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>>
>>>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>>>
>>>>>>>>>> No, we are not merging them. नेसुम क, ख, ग, घ are polling booths and नेसुम is the polling center. Once we correct the polling center, booth names can be corrected easily. Booths are listed under the center name, e.g.
>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/11 like
>>>>>>>>>>
>>>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>>>
>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(क)"
>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> One confusion: are we merging क ख ग into one?
>>>>>>>>>>> E.g. in Taplejung 2 there are 4 नेसुम क, ख, ग, घ; are we merging them?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Santosh. Write access is enabled now - missed that :)
>>>>>>>>>>>>
>>>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community would like to peek into it.
>>>>>>>>>>>>
>>>>>>>>>>>> Anjesh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This link could be useful for your data conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> In,
>>>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> nice initiative,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I looked into the pdf and it seems that there are two fonts in use: Preeti and PCS Nepali. It turns out PCS Nepali has a different key mapping than Preeti, specifically in the numeric layer (i.e. ^=ट, &=ठ, *=ड and so forth), which is causing quite a few errors. If it is feasible to change the mapping based on the font being used, the conversion should be better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If it would be quicker to do this manually, I can help. I don't have write access to the google docs, please do the needful.
>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>> santosh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you look at this constituency, http://election.opennepal.net:8000/#/constituency/39, there are lots of issues with unicode, which should not have been there if the data were available in a proper format. We scraped the data (in Nepali Preeti font) from http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html. The task was not easy; https://github.com/foss-np/2utf8 was used to convert Preeti to Unicode. There could be problems in either the conversion or the scraping. We think that it might be quicker to get help from the community to resolve these issues. For ease, we have maintained all the scraped polling centers district-wise in the google docs here <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>. We plan to release these data on opendatanepal.org as well, but only after resolving those issues, for which we seek your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> District-file sheet <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78> lists the districts and the appropriate polling-list pdf file (from which we scraped).
>>>>>>>>>>>>>>> The corresponding district page has all the polling centers (with issues; there might be repetitions as well). We have created columns for the corrected center name and the romanized center name, so you can correct the center names and add any missing centers. Summary <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77> shows the list of districts with issues and corrected names - the google script will run and update the numbers there. For example, the Taplejung district page <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2> seems to have 3-4 issues in names. Thank you so much for your help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Prawesh Shrestha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "FOSS Nepal" group.
>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>>>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "opendatanepal" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
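A note on scripting the clean-up rules discussed in this thread: the following is a minimal Python sketch, not the attached sed script or the modified JS tool. It assumes the input is ITRANS-style output from the converter, and it applies only the rules named in the thread ('ng' for ~N and 'yn' for ~n as proposed, dropping the period added for anusvara, lower-casing, "ee" to "i", "vaa" to "wa", then title-casing). The function name clean_itrans and the rule order are assumptions made for illustration; a human check would still be needed afterwards.

```python
import re

# Case-sensitive ITRANS codes; these must be handled before lower-casing.
# 'ng' and 'yn' are the replacements proposed in the thread, not a standard.
NASAL_RULES = [
    (re.compile(r"~N"), "ng"),  # ङ
    (re.compile(r"~n"), "yn"),  # ञ
]

# Rules applied after lower-casing, taken from the manual edits described.
LOWERCASE_RULES = [
    (re.compile(r"ee"), "i"),    # long i written as "ee"
    (re.compile(r"vaa"), "wa"),  # व + आ written as "vaa"
]


def clean_itrans(text: str) -> str:
    """Apply the thread's clean-up rules to one ITRANS-style name."""
    for pattern, replacement in NASAL_RULES:
        text = pattern.sub(replacement, text)
    text = text.replace(".", "")  # drop the period the tool adds for anusvara
    text = text.lower()           # ITRANS capitals become plain lower case
    for pattern, replacement in LOWERCASE_RULES:
        text = pattern.sub(replacement, text)
    # Title-case each word of the name.
    return " ".join(word.capitalize() for word in text.split())


if __name__ == "__main__":
    print(clean_itrans("bhavaanee"))
    print(clean_itrans("ga.ngaa gaNesha"))
```

With these rules, clean_itrans("bhavaanee") gives "Bhawani". Cases such as a suppressed final अ (a) are not covered and would still need manual correction.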
