Hi,

Since it's almost election day, I pushed a bit harder to get most of the corrections done. For districts that had only a few incorrect entries, only the rows that needed a correction have a value in the Corrected column. For districts that needed several fixes, I used the attached sed script, so every row has an entry in the Corrected column (the same as the polling center name for entries that were already correct).
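The two Corrected-column conventions described above can be read back the same way. This is a minimal Python sketch. It assumes each sheet row has been exported as a dict with the hypothetical keys `polling_center` and `corrected`; the sheet's real headers may differ. The published name is the Corrected value when it is non-blank, and otherwise the original name. That way a blank cell and a cell that repeats the original both count as "no change".

```python
def final_name(row: dict) -> str:
    """Name to publish for one sheet row.

    Handles both conventions: a blank Corrected cell, or a Corrected cell
    that repeats the original name. The keys are hypothetical column names.
    """
    corrected = (row.get("corrected") or "").strip()
    return corrected if corrected else row["polling_center"].strip()


def count_corrections(rows: list[dict]) -> int:
    """Count rows whose published name differs from the scraped name."""
    return sum(1 for row in rows if final_name(row) != row["polling_center"].strip())


rows = [
    {"polling_center": "फिक्कल", "corrected": "सडक कार्यालय फिक्कल"},  # fixed
    {"polling_center": "नेसुम", "corrected": ""},                      # left blank
    {"polling_center": "देउराली", "corrected": "देउराली"},              # repeated as-is
]
print([final_name(r) for r in rows])
print(count_corrections(rows))  # 1
```

Counting changes this way should give the same number for a district whichever convention was used to fill it.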
I have seen duplicate polling centers (and removed some) as well as some erroneous centers. There were also cases where part of a polling center's name was missing. We are still not completely free of those, but we've covered a lot of ground. I can sleep now :D

thanks

On Friday, November 15, 2013 10:57:58 PM UTC+5:45, sapradhan wrote:
>
> Great find, Yalamber. I looked at the source; it's in JS, and I think it
> should be easy to modify it to eliminate the need for much of the manual
> correction. I can map व to 'wa' and ी (dirgha ikar, the long i) to 'i',
> remove the '.' for ं (anusvara), and lower-case everything. The modified
> version is attached.
>
> Please suggest what to use for ङ (currently ~N; would 'ng' do?) and
> ञ (currently ~n; 'yn' is near enough).
>
> With case lowered and the '.' removed from the anusvara, न, ँ, ं and ण
> all map to 'n', which is okay, I guess.
>
> And something off this topic (maybe it would be better to start a
> separate thread for this): I have been working on transliterated input
> for Nepali (basically the reverse of this). Some usage examples are
> here <https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>.
> I only know how to implement this for Linux, and an early implementation
> is here <http://nepalitankan.blogspot.com/>. Can you give some feedback
> and suggestions on the usage patterns?
>
> thanks
> santosh
>
> On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
>>
>> And... there you have it for Dhanusa. Polling Center Eng is in Roman
>> (ASCII characters only), produced with the conversion-to-ITRANS tool I
>> mentioned previously plus a bit of manual editing. I did not bother
>> changing double "aa" to single "a", as I find the distinction between
>> अ (a) and आ (aa) necessary to make names unambiguous in many cases.
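The cleanup rules proposed in this thread could be applied as a post-processing pass over the converter's ITRANS output, instead of by patching the JS tool. The rules are: व as 'w', long ी as 'i', no period around the anusvara, lower case, 'ng'/'yn' for ङ/ञ, and title-cased names.

This Python sketch assumes the output spells ङ as `~N`, ञ as `~n`, the anusvara and chandrabindu as `.n`/`.N`, and long ी as `ii` or `ee`. Two caveats apply:
- The 'ng' and 'yn' choices were still open questions in the thread.
- Ambiguous cases, such as a suppressed word-final अ, still need a human eye.

```python
import re


def simplify_itrans(text: str) -> str:
    """Turn ITRANS-style output into plain lower-case ASCII, then title-case it.

    A sketch of the rules discussed in the thread; the input spellings it
    expects (~N, ~n, .n, .N, ii/ee, v) are assumptions about the tool's output.
    """
    # Handle ङ and ञ first, while ~N and ~n can still be told apart by case.
    text = text.replace("~N", "ng").replace("~n", "yn")
    # Drop the period in .n (anusvara) and .N (chandrabindu); periods in
    # abbreviations such as "gaa.vi.sa." are left alone.
    text = re.sub(r"\.(?=[nN])", "", text)
    # ITRANS capitals (T, D, N, Sh, ...) become lower case; this merges ण into n.
    text = text.lower()
    # Long ी becomes plain i.
    text = re.sub(r"ii|ee", "i", text)
    # व is written with w.
    text = text.replace("v", "w")
    # Title-case word by word (str.capitalize keeps the rest of each word lower case).
    return " ".join(word.capitalize() for word in text.split())


print(simplify_itrans("ga.ngaa vaDaa"))  # Gangaa Wadaa
```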
>>
>> Basically, the manual editing consisted of:
>> - replacing "ee" with "i" and "vaa" with "wa";
>> - restoring the inherent अ (a) where the tool had suppressed it on a
>>   word-final consonant that needed it;
>> - removing the period the tool added in words with an anusvara;
>> - lower-casing the consonants that ITRANS writes in capitals;
>> - and small things like that.
>> Finally, I title-cased the entire names. RegExp replacements could
>> perhaps be scripted for many of these rules, but a human eye may still
>> be needed.
>>
>> All in all, not too bad for a combination of programmatic conversion and
>> human editing; doing it all by hand would have taken quite a lot of time.
>>
>> yālu
>>
>>
>> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]> wrote:
>>
>>> Hi Santosh,
>>>
>>> I agree IAST takes getting used to; it is too academic for the masses.
>>> While not entirely ideal, the ITRANS scheme may be better. It turns out
>>> there is a utility to convert Devanagari into ITRANS. It runs in the
>>> browser and can be downloaded
>>> here <https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>.
>>>
>>> With a little modification for rules specific to the Nepali language, it
>>> could work wonders transliterating Devanagari back to Roman.
>>>
>>> yālu
>>>
>>>
>>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>>
>>>> yalu,
>>>> I found IAST a bit difficult to read; maybe it takes some time to get
>>>> used to. Besides, it uses characters not present on normal keyboards,
>>>> so I would not vouch for it. ITRANS should be easier for most of us to
>>>> understand and adopt. Perhaps there is something that translates
>>>> Devanagari to ITRANS too?
>>>>
>>>> anjesh,
>>>> I am waiting on your call on whether we repeat the conversion/scrubbing
>>>> process with the PCS mapping or resume manually correcting the entries.
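On the question of whether something translates Devanagari to ITRANS: the core of such a converter is small. The Python sketch below is a hypothetical minimal converter, not the browser utility mentioned in this thread.

- **Coverage:** only the basic vowels, consonants, vowel signs, halant (्), anusvara, chandrabindu, visarga and digits.
- **Inherent "a":** it writes the inherent "a" after every consonant that is not followed by a vowel sign or halant. Deciding where Nepali drops it is left to later editing.
- **Everything else:** nukta, punctuation and Latin text pass through unchanged.

Its spellings for ङ, ञ and the anusvara (`~N`, `~n`, `.n`) match the ones discussed in the thread.

```python
# Hypothetical minimal Devanagari -> ITRANS converter (basic letters only).
VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ई": "ii", "उ": "u", "ऊ": "uu",
          "ऋ": "RRi", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
VOWEL_SIGNS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "ू": "uu",
               "ृ": "RRi", "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
CONSONANTS = {"क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "~N",
              "च": "ch", "छ": "Ch", "ज": "j", "झ": "jh", "ञ": "~n",
              "ट": "T", "ठ": "Th", "ड": "D", "ढ": "Dh", "ण": "N",
              "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
              "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
              "य": "y", "र": "r", "ल": "l", "व": "v", "श": "sh",
              "ष": "Sh", "स": "s", "ह": "h"}
SIGNS = {"ं": ".n", "ँ": ".N", "ः": "H"}  # anusvara, chandrabindu, visarga
HALANT = "्"
DIGITS = {chr(0x0966 + d): str(d) for d in range(10)}  # Devanagari digits 0-9


def to_itrans(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:  # an explicit vowel sign replaces the inherent a
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            if nxt == HALANT:  # halant: bare consonant (conjunct or word-final)
                i += 2
                continue
            out.append("a")  # inherent vowel
        else:
            # Independent vowels, signs and digits; anything else passes through.
            out.append(VOWELS.get(ch) or VOWEL_SIGNS.get(ch) or SIGNS.get(ch)
                       or DIGITS.get(ch) or ch)
        i += 1
    return "".join(out)


print(to_itrans("गंगा"))       # ga.ngaa
print(to_itrans("पञ्चकन्या"))  # pa~nchakanyaa
```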
>>>>
>>>> I have compiled the required PCS-to-Unicode mappings; somebody should
>>>> be able to plug them into 2utf8 so that the conversion is done
>>>> correctly.
>>>>
>>>>
>>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I attempted Bhojpur and Dhanusa.
>>>>>
>>>>> Duplicate rows are identifiable as well.
>>>>>
>>>>> Polling Center Eng is a straightforward Devanagari-to-IAST
>>>>> <http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration>
>>>>> conversion; I'm not sure if that is going to work out for this. If
>>>>> you are wondering about the Devanagari-to-IAST conversion tool,
>>>>> this one <http://devtransliteration.appspot.com/translit> is pretty
>>>>> accurate. IAST is a scheme for romanizing Devanagari, popular among
>>>>> Sanskrit academics worldwide.
>>>>>
>>>>> Please ignore the ward numbers in Bhojpur; I did my own scraping of
>>>>> the PDF, and the ward numbers seemed important.
>>>>>
>>>>> thanks,
>>>>>
>>>>> yālu
>>>>>
>>>>>
>>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]> wrote:
>>>>>
>>>>>> anjesh,
>>>>>> It is turning out to be a lot more work than originally thought.
>>>>>> Most of the issues are due to the PCS Nepali font being used, so I
>>>>>> have created a sed script to automate PCS-to-Preeti conversion,
>>>>>> which is attached. This script is NOT foolproof: it messes up any
>>>>>> numerals that are present (e.g. in Kathmandu there are ward numbers
>>>>>> in polling center names), and there are also conflicts with ञ and ङ
>>>>>> (PCS has ङ at ~).
>>>>>> I did Baglung-Morang with this script. The results are better but
>>>>>> still need manual verification; please have a look.
>>>>>>
>>>>>> A few characters still need to be corrected manually:
>>>>>> - ज्ञ missing
>>>>>> - Bara/Salyan entries appearing in Roman script
>>>>>> - ह्ये, as in गुह्येश्वरी
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>
>>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>>
>>>>>>> Santosh,
>>>>>>> That's correct. Those names must have been missed during the
>>>>>>> scraping process, and our eyes missed them too. The number and
>>>>>>> names of booths are also maintained in a database. Currently we are
>>>>>>> concerned only with the incorrect polling-center names. We will fix
>>>>>>> those in the database (that's why the id is in the first column)
>>>>>>> and share the corrected ones, along with the number of booths, on
>>>>>>> opendatanepal.org.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anjesh.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>>
>>>>>>>> anjan,
>>>>>>>> I am assuming that you have kept the number of booths per center
>>>>>>>> somewhere.
>>>>>>>>
>>>>>>>> There are some issues, for example in Ilam:
>>>>>>>> - only "फिक्कल" instead of "सडक कार्यालय फिक्कल"
>>>>>>>> - only "पञ्चकन्या" instead of "भवनी प्रा वि पञ्चकन्या"
>>>>>>>> There are similar cases here and there in Panchthar too.
>>>>>>>> I have added the full names there; can you verify that I am doing
>>>>>>>> this correctly?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>>
>>>>>>>>> No, we are not merging them. नेसुम क, ख, ग, घ are polling booths
>>>>>>>>> and नेसुम is the polling center. Once we correct the polling
>>>>>>>>> center, the booth names can be corrected easily. Booths are listed
>>>>>>>>> under the center name, e.g. on
>>>>>>>>> http://election.opennepal.net:8000/#/constituency/11:
>>>>>>>>>
>>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>>
>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(क)"
>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> One point of confusion: are we merging क, ख, ग into one?
>>>>>>>>>> E.g. in Taplejung 2 there are four नेसुम entries (क, ख, ग, घ);
>>>>>>>>>> are we merging them?
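In the naming pattern anjesh describes, each booth is the center name followed by a parenthesised letter such as (क). Booth rows can therefore be folded back onto their polling center. This Python sketch assumes the booth label is always a single Devanagari consonant in parentheses at the very end of the name. That rule is an assumption. Labels written without parentheses (as in "नेसुम क, ख, ग, घ") would need an extra pattern.

```python
import re
from typing import Optional

# A trailing booth label such as "(क)": one Devanagari consonant
# (U+0915 to U+0939) in parentheses, optionally surrounded by spaces.
BOOTH_SUFFIX = re.compile(r"\s*\(([\u0915-\u0939])\)\s*$")


def split_booth(name: str) -> tuple[str, Optional[str]]:
    """Split a booth name into (center name, booth label or None)."""
    match = BOOTH_SUFFIX.search(name)
    if match is None:
        return name.strip(), None
    return name[:match.start()].strip(), match.group(1)


def group_booths(names: list[str]) -> dict[str, list[str]]:
    """Map each polling center to the booth labels listed under it."""
    centers: dict[str, list[str]] = {}
    for name in names:
        center, booth = split_booth(name)
        labels = centers.setdefault(center, [])  # keep centers with no booths too
        if booth is not None:
            labels.append(booth)
    return centers


center = "धूर्काे६ गा.वि.स. भवन, देउराली"
print(group_booths([center + "(क)", center + "(ख)", center + "(ग)"]))
```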
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Santosh. Write access is enabled now; I missed that :)
>>>>>>>>>>>
>>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community
>>>>>>>>>>> would like to look into it.
>>>>>>>>>>>
>>>>>>>>>>> Anjesh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This link could be useful for your data conversion:
>>>>>>>>>>>>
>>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>>
>>>>>>>>>>>> In,
>>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Nice initiative.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I looked into the PDF, and it seems that two fonts are in use:
>>>>>>>>>>>>> Preeti and PCS Nepali. It turns out PCS Nepali has a different
>>>>>>>>>>>>> key mapping than Preeti, specifically on the numeric layer
>>>>>>>>>>>>> (i.e. ^=ट, &=ठ, *=ड and so forth), which is causing quite a few
>>>>>>>>>>>>> errors. If it is feasible to change the mapping based on the
>>>>>>>>>>>>> font being used, the conversion should be better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If it would be quicker to do this manually, I can help.
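sapradhan's suggestion of switching the mapping based on the font could look roughly like the Python sketch below. It is not how 2utf8 or the attached sed script actually work: it maps straight to Unicode in one pass, layering per-font overrides on a base table.

- **What is filled in:** only the PCS Nepali entries quoted in this thread (^=ट, &=ठ, *=ड, and ङ at ~).
- **What is a placeholder:** `base_table` stands in for a full Preeti mapping, which is not reproduced here.
- **Limitation:** `str.translate` handles single characters only, so real legacy-font conversion would typically also need multi-character rules and vowel-sign reordering.

```python
# Per-font overrides. Only the PCS Nepali keys quoted in the thread are
# listed; everything else falls back to the base (Preeti-style) table.
FONT_OVERRIDES = {
    "PCS Nepali": {"^": "ट", "&": "ठ", "*": "ड", "~": "ङ"},
}


def table_for_font(base_table: dict[str, str], font: str) -> dict[int, str]:
    """Build a str.translate table: the base mapping plus the font's overrides."""
    mapping = dict(base_table)
    mapping.update(FONT_OVERRIDES.get(font, {}))
    return str.maketrans(mapping)


def convert(text: str, font: str, base_table: dict[str, str]) -> str:
    """Map legacy-font characters to Unicode, choosing the table by font name."""
    return text.translate(table_for_font(base_table, font))


base = {}  # placeholder: load a full Preeti table here
print(convert("^&*~", "PCS Nepali", base))  # टठडङ
print(convert("^&*~", "Preeti", base))      # ^&*~ (no Preeti entries loaded)
```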
>>>>>>>>>>>>> I don't have write access to the Google Docs; please do the needful.
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> santosh
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you look at this constituency,
>>>>>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/39, there are
>>>>>>>>>>>>>> lots of Unicode issues, which would not have arisen if the data
>>>>>>>>>>>>>> had been available in a proper format. We scraped the data (in
>>>>>>>>>>>>>> the Nepali Preeti font) from
>>>>>>>>>>>>>> http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html.
>>>>>>>>>>>>>> The task was not easy; https://github.com/foss-np/2utf8 was
>>>>>>>>>>>>>> used to convert Preeti to Unicode. The problems could come from
>>>>>>>>>>>>>> either the conversion or the scraping. We think it might be
>>>>>>>>>>>>>> quicker to get help from the community to resolve these issues.
>>>>>>>>>>>>>> For convenience, we have kept all the scraped polling centers,
>>>>>>>>>>>>>> district-wise, in Google Docs
>>>>>>>>>>>>>> here <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We plan to release this data on opendatanepal.org as well, but
>>>>>>>>>>>>>> only after resolving those issues, for which we seek your help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The District-file
>>>>>>>>>>>>>> sheet <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78>
>>>>>>>>>>>>>> lists the districts and the corresponding polling-list PDF file
>>>>>>>>>>>>>> (from which we scraped), and each district page has all of that
>>>>>>>>>>>>>> district's polling centers (with issues; there may be
>>>>>>>>>>>>>> repetitions as well).
>>>>>>>>>>>>>> We have created columns for the corrected center name and the
>>>>>>>>>>>>>> romanized center name, so you can correct the center names and
>>>>>>>>>>>>>> add any missing centers.
>>>>>>>>>>>>>> Summary <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>
>>>>>>>>>>>>>> shows the list of districts with issues and corrected names; the
>>>>>>>>>>>>>> Google script will run and update the numbers there. For
>>>>>>>>>>>>>> example, the Taplejung district
>>>>>>>>>>>>>> page <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2>
>>>>>>>>>>>>>> seems to have 3-4 issues in names. Thank you so much for your
>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Prawesh Shrestha
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>> Google Groups "FOSS Nepal" group.
>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>>>
>>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "opendatanepal" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>>
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>
>>>>>
>>>>
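Several messages in this thread mention duplicate polling-center rows. One way to surface candidates for review is to group rows by district and a normalized name. This Python sketch assumes rows are exported as hypothetical (id, district, name) tuples, where the id is the database id anjesh mentions in the first column. The normalization is also an assumption: Unicode NFC, commas treated as spaces, and whitespace collapsed. Flagged groups still need a human to decide which row to keep. Booth rows with different (क)/(ख) suffixes are deliberately not treated as duplicates.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Comparison key: NFC form, commas treated as spaces, whitespace collapsed."""
    name = unicodedata.normalize("NFC", name).replace(",", " ")
    return " ".join(name.split())


def find_duplicates(rows):
    """Return {(district, normalized name): [ids]} for keys seen more than once.

    rows: iterable of (row_id, district, name) tuples (hypothetical layout).
    """
    groups = defaultdict(list)
    for row_id, district, name in rows:
        groups[(district, normalize(name))].append(row_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


rows = [
    (1, "Ilam", "सडक कार्यालय फिक्कल"),
    (2, "Ilam", "सडक  कार्यालय, फिक्कल"),
    (3, "Ilam", "फिक्कल"),
]
print(find_duplicates(rows))
```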
pcs2preeti.sed
