Glad to be of any help. About that 3k difference between corrected and not corrected: a large portion of it may be because in many cases I did not bother to add an entry to the corrected column if the original text was already correct.
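A rough sketch of how a summary count could account for this, assuming each district sheet is exported as CSV with hypothetical column names `polling_center` and `corrected` (the real sheet headers and the actual summary script may differ). A blank corrected cell is treated as ambiguous rather than as an error:

```python
import csv
import io


def summarize(rows, corrected_col="corrected"):
    """Count rows with an explicit correction vs. rows left blank.

    A blank cell is ambiguous: the original may already be correct,
    or the row may simply not have been reviewed yet.
    """
    filled = sum(1 for row in rows if (row.get(corrected_col) or "").strip())
    return {"filled": filled, "blank": len(rows) - filled, "total": len(rows)}


def fill_blanks(rows, original_col="polling_center", corrected_col="corrected"):
    """Return copies of rows where a blank corrected cell falls back to the original."""
    out = []
    for row in rows:
        fixed = dict(row)
        if not (fixed.get(corrected_col) or "").strip():
            fixed[corrected_col] = fixed.get(original_col, "")
        out.append(fixed)
    return out


# Hypothetical sample rows; the ids and column names are made up for illustration.
SAMPLE = """id,polling_center,corrected
1,ानज्योति,ज्ञानज्योति
2,गणेश,
3,Salyanी,सल्यानी
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(summarize(rows))               # {'filled': 2, 'blank': 1, 'total': 3}
print(summarize(fill_blanks(rows)))  # {'filled': 3, 'blank': 0, 'total': 3}
```

Filling the blanks only makes sense for districts that were fully reviewed; for the rest a blank still means "not checked yet".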
If you need to rerun the conversion, here are a few leads to possible bugs in the conversion process:

- ज्ञ is often found missing (I suspect this happens if it's the first letter, as in ज्ञानज्योति -> ानज्योति)
- if a polling center name contains a district name as a substring, it was being replaced in romanized form, as in बज्रबाराही -> बज्रBaraही, सल्यानी -> Salyanी, दाङवाङ -> Dangवाङ. A side effect of some other search-replace routine?
- गणेश -> गण्ोश appears to be a bug in 2utf8

Let us know if we can do anything more.

thanks

On Saturday, November 16, 2013 10:02:02 AM UTC+5:45, anjesh tuladhar wrote:
>
> Thanks a lot Santosh and Yalu. Great work, i have been busy lately, i will
> pick up from here now, i will also try to update the code to include the
> work you have shared and perhaps do the re-run of the conversion. I just
> ran the summary script and 6611 out of 9692 have been corrected. I will
> go through the districts files now.
>
> Best regards
> Anjesh
>
>
> On 16 November 2013 03:23, sapradhan <[email protected]> wrote:
>
>> hi
>> since it's almost election day, i pushed a bit harder to get most of the
>> corrections done.
>> for districts that had only a few incorrect entries, only ones requiring
>> corrections have value in corrected column
>> for districts that had several fixes i used attached sed script to fix
>> things, all rows have entry in corrected column (same as polling center for
>> already correct ones)
>>
>> I have seen duplicate polling centers (removed some) and some erroneous
>> centers. There were also cases where some part of the name of polling
>> center was missing. we are still not completely free of those but we've covered
>> a lot of ground.
>>
>> i can sleep now :D
>>
>> thanks
>>
>>
>> On Friday, November 15, 2013 10:57:58 PM UTC+5:45, sapradhan wrote:
>>>
>>> great find Yalamber. i looked at src, it's in js and i think it should be
>>> easy to modify it to eliminate need for much of manual correction.
>>> I can do व - wa, ी दीर्घ इकार - i, remove . for ं अनुस्वार and lower case
>>> everything. the modified version is attached.
>>>
>>> please suggest what to put for
>>> ङ (currently ~N, would 'ng' do?) and ञ (~n, 'yn' is near enough)
>>>
>>> with case lowered and . removed from anuswar, न ँ ं ण all map to 'n',
>>> which is okay i guess.
>>>
>>> and something out of this topic (may be it would be better to start off a
>>> different thread for this), I was working on transliterating input for
>>> Nepali (basically reverse of this). Some usage examples
>>> here<https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>. I
>>> only know how to implement this for Linux and early implementation is
>>> here <http://nepalitankan.blogspot.com/>. Can you provide some feedback
>>> and suggestions on the usage patterns ?
>>>
>>> thanks
>>> santosh
>>>
>>> On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
>>>>
>>>> And... there you have it for Dhanusa... Polling Center Eng is in
>>>> Roman (only ASCII chars) - using the conversion-to-ITRANS tool I mentioned
>>>> previously, plus manual editing a bit. I did not bother changing double
>>>> a(s) "aa" to single, as I find distinction between अ(a) and आ(aa) necessary
>>>> to make names unambiguous in many cases.
>>>>
>>>> Basically, manual-editing part consisted of replacing "ee" with "i",
>>>> "vaa" with "wa", sometimes end of the word consonants had अ(a) suppressed
>>>> but needed it, the tool added a period in words with anuswara - don't need
>>>> it, ITRANS uses capital letters for some consonants - lower cased it, and
>>>> small things like that. Finally, title cased the entire names. RegExp
>>>> replace could perhaps be scripted for many of these rules - but human eye
>>>> still may be needed.
>>>>
>>>> All in all, not too bad for a combination of programmatic conversion
>>>> and human editing which otherwise would have taken quite a lot of time
>>>> doing it manually.
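A minimal sketch of scripting the manual rules listed above, assuming the tool's output is plain ITRANS-style ASCII. The name `tidy_itrans` and the choice to drop a period only when it sits directly before `n` or `m` are assumptions for illustration, not what the attached script does. Restoring a suppressed end-of-word अ(a) is left out since that still needs a human eye, and ~N / ~n are left untouched because the mapping for ङ and ञ was still an open question.

```python
import re

# Rules applied in order, after lower-casing. Each is taken literally from
# the list above; the anusvara pattern is an assumption about the tool output.
RULES = [
    (re.compile(r"\.(?=[nm])"), ""),  # period emitted with anusvara, e.g. ".n"
    (re.compile(r"ee"), "i"),         # "ee" -> "i"
    (re.compile(r"vaa"), "wa"),       # "vaa" -> "wa"
]


def tidy_itrans(name: str) -> str:
    """Lower-case ITRANS capitals, apply the regex rules, then title-case words."""
    out = name.lower()
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    # capitalize() per word avoids str.title() upper-casing after punctuation.
    return " ".join(word.capitalize() for word in out.split())


print(tidy_itrans("shree ga.nesh"))  # Shri Ganesh
print(tidy_itrans("bhavaanI"))       # Bhawani
```

Periods in front of other letters (for example in abbreviations) are kept as they are, so those would still have to be checked by hand.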
>>>>
>>>> yālu
>>>>
>>>>
>>>> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]> wrote:
>>>>
>>>>> Hi Santosh,
>>>>>
>>>>> I agree IAST takes getting used to - it is too academic for the
>>>>> masses. While not entirely ideal, ITRANS scheme may be better. It turns out
>>>>> there is a utility to convert Devanagari into ITRANS - it runs on the
>>>>> browser and can be downloaded
>>>>> here<https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>.
>>>>>
>>>>> With a little modification with rules specific to Nepali language, it could
>>>>> work wonders transliterating Devanagari back to Roman.
>>>>>
>>>>> yālu
>>>>>
>>>>>
>>>>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>>>>
>>>>>> yalu,
>>>>>> I found IAST a bit difficult to read, may be it takes some time to
>>>>>> get used to. Besides it uses characters not present in normal keyboards so
>>>>>> would not vouch for it. ITRANS should be easier for most of us to
>>>>>> understand and adopt, perhaps there is something that translates devanagari
>>>>>> to ITRANS too ?
>>>>>>
>>>>>> anjesh,
>>>>>> i am waiting on your call on whether we are repeating the
>>>>>> conversion/scrubbing process again with PCS mapping
>>>>>> OR resume manually correcting the entries.
>>>>>>
>>>>>> I have compiled required mappings PCS to unicode, somebody should be
>>>>>> able to plug this into 2utf8 so that conversion is done correctly.
>>>>>>
>>>>>>
>>>>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Attempted Bhojpur and Dhanusa.
>>>>>>>
>>>>>>> Duplicate rows are identifiable as well.
>>>>>>>
>>>>>>> Polling Center Eng is straight forward Devanagari to
>>>>>>> IAST<http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration> conversion
>>>>>>> - not sure if that's going to work out for this.
>>>>>>> If you are wondering about the Devanagari to IAST conversion tool,
>>>>>>> this<http://devtransliteration.appspot.com/translit> one is pretty
>>>>>>> accurate. IAST is a scheme to romanize devanagari, popular
>>>>>>> among Sanskrit academics worldwide.
>>>>>>>
>>>>>>> Please ignore the ward no.s in Bhojpur - I did my own scraping off
>>>>>>> pdf and, ward no.s seemed important.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> yālu
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]> wrote:
>>>>>>>
>>>>>>>> anjesh
>>>>>>>> it is turning out to be lot more work than originally thought.
>>>>>>>> most of the issues is due to PCS Nepali being used, i have created
>>>>>>>> a sed script to automate PCS 2 Preeti which is attached. This script is NOT
>>>>>>>> foolproof and messes up any numerals if present (eg in Kathmandu there are
>>>>>>>> ward no.s in poll center names), there is also conflicts with ञ and ङ, (PCS
>>>>>>>> has ङ at ~).
>>>>>>>> I did Baglung-Morang with this script results are better but still
>>>>>>>> need manual verification, please have a look
>>>>>>>>
>>>>>>>> Few characters that need to be corrected manually are
>>>>>>>> ज्ञ missing
>>>>>>>> Bara/Salyan appearing in roman
>>>>>>>> ह्ये as in गुह्येश्वरी
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>>>>
>>>>>>>>> Santosh,
>>>>>>>>> That's correct. Those names must have been missed during the
>>>>>>>>> scraping process and our eyes have also missed those. The number and name
>>>>>>>>> of booths are also maintained in a database. Currently we are more
>>>>>>>>> concerned with the incorrect polling-center names only, we will fix those
>>>>>>>>> in the database (that's why the id is there in the first column) and share
>>>>>>>>> the corrected ones, along with the number of booths in
>>>>>>>>> opendatanepal.org.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Anjesh.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> anjan,
>>>>>>>>>> i am assuming that you have maintained no of booths in a center
>>>>>>>>>> somewhere.
>>>>>>>>>>
>>>>>>>>>> there are some issues like
>>>>>>>>>> in Ilam
>>>>>>>>>> only फिक्कल in place of सडक कार्यालय फिक्कल
>>>>>>>>>> only पञ्चकन्या in place of भवनी प्रा वि पञ्चकन्या
>>>>>>>>>> the same in a few places in Panchthar as well
>>>>>>>>>> i have added the fullnames there, can you verify that I am doing
>>>>>>>>>> thing correctly ?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>
>>>>>>>>>>> No we are not merging them. नेसुम क, ख, ग, घ are polling booths
>>>>>>>>>>> and नेसुम is polling center. Once we correct the polling
>>>>>>>>>>> center, booth names could be corrected easily. Booths are listed under
>>>>>>>>>>> center name e.g. http://election.opennepal.net:8000/#/constituency/11 like
>>>>>>>>>>>
>>>>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>>>>
>>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(क)"
>>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> one confusion are we merging क ख ग into one.
>>>>>>>>>>>> eg in taplejung 2 there are 4 नेसुम क, ख, ग, घ, are we merging
>>>>>>>>>>>> them ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Santosh. Write access is enabled now - missed that :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community
>>>>>>>>>>>>> would like to peek into it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anjesh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This link could be useful for your data conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In,
>>>>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> nice initiative,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I looked into the pdf and it seems that there are two fonts
>>>>>>>>>>>>>>> in use Preeti and PCS Nepali. It turns out PCS Nepali has different
>>>>>>>>>>>>>>> keymapping than Preeti specifically the numeric layer ie ^=ट , &=ठ ,
>>>>>>>>>>>>>>> *=ड and so forth which is causing quite a few errors. If it is feasible
>>>>>>>>>>>>>>> change the mapping based on the font being used, the conversion should be
>>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If it would be quicker to do this manually I can help. I
>>>>>>>>>>>>>>> dont have write access to google docs, please do the needful
>>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>>> santosh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you look at this constituency
>>>>>>>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/39, there are lots of
>>>>>>>>>>>>>>>> issues with unicode, which should not have been if the data were available
>>>>>>>>>>>>>>>> in proper format. We scraped the data (in Nepali Preeti font) from
>>>>>>>>>>>>>>>> http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html.
>>>>>>>>>>>>>>>> The task was not easy,
>>>>>>>>>>>>>>>> https://github.com/foss-np/2utf8 was used to convert
>>>>>>>>>>>>>>>> Preeti to Unicode. There could be problems in either conversion as well as
>>>>>>>>>>>>>>>> scraping. We think that it might be quick to get help from the community to
>>>>>>>>>>>>>>>> resolve these issues. For ease, we have maintained all the scraped polling
>>>>>>>>>>>>>>>> centers district-wise in the google docs
>>>>>>>>>>>>>>>> here<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We plan to release these data in opendatanepal.org as well
>>>>>>>>>>>>>>>> but after resolving those issues, for which we seek your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> District-file
>>>>>>>>>>>>>>>> sheet<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78> lists
>>>>>>>>>>>>>>>> the districts and appropriate polling-list pdf file (from which we
>>>>>>>>>>>>>>>> scraped). And the corresponding district page has all the polling centers
>>>>>>>>>>>>>>>> (with issues, there might be repetitions as well). We have created columns
>>>>>>>>>>>>>>>> for corrected center name and romanized center name. So you could correct
>>>>>>>>>>>>>>>> the center names and add in case of missing centers.
>>>>>>>>>>>>>>>> Summary<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77> shows
>>>>>>>>>>>>>>>> the list of districts with issues and corrected name - the google script
>>>>>>>>>>>>>>>> will run and update the numbers there. For e.g, Taplejung district
>>>>>>>>>>>>>>>> page<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2> seems
>>>>>>>>>>>>>>>> to have 3-4 issues in names. Thank you so much for your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Prawesh Shrestha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>>> Google Groups "FOSS Nepal" group.
>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>> from it, send an email to [email protected].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>>>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "opendatanepal" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
