Great find, Yalamber. I looked at the source; it's in JS, and I think it should be easy to modify it to eliminate much of the need for manual correction. I got it to map व to wa and ी (dirgha ikar, the long i) to i, to drop the . for ं (anusvara), and to lowercase everything. The modified version is attached. Please suggest what to use for ङ (currently ~N; would 'ng' do?) and ञ (currently ~n; 'yn' is near enough).
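The rules listed here could also run as a separate pass over the converter's ITRANS output, leaving the JS converter itself untouched. A minimal Python sketch, assuming the output spells ङ as `~N`, ञ as `~n`, the anusvara as `.n`, the chandrabindu as `.N`, ई as `ee` and व as `v`; the `ng`/`yn` values are only the tentative choices asked about above, and `clean_itrans` is a hypothetical helper, not part of the attached converter:

```python
# Hypothetical post-processing rules for the converter's ITRANS output.
# These case-sensitive sequences must be rewritten BEFORE lowercasing,
# otherwise ~N/~n and .N/.n can no longer be told apart.
RULES = [
    ("~N", "ng"),  # ङ: tentative choice from the thread
    ("~n", "yn"),  # ञ: tentative choice from the thread
    (".N", "n"),   # chandrabindu (ँ): drop the period
    (".n", "n"),   # anusvara (ं): drop the period (also hits an abbreviation
                   # period that happens to precede an n)
]


def clean_itrans(itrans: str) -> str:
    """Turn ITRANS output into lowercase ASCII words, then title-case them."""
    text = itrans
    for src, dst in RULES:
        text = text.replace(src, dst)
    text = text.lower()             # drop ITRANS capitals such as N, T, D
    text = text.replace("ee", "i")  # ई (dirgha ikar) -> i
    text = text.replace("v", "w")   # व -> w
    # Capitalize each whitespace-separated word ("bhawan, deurali" -> "Bhawan, Deurali").
    return " ".join(word.capitalize() for word in text.split())
```

For example, `clean_itrans("sa.nkhuvaasabhaa")` gives `Sankhuwaasabhaa`. Abbreviated names are still worth a human spot-check, in line with the point that a human eye may still be needed.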
With case lowered and the . removed from the anusvara, न, ँ, ं and ण all map to 'n', which is okay, I guess.

And something off this topic (maybe it would be better to start a different thread for this): I was working on transliterated input for Nepali (basically the reverse of this). Some usage examples are here: <https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>. I only know how to implement this for Linux, and an early implementation is here: <http://nepalitankan.blogspot.com/>. Can you provide some feedback and suggestions on the usage patterns?

thanks
santosh

On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
> And... there you have it for Dhanusa. Polling Center Eng is in Roman (ASCII characters only), produced with the conversion-to-ITRANS tool I mentioned previously plus a bit of manual editing. I did not bother changing double a ("aa") to single, as I find the distinction between अ (a) and आ (aa) necessary to make names unambiguous in many cases.
>
> Basically, the manual editing consisted of replacing "ee" with "i" and "vaa" with "wa"; restoring the अ (a) where the tool had suppressed it on a word-final consonant that needed it; dropping the period the tool added in words with an anusvara, which we don't need; lowercasing the capital letters ITRANS uses for some consonants; and small things like that. Finally, I title-cased all the names. RegExp replacements could perhaps script many of these rules, but a human eye may still be needed.
>
> All in all, not too bad for a combination of programmatic conversion and human editing; doing it all by hand would have taken quite a lot of time.
>
> yālu
>
> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]> wrote:
>> Hi Santosh,
>>
>> I agree IAST takes getting used to; it is too academic for the masses. While not entirely ideal, the ITRANS scheme may be better.
>> It turns out there is a utility to convert Devanagari into ITRANS; it runs in the browser and can be downloaded here: <https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>.
>>
>> With a little modification using rules specific to the Nepali language, it could work wonders transliterating Devanagari back to Roman.
>>
>> yālu
>>
>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>> yalu,
>>> I found IAST a bit difficult to read; maybe it takes some time to get used to. Besides, it uses characters not present on normal keyboards, so I would not vouch for it. ITRANS should be easier for most of us to understand and adopt. Perhaps there is something that converts Devanagari to ITRANS too?
>>>
>>> anjesh,
>>> I am waiting on your call on whether we repeat the conversion/scrubbing process with the PCS mapping OR resume manually correcting the entries.
>>>
>>> I have compiled the required PCS-to-Unicode mappings; somebody should be able to plug them into 2utf8 so that the conversion is done correctly.
>>>
>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>> Hi all,
>>>>
>>>> Attempted Bhojpur and Dhanusa.
>>>>
>>>> Duplicate rows are identifiable as well.
>>>>
>>>> Polling Center Eng is a straightforward Devanagari-to-IAST <http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration> conversion; not sure if that's going to work out for this. If you are wondering about the Devanagari-to-IAST conversion tool, this one <http://devtransliteration.appspot.com/translit> is pretty accurate. IAST is a scheme for romanizing Devanagari, popular among Sanskrit academics worldwide.
>>>>
>>>> Please ignore the ward numbers in Bhojpur; I did my own scraping off the PDF, and the ward numbers seemed important.
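Duplicate rows can be flagged mechanically by grouping names after a light normalization, so that trailing spaces or a stray space before a comma do not hide a repeat. A minimal Python sketch; the normalization rules (Unicode NFC, collapsing whitespace, no space before `,` or `.`, case-folding for Roman names) and the helper names `normalize_name` / `find_duplicates` are assumptions for illustration, not what was actually run on the sheets:

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Hypothetical canonical form used only for comparing names."""
    text = unicodedata.normalize("NFC", name)
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    text = re.sub(r" ([,.])", r"\1", text)    # no space before , or .
    return text.casefold()                    # ignore case in Roman names


def find_duplicates(rows: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Map each normalized name that occurs more than once to its row ids."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row_id, name in rows:
        groups[normalize_name(name)].append(row_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Keeping the row ids (like the id column in the sheets) lets a reviewer decide which copy to keep instead of deleting automatically.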
>>>>
>>>> thanks,
>>>>
>>>> yālu
>>>>
>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]> wrote:
>>>>> anjesh,
>>>>> it is turning out to be a lot more work than originally thought. Most of the issues are due to PCS Nepali being used. I have created a sed script to automate PCS-to-Preeti conversion, which is attached. This script is NOT foolproof: it messes up any numerals present (e.g. in Kathmandu there are ward numbers in poll center names), and there are also conflicts between ञ and ङ (PCS has ङ at ~).
>>>>> I did Baglung to Morang with this script. The results are better but still need manual verification; please have a look.
>>>>>
>>>>> A few characters that need to be corrected manually:
>>>>> ज्ञ missing
>>>>> Bara/Salyan appearing in Roman
>>>>> ह्ये as in गुह्येश्वरी
>>>>>
>>>>> thanks
>>>>>
>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>> Santosh,
>>>>>> That's correct. Those names must have been missed during the scraping process, and our eyes missed them too. The number and names of booths are also maintained in a database. Currently we are concerned only with the incorrect polling-center names; we will fix those in the database (that's why the id is in the first column) and share the corrected ones, along with the number of booths, on opendatanepal.org.
>>>>>>
>>>>>> Thanks
>>>>>> Anjesh.
>>>>>>
>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>> anjan,
>>>>>>> I am assuming that you have maintained the number of booths per center somewhere.
>>>>>>>
>>>>>>> There are some issues, such as these in Ilam:
>>>>>>> only फिक्कल appears in place of सडक कार्यालय फिक्कल
>>>>>>> only पञ्चकन्या appears in place of भवनी प्रा वि पञ्चकन्या
>>>>>>> Similar cases appear here and there in Panchthar too.
>>>>>>> I have added the full names there; can you verify that I am doing this correctly?
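Numeral breakage like this is typical of chained substitutions (one sed `s///g` after another): a later rule can rewrite a character that an earlier rule just produced. A single-pass translation table avoids that. A minimal Python sketch, assuming that PCS Nepali puts ट, ठ, ड on `^`, `&`, `*` (as reported further down the thread), that Preeti types those letters with `6`, `7`, `8`, and that PCS holds the numerals where Preeti holds those letters; the last two points are unverified assumptions, the table is deliberately partial, and `pcs_to_preeti` is a hypothetical helper, not the attached sed script:

```python
# Partial, hypothetical PCS Nepali -> Preeti key table. Only the three letter
# keys reported in the thread are included; the digit entries ASSUME that PCS
# simply swaps Preeti's letter and numeral layers, which needs checking.
PCS_TO_PREETI = str.maketrans({
    "^": "6", "&": "7", "*": "8",  # ट, ठ, ड as typed in PCS -> Preeti keys
    "6": "^", "7": "&", "8": "*",  # assumed swap for the numerals
})


def pcs_to_preeti(text: str) -> str:
    """Remap every character in one pass, so no output is remapped again."""
    return text.translate(PCS_TO_PREETI)
```

By contrast, applying the same pairs as sequential replacements, `"^6".replace("^", "6").replace("6", "^")`, yields `"^^"`, whereas `pcs_to_preeti("^6")` yields `"6^"`.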
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>> No, we are not merging them. नेसुम क, ख, ग, घ are polling booths and नेसुम is the polling center. Once we correct the polling center, the booth names can be corrected easily. Booths are listed under the center name, e.g. http://election.opennepal.net:8000/#/constituency/11 lists:
>>>>>>>>
>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>
>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(क)"
>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>> - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
>>>>>>>>
>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>> One point of confusion: are we merging क, ख, ग into one?
>>>>>>>>> E.g. in Taplejung 2 there are four: नेसुम क, ख, ग, घ. Are we merging them?
>>>>>>>>>
>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh wrote:
>>>>>>>>>> Thanks Santosh. Write access is enabled now; I missed that :)
>>>>>>>>>>
>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community would like to peek into it.
>>>>>>>>>>
>>>>>>>>>> Anjesh
>>>>>>>>>>
>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]> wrote:
>>>>>>>>>>> This link could be useful for your data conversion:
>>>>>>>>>>>
>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>
>>>>>>>>>>> In,
>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>> Nice initiative.
>>>>>>>>>>>>
>>>>>>>>>>>> I looked into the PDF, and it seems that there are two fonts in use: Preeti and PCS Nepali.
>>>>>>>>>>>> It turns out PCS Nepali has a different key mapping from Preeti, specifically in the numeric layer, i.e. ^=ट, &=ठ, *=ड and so forth, which is causing quite a few errors. If it is feasible to switch the mapping based on the font being used, the conversion should be better.
>>>>>>>>>>>>
>>>>>>>>>>>> If it would be quicker to do this manually, I can help. I don't have write access to the Google Docs; please do the needful.
>>>>>>>>>>>> thanks
>>>>>>>>>>>> santosh
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh wrote:
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you look at this constituency, http://election.opennepal.net:8000/#/constituency/39, there are lots of Unicode issues, which would not have arisen if the data had been available in a proper format. We scraped the data (in the Nepali Preeti font) from http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html. The task was not easy; https://github.com/foss-np/2utf8 was used to convert Preeti to Unicode. There could be problems in the conversion as well as in the scraping. We think it might be quicker to get help from the community to resolve these issues. For ease, we have put all the scraped polling centers, district-wise, in the Google Docs spreadsheet here: <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We plan to release this data on opendatanepal.org as well, but only after resolving those issues, for which we seek your help.
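Legacy fonts such as Preeti store ASCII key codes, so a converter like 2utf8 has to do a table lookup plus some reordering, most visibly for the short-i sign ि, which Preeti types before the consonant but Unicode stores after it. A minimal Python sketch of that idea with a deliberately tiny, hypothetical key table (e.g. `g]kfn` for नेपाल); it is not 2utf8's code, and it ignores conjuncts, where ि must move past the whole consonant cluster:

```python
# Tiny, hypothetical subset of a Preeti key -> Unicode table, for illustration.
PREETI_TO_UNICODE = {
    "g": "न", "k": "प", "n": "ल", "b": "द", "s": "क",
    "f": "ा", "]": "े", "l": "ि",
}
SHORT_I = "\u093f"  # ि DEVANAGARI VOWEL SIGN I


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants क (U+0915) to ह (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def preeti_to_unicode(text: str) -> str:
    out: list[str] = []
    pending_i = False  # ि seen, still waiting for the consonant it belongs to
    for key in text:
        ch = PREETI_TO_UNICODE.get(key, key)  # unknown keys pass through
        if ch == SHORT_I:
            pending_i = True  # Preeti types ि before its consonant
            continue
        out.append(ch)
        if pending_i and is_consonant(ch):
            out.append(SHORT_I)  # Unicode stores ि after the consonant
            pending_i = False
    if pending_i:  # dangling ि at the end of the input
        out.append(SHORT_I)
    return "".join(out)
```

With this table, `preeti_to_unicode("lbg")` reorders the keys into दिन. A wrong or missing entry in the real table produces exactly the kind of garbled names the spreadsheet is collecting.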
>>>>>>>>>>>>>
>>>>>>>>>>>>> The District-file sheet <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78> lists the districts and the corresponding polling-list PDF file (from which we scraped). Each district's page has all its polling centers (with issues; there might be repetitions as well). We have created columns for the corrected center name and the romanized center name, so you can correct the center names and add any missing centers. The Summary sheet <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77> shows the list of districts with issues and corrected names; the Google script will run and update the numbers there. For example, the Taplejung district page <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2> seems to have 3-4 issues in names. Thank you so much for your help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Prawesh Shrestha
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "FOSS Nepal" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>>>>
>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "opendatanepal" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
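Santosh's side project at the top of the thread goes the other way, from Roman keystrokes to Devanagari. The core of such an input scheme can be a greedy longest match over a romanization table, emitting a vowel sign when a vowel follows a consonant, nothing extra for the inherent a, and a halanta (्) when no vowel follows. A minimal Python sketch with a small, hypothetical table; it is not how ne-rom-translit or its Linux implementation actually works:

```python
from typing import Optional

# Small, hypothetical romanization table for illustration only.
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "ng": "ङ", "ch": "च", "t": "त",
    "d": "द", "n": "न", "p": "प", "b": "ब", "m": "म", "y": "य",
    "r": "र", "l": "ल", "w": "व", "s": "स", "h": "ह",
}
# Each vowel: (independent letter, vowel sign used after a consonant).
VOWELS = {
    "a": ("अ", ""), "aa": ("आ", "ा"), "i": ("इ", "ि"), "ii": ("ई", "ी"),
    "u": ("उ", "ु"), "e": ("ए", "े"), "o": ("ओ", "ो"),
}
HALANTA = "्"
MAX_KEY = 2  # longest key in either table


def match(table: dict, text: str, pos: int) -> Optional[str]:
    """Return the longest key of `table` starting at text[pos], if any."""
    for size in range(MAX_KEY, 0, -1):
        key = text[pos:pos + size]
        if key in table:
            return key
    return None


def roman_to_devanagari(text: str) -> str:
    out: list[str] = []
    pos = 0
    while pos < len(text):
        cons = match(CONSONANTS, text, pos)
        if cons:
            out.append(CONSONANTS[cons])
            pos += len(cons)
            vowel = match(VOWELS, text, pos)
            if vowel:
                out.append(VOWELS[vowel][1])  # "" for the inherent a
                pos += len(vowel)
            else:
                out.append(HALANTA)  # no vowel follows: bare consonant
            continue
        vowel = match(VOWELS, text, pos)
        if vowel:
            out.append(VOWELS[vowel][0])  # word-initial or standalone vowel
            pos += len(vowel)
            continue
        out.append(text[pos])  # spaces, digits, punctuation pass through
        pos += 1
    return "".join(out)
```

Greedy longest match is why `kh` wins over `k` followed by `h`; cases such as `ng` versus `n` + `g` need an escape or a convention, which is exactly where feedback on the usage patterns matters.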
[Attachment: Devanagari to iTrans Converter_02.htm]
