Glad to be of help. About that 3k difference between the corrected and
uncorrected counts: a large portion of it is probably because, in many cases,
I did not bother to add an entry to the corrected column when the original
text was already correct.
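If the summary script simply counts non-empty corrected cells, the blank cells left for already-correct names make the corrected count look lower than it really is. Below is a minimal sketch that treats a blank cell as "the original was fine". It assumes each row is a dict with hypothetical `polling_center` and `corrected` keys. The sheet's real column names and the actual summary script may differ, and the sample rows are made up from names in this thread.

```python
from typing import Dict, List

# Hypothetical row shape; the real sheet's column names may differ.
Row = Dict[str, str]


def effective_name(row: Row) -> str:
    """Corrected name, falling back to the original when the corrected cell
    was left blank because the original was already correct."""
    corrected = row.get("corrected", "").strip()
    return corrected or row["polling_center"].strip()


def summarize(rows: List[Row]) -> Dict[str, int]:
    """Separate explicit corrections from blank cells instead of treating
    every blank cell as 'not yet corrected'."""
    filled = sum(1 for r in rows if r.get("corrected", "").strip())
    return {"total": len(rows), "filled": filled, "blank": len(rows) - filled}


# Made-up sample rows.
rows = [
    {"polling_center": "नेसुम", "corrected": ""},
    {"polling_center": "गण्ोश", "corrected": "गणेश"},
]
print(summarize(rows))                     # {'total': 2, 'filled': 1, 'blank': 1}
print([effective_name(r) for r in rows])  # ['नेसुम', 'गणेश']
```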

If you need to rerun the conversion, here are a few leads on possible bugs
in the conversion process:
- ज्ञ is often missing (I suspect this happens when it is the first letter,
  as in ज्ञानज्योति -> ानज्योति).
- If a polling center name contains a district name as a substring, that
  substring was replaced with its romanized form, as in बज्रबाराही ->
  बज्रBaraही, सल्यानी -> Salyanी, दाङवाङ -> Dangवाङ. Perhaps a side effect
  of some other search-and-replace routine?
- गणेश -> गण्ोश appears to be a bug in 2utf8.
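The three symptoms above can be flagged automatically before or after a rerun. Here is a rough sketch using plain regular expressions. It is not part of 2utf8, and the check names are made up. It only catches these particular patterns, not every conversion error.

```python
import re
from typing import List

MATRA = "[\u093E-\u094C]"  # dependent vowel signs, ा .. ौ
VIRAMA = "\u094D"          # ्

CHECKS = {
    # Name starts with a vowel sign, e.g. ज्ञानज्योति -> ानज्योति.
    "leading_matra": re.compile("^" + MATRA),
    # Roman letters mixed into a Devanagari name, e.g. बज्रBaraही.
    "mixed_script": re.compile(
        "[\u0900-\u097F].*[A-Za-z]|[A-Za-z].*[\u0900-\u097F]"
    ),
    # Virama directly followed by a vowel sign, e.g. गण्ोश.
    "virama_then_matra": re.compile(VIRAMA + MATRA),
}


def suspicious(name: str) -> List[str]:
    """Return the labels of every check the (stripped) name trips."""
    name = name.strip()
    return [label for label, pattern in CHECKS.items() if pattern.search(name)]


for sample in ["ानज्योति", "बज्रBaraही", "गण्ोश", "गणेश"]:
    print(sample, suspicious(sample))
```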

Let us know if we can do anything more.

Thanks


On Saturday, November 16, 2013 10:02:02 AM UTC+5:45, anjesh tuladhar wrote:
>
> Thanks a lot, Santosh and Yalu. Great work. I have been busy lately, but I
> will pick up from here now. I will also try to update the code to include
> the work you have shared and perhaps rerun the conversion. I just ran the
> summary script and 6611 out of 9692 have been corrected. I will go through
> the district files now.
>
> Best regards
> Anjesh
>
>
> On 16 November 2013 03:23, sapradhan <[email protected]> wrote:
>
>> Hi,
>> Since it's almost election day, I pushed a bit harder to get most of the
>> corrections done.
>> For districts that had only a few incorrect entries, only the ones
>> requiring corrections have a value in the corrected column.
>> For districts that had several fixes, I used the attached sed script to
>> fix things, so all rows have an entry in the corrected column (the same
>> as the polling center for already-correct ones).
>>
>> I have seen duplicate polling centers (removed some) and some erroneous
>> centers. There were also cases where part of the polling center name was
>> missing. We are still not completely free of those, but we've covered a
>> lot of ground.
>>
>> I can sleep now :D
>>
>> thanks
>>
>>
>> On Friday, November 15, 2013 10:57:58 PM UTC+5:45, sapradhan wrote:
>>>
>>> Great find, Yalamber. I looked at the source; it's in JS, and I think it
>>> should be easy to modify it to eliminate the need for much of the manual
>>> correction. I can do व -> wa, ी (long i, dirgha ikar) -> i, remove the .
>>> after ं (anusvara), and lowercase everything. The modified version is
>>> attached.
>>>
>>> Please suggest what to use for ङ (currently ~N; would 'ng' do?) and ञ
>>> (~n; 'yn' is near enough).
>>>
>>> With case lowered and the . removed from the anusvara, न ँ ं ण all map to
>>> 'n', which is okay, I guess.
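For illustration, the proposed mappings (व -> wa, ी -> i, ं -> n, ङ -> ng, ञ -> yn, all lowercase) can be expressed as a tiny direct Devanagari-to-Roman transliterator. This is not the modified JS converter. The following choices are assumptions made for the sketch:

- the consonant table below;
- spelling छ as "chh";
- keeping "aa" for ा;
- dropping the inherent a at the end of a word.

```python
# Illustrative tables; not the attached JS converter's actual rules.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ng",
    "च": "ch", "छ": "chh", "ज": "j", "झ": "jh", "ञ": "yn",
    "ट": "t", "ठ": "th", "ड": "d", "ढ": "dh", "ण": "n",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "w",
    "श": "sh", "ष": "sh", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ई": "i", "उ": "u", "ऊ": "u",
          "ऋ": "ri", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
MATRAS = {"ा": "aa", "ि": "i", "ी": "i", "ु": "u", "ू": "u",
          "ृ": "ri", "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
NASALS = {"ं": "n", "ँ": "n"}  # anusvara and chandrabindu both become "n"
VIRAMA = "्"


def is_devanagari(ch: str) -> bool:
    return "\u0900" <= ch <= "\u097F"


def transliterate(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Add the inherent "a" unless a virama or vowel sign follows,
            # or the consonant ends the word (a rough heuristic).
            if nxt and is_devanagari(nxt) and nxt != VIRAMA and nxt not in MATRAS:
                out.append("a")
        elif ch in MATRAS:
            out.append(MATRAS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in NASALS:
            out.append(NASALS[ch])
        elif ch != VIRAMA:
            out.append(ch)  # punctuation, digits and spaces pass through
    return "".join(out).lower()


print(transliterate("गणेश"), transliterate("देउराली"), transliterate("भवन"))
```

Yalu's point about word-final अ applies here too. The end-of-word rule gives "ganesh" for गणेश, but it will be wrong for some names, so the output still needs a human pass.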
>>>
>>> And something off topic (maybe it would be better to start a different
>>> thread for this): I was working on transliterating input for Nepali
>>> (basically the reverse of this). Some usage examples are
>>> here<https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>. I
>>> only know how to implement this for Linux, and an early implementation is
>>> here <http://nepalitankan.blogspot.com/>. Can you provide some feedback
>>> and suggestions on the usage patterns?
>>>
>>> thanks 
>>> santosh 
>>>
>>> On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
>>>>
>>>> And... there you have it for Dhanusa. Polling Center Eng is in Roman
>>>> (only ASCII chars), using the conversion-to-ITRANS tool I mentioned
>>>> previously plus a bit of manual editing. I did not bother changing double
>>>> a's ("aa") to single, as I find the distinction between अ (a) and आ (aa)
>>>> necessary to make names unambiguous in many cases.
>>>>
>>>> Basically, the manual-editing part consisted of: replacing "ee" with
>>>> "i" and "vaa" with "wa"; adding back the अ (a) on word-final consonants
>>>> where the tool had suppressed it but it was needed; removing the period
>>>> the tool added in words with anusvara (we don't need it); lowercasing
>>>> the capital letters ITRANS uses for some consonants; and small things
>>>> like that. Finally, I title-cased the entire names. A RegExp replace
>>>> could perhaps be scripted for many of these rules, but a human eye may
>>>> still be needed.
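Most of the rules listed above are mechanical enough to script. Below is a minimal sketch using Python's `re`. It assumes the tool writes the anusvara as ".n" and ङ/ञ as "~N"/"~n", and it uses the "ng"/"yn" spellings proposed elsewhere in this thread. The word-final अ fix is left out because it needs a human eye.

```python
import re
from typing import List, Tuple

# Ordered rewrite rules for ITRANS output. The exact strings the converter
# emits (".n" for anusvara, "~N"/"~n" for ङ/ञ) are assumptions.
RULES: List[Tuple[str, str]] = [
    (r"~N", "ng"),   # ङ; handled before lowercasing so it stays distinct from ~n
    (r"~n", "yn"),   # ञ
    (r"\.n", "n"),   # drop the period added for anusvara
    (r"ee", "i"),
    (r"vaa", "wa"),
]


def clean_itrans(name: str) -> str:
    for pattern, replacement in RULES:
        name = re.sub(pattern, replacement, name)
    # Lowercase ITRANS capitals (T, D, N, Sh, ...), then title-case each word.
    return name.lower().title()


print(clean_itrans("daa~Nvaa~N"))  # Daangwang
print(clean_itrans("sa.nsaar"))    # Sansaar
```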
>>>>
>>>> All in all, not too bad for a combination of programmatic conversion
>>>> and human editing, given that doing it entirely by hand would have taken
>>>> quite a lot of time.
>>>>
>>>> yālu
>>>>
>>>>
>>>> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]> wrote:
>>>>
>>>>> Hi Santosh,
>>>>>
>>>>> I agree IAST takes getting used to; it is too academic for the
>>>>> masses. While not entirely ideal, the ITRANS scheme may be better. It
>>>>> turns out there is a utility to convert Devanagari into ITRANS; it runs
>>>>> in the browser and can be downloaded
>>>>> here<https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>.
>>>>> With a little modification using rules specific to the Nepali language,
>>>>> it could work wonders transliterating Devanagari back to Roman.
>>>>>
>>>>> yālu
>>>>>
>>>>>
>>>>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>>>>
>>>>>> Yalu,
>>>>>> I found IAST a bit difficult to read; maybe it takes some time to get
>>>>>> used to. Besides, it uses characters not present on normal keyboards,
>>>>>> so I would not vouch for it. ITRANS should be easier for most of us to
>>>>>> understand and adopt. Perhaps there is something that translates
>>>>>> Devanagari to ITRANS too?
>>>>>>
>>>>>> Anjesh,
>>>>>> I am waiting on your call on whether we repeat the conversion/scrubbing
>>>>>> process with the PCS mapping OR resume manually correcting the entries.
>>>>>>
>>>>>> I have compiled the required PCS-to-Unicode mappings; somebody should be
>>>>>> able to plug them into 2utf8 so that the conversion is done correctly.
>>>>>>
>>>>>>
>>>>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Attempted Bhojpur and Dhanusa.
>>>>>>>
>>>>>>> Duplicate rows are identifiable as well.
>>>>>>>
>>>>>>> Polling Center Eng is a straightforward Devanagari to
>>>>>>> IAST<http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration>
>>>>>>> conversion; not sure if that's going to work out for this. If you are
>>>>>>> wondering about the Devanagari-to-IAST conversion tool,
>>>>>>> this<http://devtransliteration.appspot.com/translit> one is pretty
>>>>>>> accurate. IAST is a scheme for romanizing Devanagari, popular among
>>>>>>> Sanskrit academics worldwide.
>>>>>>>
>>>>>>> Please ignore the ward nos. in Bhojpur; I did my own scraping off the
>>>>>>> PDF, and ward nos. seemed important.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> yālu
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]> wrote:
>>>>>>>
>>>>>>>> Anjesh,
>>>>>>>> It is turning out to be a lot more work than originally thought.
>>>>>>>> Most of the issues are due to PCS Nepali being used. I have created a
>>>>>>>> sed script to automate PCS-to-Preeti conversion, which is attached.
>>>>>>>> This script is NOT foolproof: it messes up any numerals if present
>>>>>>>> (e.g. in Kathmandu there are ward nos. in polling center names), and
>>>>>>>> there are also conflicts between ञ and ङ (PCS has ङ at ~).
>>>>>>>> I did Baglung-Morang with this script; the results are better but
>>>>>>>> still need manual verification. Please have a look.
>>>>>>>>
>>>>>>>> A few characters that need to be corrected manually are:
>>>>>>>> ज्ञ missing
>>>>>>>> Bara/Salyan appearing in roman
>>>>>>>> ह्ये as in गुह्येश्वरी
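The Bara/Salyan symptom (बज्रबाराही -> बज्रBaraही) is what a plain substring replacement of district names would produce. Below is a minimal sketch contrasting that with a whole-field lookup. The `DISTRICT_ROMAN` table is hypothetical and holds only district names mentioned in this thread. This is not the scraper's actual code.

```python
from typing import Dict

# Hypothetical district-name table (only names mentioned in this thread).
DISTRICT_ROMAN: Dict[str, str] = {"बारा": "Bara", "सल्यान": "Salyan", "दाङ": "Dang"}


def naive_romanize(text: str) -> str:
    """Substring replacement: also rewrites names that merely contain a
    district name, reproducing the बज्रBaraही symptom."""
    for devanagari, roman in DISTRICT_ROMAN.items():
        text = text.replace(devanagari, roman)
    return text


def romanize_district_field(text: str) -> str:
    """Whole-field lookup: only a cell that is exactly a district name is
    romanized; polling center names are left untouched."""
    return DISTRICT_ROMAN.get(text.strip(), text)


print(naive_romanize("बज्रबाराही"))          # बज्रBaraही
print(romanize_district_field("बज्रबाराही"))  # बज्रबाराही
print(romanize_district_field("बारा"))        # Bara
```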
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>>>>
>>>>>>>>> Santosh,
>>>>>>>>> That's correct. Those names must have been missed during the
>>>>>>>>> scraping process, and our eyes missed them too. The number and names
>>>>>>>>> of booths are also maintained in a database. Currently we are more
>>>>>>>>> concerned with the incorrect polling-center names only; we will fix
>>>>>>>>> those in the database (that's why the id is in the first column) and
>>>>>>>>> share the corrected ones, along with the number of booths, on
>>>>>>>>> opendatanepal.org.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Anjesh.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Anjan,
>>>>>>>>>> I am assuming that you have maintained the number of booths in each
>>>>>>>>>> center somewhere.
>>>>>>>>>>
>>>>>>>>>> There are some issues like these:
>>>>>>>>>> In Ilam:
>>>>>>>>>> फिक्कल alone instead of सडक कार्यालय फिक्कल
>>>>>>>>>> पञ्चकन्या alone instead of भवनी प्रा वि पञ्चकन्या
>>>>>>>>>> Similar cases here and there in Panchthar as well.
>>>>>>>>>> I have added the full names there. Can you verify that I am doing
>>>>>>>>>> this correctly?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>
>>>>>>>>>>> No, we are not merging them. नेसुम क, ख, ग, घ are polling booths
>>>>>>>>>>> and नेसुम is the polling center. Once we correct the polling
>>>>>>>>>>> center, the booth names can be corrected easily. Booths are listed
>>>>>>>>>>> under the center name, e.g.
>>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/11, like
>>>>>>>>>>>
>>>>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>>>>
>>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(क)" 
>>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
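Since each booth name is the center name plus a letter suffix, booths can be grouped back under their center mechanically. Here is a small sketch. It assumes booth names always end in "(क)", "(ख)" and so on, exactly as in the example above. A form like "नेसुम क" without parentheses would need another pattern.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed format: "<center name>(<booth letter>)", e.g. "...देउराली(क)".
# The booth letter is any Devanagari consonant, U+0915 क .. U+0939 ह.
BOOTH_RE = re.compile(r"^(?P<center>.*?)\s*\((?P<booth>[\u0915-\u0939])\)\s*$")


def split_booth(name: str) -> Tuple[str, Optional[str]]:
    """Return (center, booth letter), or (name, None) if there is no suffix."""
    match = BOOTH_RE.match(name)
    if match is None:
        return name.strip(), None
    return match.group("center"), match.group("booth")


def group_booths(names: List[str]) -> Dict[str, List[str]]:
    """Map each center name to the booth letters listed under it."""
    centers: Dict[str, List[str]] = {}
    for name in names:
        center, booth = split_booth(name)
        booths = centers.setdefault(center, [])  # centers without booths still appear
        if booth is not None:
            booths.append(booth)
    return centers


names = [
    "गा.वि.स. भवन, देउराली(क)",
    "गा.वि.स. भवन, देउराली(ख)",
    "गा.वि.स. भवन, देउराली(ग)",
    "नेसुम",
]
print(group_booths(names))
```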
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One point of confusion: are we merging क, ख, ग into one?
>>>>>>>>>>>> E.g. in Taplejung 2 there are four, नेसुम क, ख, ग, घ. Are we merging
>>>>>>>>>>>> them?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Santosh. Write access is enabled now - missed that :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community 
>>>>>>>>>>>>> would like to peek into it. 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anjesh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This link could be useful for your data conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In,
>>>>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nice initiative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I looked into the PDF and it seems that two fonts are in use:
>>>>>>>>>>>>>>> Preeti and PCS Nepali. It turns out PCS Nepali has a different
>>>>>>>>>>>>>>> keymapping than Preeti, specifically in the numeric layer, i.e.
>>>>>>>>>>>>>>> ^=ट, &=ठ, *=ड and so forth, which is causing quite a few errors.
>>>>>>>>>>>>>>> If it is feasible to change the mapping based on the font being
>>>>>>>>>>>>>>> used, the conversion should be better.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If it would be quicker to do this manually, I can help. I don't
>>>>>>>>>>>>>>> have write access to the Google Docs; please do the needful.
>>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>>> santosh
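Picking the mapping per font could look roughly like the sketch below: a shared base table with per-font overrides, applied to each text run according to the font it was set in. Only the PCS Nepali entries (^ -> ट, & -> ठ, * -> ड) come from this message. `BASE_MAP` is a placeholder to be filled from a real converter such as 2utf8. The sketch also leaves out the reordering rules (such as placing the short-i sign after its consonant) that a real legacy-font converter needs.

```python
from typing import Dict

# Placeholder: the shared legacy-font table (ASCII key -> Unicode) would come
# from a real converter such as 2utf8; it is left empty here.
BASE_MAP: Dict[str, str] = {}

# Per-font overrides. Only the PCS Nepali entries are from this thread.
FONT_OVERRIDES: Dict[str, Dict[str, str]] = {
    "Preeti": {},
    "PCS Nepali": {"^": "ट", "&": "ठ", "*": "ड"},
}


def mapping_for(font: str) -> Dict[str, str]:
    """Base table with the font's overrides applied; unknown fonts raise
    KeyError instead of silently falling back to the wrong table."""
    table = dict(BASE_MAP)
    table.update(FONT_OVERRIDES[font])
    return table


def convert(text: str, font: str) -> str:
    """Map a text run using the font's table, trying longer keys first."""
    table = mapping_for(font)
    keys = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # unmapped characters pass through
            i += 1
    return "".join(out)


print(convert("^&*", "PCS Nepali"))  # टठड
print(convert("^&*", "Preeti"))      # ^&* (placeholder table is empty)
```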
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you look at this constituency
>>>>>>>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/39, there are
>>>>>>>>>>>>>>>> lots of issues with Unicode, which would not have arisen had the
>>>>>>>>>>>>>>>> data been available in a proper format. We scraped the data (in
>>>>>>>>>>>>>>>> the Nepali Preeti font) from
>>>>>>>>>>>>>>>> http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html.
>>>>>>>>>>>>>>>> The task was not easy; https://github.com/foss-np/2utf8 was used
>>>>>>>>>>>>>>>> to convert Preeti to Unicode. There could be problems in the
>>>>>>>>>>>>>>>> conversion as well as in the scraping. We think it might be
>>>>>>>>>>>>>>>> quickest to get help from the community to resolve these issues.
>>>>>>>>>>>>>>>> For ease, we have maintained all the scraped polling centers
>>>>>>>>>>>>>>>> district-wise in the Google Docs
>>>>>>>>>>>>>>>> here<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>.
>>>>>>>>>>>>>>>> We plan to release this data on opendatanepal.org as well, but
>>>>>>>>>>>>>>>> only after resolving those issues, for which we seek your help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The District-file
>>>>>>>>>>>>>>>> sheet<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78>
>>>>>>>>>>>>>>>> lists the districts and the corresponding polling-list PDF file
>>>>>>>>>>>>>>>> (from which we scraped), and each district page has all its
>>>>>>>>>>>>>>>> polling centers (with issues; there might be repetitions as
>>>>>>>>>>>>>>>> well). We have created columns for the corrected center name and
>>>>>>>>>>>>>>>> the romanized center name, so you can correct the center names
>>>>>>>>>>>>>>>> and add any missing centers. The
>>>>>>>>>>>>>>>> Summary<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>
>>>>>>>>>>>>>>>> shows the list of districts with issues and corrected names; the
>>>>>>>>>>>>>>>> Google script will run and update the numbers there. For example,
>>>>>>>>>>>>>>>> the Taplejung district
>>>>>>>>>>>>>>>> page<https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2>
>>>>>>>>>>>>>>>> seems to have 3-4 issues in names. Thank you so much for your
>>>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Prawesh Shrestha 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  -- 
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> --- 
>>>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>>>> Google Groups "FOSS Nepal" group.
>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails 
>>>>>>>>>>>>>>> from it, send an email to [email protected].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>  -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "opendatanepal" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>
