Thanks a lot, Santosh and Yalu. Great work. I have been busy lately, but I will
pick up from here now. I will also try to update the code to include the
work you have shared and perhaps re-run the conversion. I just ran the
summary script: 6611 out of 9692 entries have been corrected. I will go
through the district files now.

Best regards
Anjesh


On 16 November 2013 03:23, sapradhan <[email protected]> wrote:

> Hi,
> Since it's almost election day, I pushed a bit harder to get most of the
> corrections done.
> For districts that had only a few incorrect entries, only the rows requiring
> correction have a value in the corrected column.
> For districts that needed several fixes, I used the attached sed script, and
> all rows have an entry in the corrected column (the same as the polling
> center name for entries that were already correct).
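
The two conventions described above (a corrected column filled only where a fix was needed, versus filled on every row) can be reconciled when reading the sheets. A minimal sketch, assuming each row is exported as a dict; the keys `polling_center` and `corrected` are hypothetical names, not the actual sheet headers:

```python
def final_name(row):
    """Return the corrected name if one was entered, else the scraped name."""
    corrected = (row.get("corrected") or "").strip()
    return corrected or row["polling_center"].strip()


def count_changed(rows):
    """Count rows whose final name differs from the scraped name."""
    return sum(1 for r in rows if final_name(r) != r["polling_center"].strip())


# Illustrative rows only; the names are taken from examples in this thread.
rows = [
    {"polling_center": "फिक्कल", "corrected": "सडक कार्यालय फिक्कल"},  # fixed
    {"polling_center": "नेसुम", "corrected": ""},        # left blank: already fine
    {"polling_center": "नेसुम", "corrected": "नेसुम"},    # copied: already fine
]
```

Treating a blank and an identical copy the same way keeps the count of real fixes comparable across districts, whichever convention was used.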
>
> I have seen duplicate polling centers (removed some) and some erroneous
> centers. There were also cases where part of the polling center name
> was missing. We are still not completely free of those, but we have covered
> a lot of ground.
>
> i can sleep now :D
>
> thanks
>
>
> On Friday, November 15, 2013 10:57:58 PM UTC+5:45, sapradhan wrote:
>>
>> Great find, Yalamber. I looked at the source; it is in JS, and I think it
>> should be easy to modify it to eliminate the need for much of the manual
>> correction. I can map व to wa and ी (long i) to i, remove the . for
>> ं (anusvara), and lower-case everything. The modified version is attached.
>>
>> Please suggest what to use for
>> ङ (currently ~N; would 'ng' do?) and ञ (currently ~n; 'yn' is near enough).
>>
>> With the case lowered and the . removed from the anusvara, न ँ ं ण all map
>> to 'n', which is okay, I guess.
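
The nasal mappings discussed above can be written down as a small table so the resulting ambiguity is visible. This is only a sketch: 'ng' and 'yn' are the proposals still awaiting feedback, and the helper name `collisions` is made up for illustration.

```python
# Proposed romanization of nasal characters, per the message above.
# Only the consonant/sign itself is mapped here; vowels are out of scope.
NASALS = {
    "ङ": "ng",  # currently ~N in the tool's output; 'ng' is a proposal
    "ञ": "yn",  # currently ~n in the tool's output; 'yn' is a proposal
    "न": "n",
    "ण": "n",   # after lower-casing
    "ँ": "n",   # after removing the period and lower-casing
    "ं": "n",   # anusvara, after removing the period
}


def collisions(mapping):
    """Group source characters by romanized output and keep the ambiguous ones."""
    groups = {}
    for src, roman in mapping.items():
        groups.setdefault(roman, []).append(src)
    return {roman: chars for roman, chars in groups.items() if len(chars) > 1}
```

With the table above, `collisions(NASALS)` reports that four different characters end up as 'n', which is the trade-off accepted in the message.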
>>
>> On a different topic (it may be better to start a
>> separate thread for this): I have been working on transliterated input for
>> Nepali (basically the reverse of this). Some usage examples are
>> here <https://github.com/sapradhan/ne-rom-translit/wiki/Usage-examples>. I
>> only know how to implement this for Linux, and an early implementation is
>> here <http://nepalitankan.blogspot.com/>. Can you provide some feedback
>> and suggestions on the usage patterns?
>>
>> thanks
>> santosh
>>
>> On Friday, November 15, 2013 9:24:41 PM UTC+5:45, ytamot wrote:
>>>
>>> And... there you have it for Dhanusa. Polling Center Eng is in
>>> Roman (ASCII characters only), produced with the conversion-to-ITRANS tool
>>> I mentioned previously plus a bit of manual editing. I did not bother
>>> changing the double "aa" to a single "a", as I find the distinction between
>>> अ (a) and आ (aa) necessary to make names unambiguous in many cases.
>>>
>>> The manual editing basically consisted of: replacing "ee" with "i" and
>>> "vaa" with "wa"; restoring the अ (a) that had been suppressed on some
>>> word-final consonants that needed it; removing the period the tool adds in
>>> words with an anuswara; lower-casing the capital letters ITRANS uses for
>>> some consonants; and small things like that. Finally, I title-cased the
>>> entire names. Regexp replacements could perhaps be scripted for many of
>>> these rules, but a human eye may still be needed.
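
Several of the manual-editing rules listed above are mechanical enough to script. A rough sketch, assuming the tool's output is plain ASCII; `clean_itrans` is a hypothetical helper, and restoring a suppressed word-final "a" is deliberately left to a human:

```python
import re


def clean_itrans(text):
    """Apply the scriptable cleanup rules to ITRANS-style output."""
    text = text.replace(".", "")       # drop the period added for anuswara
                                       # (assumes no other periods matter)
    text = text.lower()                # ITRANS capitals -> lower case
    text = text.replace("ee", "i")     # "ee" -> "i"
    text = re.sub(r"vaa", "wa", text)  # "vaa" -> "wa"
    return text.title()                # title-case the whole name


# Made-up inputs for illustration; not taken from the actual sheets.
print(clean_itrans("bhavaanee praa vi"))  # Bhawani Praa Vi
```

Note that the double "aa" is kept as-is, in line with the preference stated above for distinguishing अ from आ.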
>>>
>>> All in all, not too bad for a combination of programmatic conversion and
>>> human editing which otherwise would have taken quite a lot of time doing it
>>> manually.
>>>
>>> yālu
>>>
>>>
>>> On Fri, Nov 15, 2013 at 7:00 PM, Yalamber Tamot <[email protected]>wrote:
>>>
>>>> Hi Santosh,
>>>>
>>>> I agree IAST takes getting used to; it is too academic for the masses.
>>>> While not entirely ideal, the ITRANS scheme may be better. It turns out
>>>> there is a utility to convert Devanagari into ITRANS. It runs in the
>>>> browser and can be downloaded here
>>>> <https://docs.google.com/uc?id=0B3QLKzA0EHYWYTg4MTExYWItM2JhZC00YzQyLTkyOTEtNjhkMWE3MjFiODYz&export=download&hl=en>.
>>>> With a few modifications for rules specific to the Nepali language, it
>>>> could work wonders transliterating Devanagari back to Roman.
>>>>
>>>> yālu
>>>>
>>>>
>>>> On Thu, Nov 14, 2013 at 11:25 PM, sapradhan <[email protected]> wrote:
>>>>
>>>>> Yalu,
>>>>> I found IAST a bit difficult to read; maybe it takes some time to get
>>>>> used to. Besides, it uses characters not present on normal keyboards, so
>>>>> I would not vouch for it. ITRANS should be easier for most of us to
>>>>> understand and adopt. Perhaps there is something that converts
>>>>> Devanagari to ITRANS too?
>>>>>
>>>>> Anjesh,
>>>>> I am waiting on your call on whether we repeat the
>>>>> conversion/scrubbing process with the PCS mapping
>>>>> or resume manually correcting the entries.
>>>>>
>>>>> I have compiled the required PCS-to-Unicode mappings; somebody should be
>>>>> able to plug them into 2utf8 so that the conversion is done correctly.
>>>>>
>>>>>
>>>>> On Thursday, November 14, 2013 10:34:33 PM UTC+5:45, ytamot wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Attempted Bhojpur and Dhanusa.
>>>>>>
>>>>>> Duplicate rows are identifiable as well.
>>>>>>
>>>>>> Polling Center Eng is a straightforward Devanagari to IAST
>>>>>> <http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration>
>>>>>> conversion; I am not sure if that's going to work out for this. If you
>>>>>> are wondering about the Devanagari to IAST conversion tool, this one
>>>>>> <http://devtransliteration.appspot.com/translit> is pretty
>>>>>> accurate. IAST is a scheme for romanizing Devanagari, popular
>>>>>> among Sanskrit academics worldwide.
>>>>>>
>>>>>> Please ignore the ward numbers in Bhojpur. I did my own scraping of the
>>>>>> PDF, and the ward numbers seemed important.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> yālu
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 14, 2013 at 9:26 AM, sapradhan <[email protected]>wrote:
>>>>>>
>>>>>>> Anjesh,
>>>>>>> It is turning out to be a lot more work than originally thought.
>>>>>>> Most of the issues are due to PCS Nepali being used. I have created a
>>>>>>> sed script to automate the PCS-to-Preeti conversion, which is attached.
>>>>>>> This script is NOT foolproof and messes up any numerals that are
>>>>>>> present (e.g. in Kathmandu there are ward numbers in polling center
>>>>>>> names). There are also conflicts between ञ and ङ (PCS has ङ at ~).
>>>>>>> I did Baglung-Morang with this script. The results are better but
>>>>>>> still need manual verification; please have a look.
>>>>>>>
>>>>>>> A few things that need to be corrected manually are:
>>>>>>> ज्ञ missing
>>>>>>> Bara/Salyan appearing in Roman
>>>>>>> ह्ये as in गुह्येश्वरी
>>>>>>> thanks
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, November 13, 2013 11:16:42 PM UTC+5:45, anjesh wrote:
>>>>>>>
>>>>>>>> Santosh,
>>>>>>>> That's correct. Those names must have been missed during the
>>>>>>>> scraping process, and our eyes missed them too. The number and names
>>>>>>>> of booths are also maintained in a database. Currently we are
>>>>>>>> concerned only with the incorrect polling-center names. We will fix
>>>>>>>> those in the database (that's why the id is there in the first
>>>>>>>> column) and share the corrected ones, along with the number of
>>>>>>>> booths, on opendatanepal.org.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anjesh.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 November 2013 23:05, sapradhan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Anjesh,
>>>>>>>>> I am assuming that you have recorded the number of booths in each
>>>>>>>>> center somewhere.
>>>>>>>>>
>>>>>>>>> There are some issues, for example:
>>>>>>>>> In Ilam,
>>>>>>>>> only फिक्कल appears instead of सडक कार्यालय फिक्कल, and
>>>>>>>>> only पञ्चकन्या appears instead of भवनी प्रा वि पञ्चकन्या.
>>>>>>>>> There are similar cases here and there in Panchthar too.
>>>>>>>>> I have added the full names there. Can you verify that I am doing
>>>>>>>>> this correctly?
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>>
>>>>>>>>> On Wednesday, November 13, 2013 10:35:28 PM UTC+5:45, anjesh wrote:
>>>>>>>>>
>>>>>>>>>> No, we are not merging them. नेसुम क, ख, ग, घ are polling booths
>>>>>>>>>> and नेसुम is the polling center. Once we correct the polling
>>>>>>>>>> centers, the booth names can be corrected easily. Booths are listed
>>>>>>>>>> under the center name, e.g. at
>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/11, like this:
>>>>>>>>>>
>>>>>>>>>> धूर्काे६ गा.वि.स. भवन, देउराली
>>>>>>>>>>
>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(क)"
>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(ख)"
>>>>>>>>>>    - "धूर्काे६ गा.वि.स. भवन, देउराली(ग)"
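
The center/booth relationship shown in the list above suggests that booth names can be derived from, or grouped under, a corrected center name. A minimal sketch, assuming every booth is marked by a trailing parenthesized label such as "(क)"; both function names are hypothetical, and center names that themselves end in parentheses would need extra care:

```python
import re

# Trailing booth marker, e.g. "(क)". The pattern is an assumption about the data.
BOOTH_SUFFIX = re.compile(r"\s*\(([^()]+)\)\s*$")


def split_booth(name):
    """Split 'center(booth)' into (center, booth); booth is None if absent."""
    match = BOOTH_SUFFIX.search(name)
    if match is None:
        return name.strip(), None
    return name[:match.start()].strip(), match.group(1)


def group_by_center(names):
    """Map each center name to the list of booth labels found under it."""
    centers = {}
    for name in names:
        center, booth = split_booth(name)
        booths = centers.setdefault(center, [])
        if booth is not None:
            booths.append(booth)
    return centers


booths = [
    "धूर्काे६ गा.वि.स. भवन, देउराली(क)",
    "धूर्काे६ गा.वि.स. भवन, देउराली(ख)",
    "धूर्काे६ गा.वि.स. भवन, देउराली(ग)",
]
```

Once the center name is corrected in one place, each booth name can be rebuilt as the corrected center plus its label.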
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 13 November 2013 22:30, sapradhan <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> One point of confusion: are we merging क, ख, ग into one?
>>>>>>>>>>> E.g. in Taplejung 2 there are four entries, नेसुम क, ख, ग, घ. Are
>>>>>>>>>>> we merging them?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, November 13, 2013 10:17:09 PM UTC+5:45, anjesh
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Santosh. Write access is enabled now - missed that :)
>>>>>>>>>>>>
>>>>>>>>>>>> ttf-2-unicode looks useful. Perhaps someone from the community
>>>>>>>>>>>> would like to peek into it.
>>>>>>>>>>>>
>>>>>>>>>>>> Anjesh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 November 2013 21:32, Santa Basnet <[email protected]>wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This link could be useful for your data conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://nepalinlp.blogspot.com/2010/09/few-years-back-ttf-to-unicode.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> In,
>>>>>>>>>>>>> Santa Basnet
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 13, 2013 at 9:05 PM, sapradhan <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nice initiative.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I looked into the PDF, and it seems that two fonts are in use:
>>>>>>>>>>>>>> Preeti and PCS Nepali. It turns out PCS Nepali has a different
>>>>>>>>>>>>>> key mapping than Preeti, specifically in the numeric layer
>>>>>>>>>>>>>> (i.e. ^=ट, &=ठ, *=ड and so forth), which is causing quite a
>>>>>>>>>>>>>> few errors. If it is feasible to change the mapping based on
>>>>>>>>>>>>>> the font being used, the conversion should be better.
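
Choosing the mapping by font, as suggested above, could look roughly like the sketch below. The PCS Nepali entries are the three given in this message; the Preeti entries are an assumption for illustration and should be checked against the actual font. `FONT_MAPS` and `convert` are made-up names and are not part of 2utf8.

```python
# Per-font lookup tables (deliberately tiny; real tables cover the whole font).
FONT_MAPS = {
    "PCS NEPALI": {"^": "ट", "&": "ठ", "*": "ड"},  # from this thread
    "Preeti": {"6": "ट", "7": "ठ", "8": "ड"},      # ASSUMED, verify before use
}


def convert(text, font):
    """Map legacy-font characters to Unicode using the table for `font`."""
    table = str.maketrans(FONT_MAPS[font])
    return text.translate(table)
```

Because each text run is converted with its own table, a character such as "6" is left alone in a PCS Nepali run instead of being rewritten by the Preeti rules, which is the class of error described above. This only works if the font of each run can be recovered from the PDF.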
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If it would be quicker to do this manually, I can help. I don't
>>>>>>>>>>>>>> have write access to the Google Docs; please enable it.
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Santosh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, November 13, 2013 6:19:09 PM UTC+5:45, prawesh
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you look at this constituency,
>>>>>>>>>>>>>>> http://election.opennepal.net:8000/#/constituency/39, there are
>>>>>>>>>>>>>>> lots of issues with Unicode, which would not have occurred if
>>>>>>>>>>>>>>> the data were available in a proper format. We scraped the data
>>>>>>>>>>>>>>> (in the Nepali Preeti font) from
>>>>>>>>>>>>>>> http://www.election.gov.np/oldecn/NP/pollinglist/dist_const_list.html.
>>>>>>>>>>>>>>> The task was not easy; https://github.com/foss-np/2utf8 was
>>>>>>>>>>>>>>> used to convert Preeti to Unicode. There could be problems in
>>>>>>>>>>>>>>> the conversion as well as in the scraping. We think it might be
>>>>>>>>>>>>>>> quicker to get help from the community to resolve these issues.
>>>>>>>>>>>>>>> For ease, we have maintained all the scraped polling centers
>>>>>>>>>>>>>>> district-wise in the Google Docs spreadsheet here
>>>>>>>>>>>>>>> <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>.
>>>>>>>>>>>>>>> We plan to release these data on opendatanepal.org as well,
>>>>>>>>>>>>>>> but only after resolving those issues, for which we seek your
>>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The District-file sheet
>>>>>>>>>>>>>>> <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=78>
>>>>>>>>>>>>>>> lists the districts and the corresponding polling-list PDF file
>>>>>>>>>>>>>>> (from which we scraped). The corresponding district page has
>>>>>>>>>>>>>>> all the polling centers (with issues; there might be
>>>>>>>>>>>>>>> repetitions as well). We have created columns for the corrected
>>>>>>>>>>>>>>> center name and the romanized center name, so you can correct
>>>>>>>>>>>>>>> the center names and add any missing centers. The Summary sheet
>>>>>>>>>>>>>>> <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=77>
>>>>>>>>>>>>>>> shows the list of districts with the number of issues and
>>>>>>>>>>>>>>> corrected names; the Google script will run and update the
>>>>>>>>>>>>>>> numbers there. For example, the Taplejung district page
>>>>>>>>>>>>>>> <https://docs.google.com/a/yipl.com.np/spreadsheet/ccc?key=0AhWLpToTogBwdDR4WWNuQkZMa0I0S0dXQjlCT3pISlE&usp=drive_web#gid=2>
>>>>>>>>>>>>>>> seems to have 3-4 issues in names. Thank you so much for your
>>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Prawesh Shrestha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> FOSS Nepal mailing list: [email protected]
>>>>>>>>>>>>>> http://groups.google.com/group/foss-nepal
>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>>>>>>>>>>> Community website: http://www.fossnepal.org/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>>> Google Groups "FOSS Nepal" group.
>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Santa B. Basnet
>>>>>>>>>>>>> Department of Computer Science & Engineering
>>>>>>>>>>>>> Nepal Engineering College
>>>>>>>>>>>>> Changunarayan, Bhaktapur, Nepal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>  --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "opendatanepal" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>

