Dear colleagues

Many thanks to Kilian Evang, Gilles Sérasset, Adriana Stan (who gave me
the link https://github.com/open-dict-data/ipa-dict) and Camila Buitrago
for your kind help.

I also found https://pypi.org/project/ipapy/

As I said, I want to find neologisms for Andean Amazonian languages. The
first source of neologisms is certainly the languages themselves, but it is
unavoidable (and maybe positive!) to try neologisms coming from other
languages.
Against the "common sense" of taking neologisms only from Spanish, I am
looking for neologisms from any language, on the condition that they
"sound like" the words of the target languages. Since natural languages
rarely have perfectly phonemic orthographies, comparing written forms is
not useful for my research, and speech recognition would be overkill, so I
am betting on checking IPA representations of as many foreign-language
words as possible.

Thanks to your links, I now have more than 500k words written in IPA.
Next, I'll define and encode the "similar-sounding" rules, and then
iterate through all of those words to find the eligible ones.
As you have probably noticed, my goal is to propose not a few but a massive
number of neologisms. As far as I know, there is no prior work on this, but
if you know of any, I'd appreciate your input. I expect to release the code
and a paper draft this month.
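
As a rough illustration of what a "similar-sounding" rule could look like,
here is a minimal sketch in Python. It assumes the simplest possible rule:
a word is eligible if every IPA segment in it belongs to the phoneme
inventory of the target language. The inventory, the sample entries and all
function names are hypothetical placeholders, not real data for any Andean
or Amazonian language, and the real rules will surely need to be richer
(syllable structure, near-matches between phonemes, and so on).

```python
import unicodedata

# Hypothetical phoneme inventory (placeholder values only, NOT a real
# description of any Andean or Amazonian language).
TARGET_INVENTORY = {
    "p", "t", "k", "q", "s", "h", "m", "n", "l", "r", "w", "j",
    "a", "i", "u",
}

# Symbols ignored in this sketch: slashes, brackets, stress marks,
# syllable dots, the length mark and spaces.
IGNORED = set("/[]ˈˌ.ː ")


def segments(ipa):
    """Split an IPA string into single-character segments.

    Simplification: combining diacritics (nasalization, tie bars, ...)
    are dropped, so only base characters are compared.
    """
    decomposed = unicodedata.normalize("NFD", ipa)
    return [
        ch for ch in decomposed
        if ch not in IGNORED and not unicodedata.combining(ch)
    ]


def sounds_native(ipa, inventory=TARGET_INVENTORY):
    """Return True if the word is non-empty and uses only inventory segments."""
    segs = segments(ipa)
    return bool(segs) and all(s in inventory for s in segs)


def eligible(entries, inventory=TARGET_INVENTORY):
    """Yield the (language, word, ipa) entries that pass the rule."""
    for lang, word, ipa in entries:
        if sounds_native(ipa, inventory):
            yield lang, word, ipa


# Made-up entries with a dummy language code, for illustration only.
sample = [
    ("xx", "kamu", "/ˈka.mu/"),  # all segments in the inventory
    ("xx", "tilo", "/ˈti.lo/"),  # rejected: "o" is not in the inventory
    ("xx", "suna", "/suˈna/"),   # all segments in the inventory
    ("xx", "zaki", "/ˈza.ki/"),  # rejected: "z" is not in the inventory
]

for lang, word, ipa in eligible(sample):
    print(lang, word, ipa)
```

Since the filter is a generator, the same loop should scale to the full
list of 500k words without loading the results into memory at once.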

Best regards

Luis





On Thu, 15 Sep 2022 at 9:17, Kilian Evang (<[email protected]>)
wrote:

> Hi Luis,
>
> Another resource you might want to look into is WikiPron:
>
> https://github.com/kylebgorman/wikipron
>
> Cheers,
> Kilian
>
> On Thu, 15 Sep 2022 at 15:58, Gilles Sérasset <
> [email protected]> wrote:
>
>> Hi Luis,
>>
>> Don’t know if this could be useful to you, but currently, the DBnary
>> dataset contains phonetic (IPA) transcription of many entries.
>>
>> DBnary is linked data and can be explored through its public endpoint
>> using the SPARQL query language: http://kaiko.getalp.org/sparql
>>
>> For instance the following query will tell you how many phonetic reps are
>> available in which languages.
>>
>> select ?lang count(?pr) where {
>>   [] ontolex:phoneticRep ?pr.
>>   BIND (lang(?pr) as ?lang)
>> }
>> GROUP BY ?lang ORDER BY DESC(COUNT(?pr))
>>
>> This will give you a long table (I only include the first lines; results
>> are ordered by the number of phoneticRep values).
>>
>> lang          callret-1
>> fr-fonipa     2657875
>> en-fonipa     663697
>> ru-fonipa     389891
>> de-fonipa     230875
>> fi-fonipa     199269
>> es-fonipa     187090
>> la-fonipa     171134
>> it-fonipa     154881
>> pl-fonipa     136446
>> sh-fonipa     116478
>> pt-fonipa     90199
>> ca-fonipa     86385
>> eo-fonipa     84626
>> avk-fonipa    73459
>> es-ipa        72652
>> vi-fonipa     72147
>>
>> As the data is continuously extracted from Wiktionaries, the numbers will
>> evolve (and as several language extractors do not yet extract the phonetic
>> representation, feel free to file a feature request on the DBnary bug
>> tracker).
>>
>> More info at:
>>
>> http://kaiko.getalp.org/about-dbnary/
>>
>> Regards,
>>
>> Gilles,
>>
>>
>> On 7 Sep 2022, at 16:26, Luis Camacho Caballero <[email protected]>
>> wrote:
>>
>> Dear colleagues
>>
>> I'm devoted to the revitalization and massification of the Andean
>> Amazonian native languages, with computational processing as a key enabler.
>>
>> Among the many tasks to do, I'm currently working on the creation of
>> neologisms. That is why I'm looking for the largest multilingual dictionary
>> of phonetic spellings, even better if that database includes Asian languages
>> (Mandarin, Japanese, Korean, Hindi, Urdu, etc.).
>>
>> If you have this kind of database, I kindly ask you to give me access;
>> if you don't, I'd appreciate any clue about where and/or how to access one.
>>
>> Kind regards
>>
>> Luis Camacho <https://orcid.org/0000-0001-6569-550X>
>>
>>
>> ------------------------------
>>
>>
>> _______________________________________________
>> Corpora mailing list -- [email protected]
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to [email protected]
>>
>