[Corpora-List] release of the PxCorpus: A Spoken Drug Prescription Dataset in French for Spoken Language Understanding and Dialogue

François Portet via Corpora Wed, 08 Nov 2023 00:01:56 -0800

Dear colleagues,

We are pleased to announce the release of PxCorpus, a corpus of 4 hours of transcribed and annotated dialogues of drug prescriptions in French, acquired through an experiment with 55 participants, both experts and non-experts in drug prescription. The corpus was built in collaboration between the Laboratoire d'Informatique de Grenoble (LIG), the University Hospital of Grenoble (CHU Grenoble) and the company Calystene, through a CIFRE project financed by the ANRT (Association Nationale de la Recherche et de la Technologie).

PxCorpus is, to the best of our knowledge, the first corpus of spoken medical drug prescriptions to be distributed. The automatic transcriptions were manually verified and aligned with semantic labels to allow the training of NLP models. The data acquisition protocol was reviewed by medical experts and permits free distribution without breach of privacy or regulation.


## Overview of the Corpus

The experiment was performed in the wild with naive participants and medical experts. In total, the dataset includes 2067 recordings of 55 participants (38% non-experts, 25% doctors, 36% medical practitioners), manually transcribed and semantically annotated.


| Category         | Sessions | Recordings | Time (min) |
|------------------|----------|------------|------------|
| Medical experts  |   258    |    434     |   94.83    |
| Doctors          |   230    |    570     |  105.21    |
| Non experts      |   415    |    977     |   62.13    |
| Total            |   903    |   1981     |  262.27    |
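
The per-category rows above can be turned into a couple of derived figures, such as the number of recordings per session and the mean length of a recording. The following is only a minimal sketch: the numbers are copied from the table, and the names `rows`, `mean_recording_seconds` and `summarize` are illustrative helpers, not part of the corpus distribution.

```python
# Figures copied from the table above: (sessions, recordings, minutes).
rows = {
    "Medical experts": (258, 434, 94.83),
    "Doctors": (230, 570, 105.21),
    "Non experts": (415, 977, 62.13),
}


def mean_recording_seconds(recordings, minutes):
    """Average length of one recording, in seconds."""
    return minutes * 60 / recordings


def summarize(rows):
    """Derive per-category ratios from the raw table rows."""
    summary = {}
    for category, (sessions, recordings, minutes) in rows.items():
        summary[category] = {
            "recordings_per_session": recordings / sessions,
            "mean_seconds": mean_recording_seconds(recordings, minutes),
        }
    return summary


summary = summarize(rows)
for category, stats in summary.items():
    print(
        f"{category}: "
        f"{stats['recordings_per_session']:.2f} recordings/session, "
        f"{stats['mean_seconds']:.1f} s/recording"
    )
```

These ratios are descriptive only; any analysis of the actual audio should rely on the metadata shipped with the corpus rather than on this summary table.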


## License

We hope that the community will be able to benefit from the dataset, which is distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.


## How to cite this corpus

If you use the corpus or need more details, please refer to the following paper: A spoken drug prescription dataset in French for spoken Language Understanding


@InProceedings{Kocabiyikoglu2022,
  author =    "Alican Kocabiyikoglu and Fran{\c c}ois Portet and Prudence Gibert and Hervé Blanchon and Jean-Marc Babouchkine and Gaëtan Gavazzi",
  title =     "A spoken drug prescription dataset in French for spoken Language Understanding",
  booktitle = "13th Language Resources and Evaluation Conference (LREC 2022)",
  year =      "2022",
  location =  "Marseille, France"
}

A more complete description of the corpus acquisition is available on arXiv:

@misc{kocabiyikoglu2023spoken,
  title={Spoken Dialogue System for Medical Prescription Acquisition on Smartphone: Development, Corpus and Evaluation},
  author={Ali Can Kocabiyikoglu and François Portet and Jean-Marc Babouchkine and Prudence Gibert and Hervé Blanchon and Gaëtan Gavazzi},
  year={2023},
  eprint={2311.03510},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

## Download

The corpus can be found in the Zenodo catalogue under the following link and reference:

*PxCorpus: A Spoken Drug Prescription Dataset in French for Spoken Language Understanding and Dialogue*


https://zenodo.org/doi/10.5281/zenodo.6482586

--

François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE

Phone:  +33 (0)4 57 42 15 44
Email:[email protected]
www:http://membres-liglab.imag.fr/portet/
