Dear colleagues,
We are pleased to announce the release of PxCorpus, 4 hours of
transcribed and annotated dialogues of drug prescriptions in French,
acquired through an experiment with 55 participants, both experts and
non-experts in drug prescription. The corpus was built as a
collaboration between the Laboratoire d'Informatique de Grenoble (LIG),
the University Hospital of Grenoble (CHU Grenoble) and the company
Calystene, through a CIFRE project financed by the ANRT (Association
Nationale de la Recherche et de la Technologie).
PxCorpus is, to the best of our knowledge, the first corpus of spoken
medical drug prescriptions to be distributed. The automatic
transcriptions were verified by humans and aligned with semantic
labels to allow the training of NLP models. The data acquisition protocol
was reviewed by medical experts and permits free distribution without
breaching privacy or regulations.
## Overview of the Corpus
The experiment was conducted in the wild, with naive participants and
medical experts.
In total, the dataset includes 2067 recordings from 55 participants (38%
non-experts, 25% doctors, 36% medical practitioners), manually
transcribed and semantically annotated.
| Category | Sessions | Recordings | Time (min) |
| --------------- | -------- | ---------- | ---------- |
| Medical experts | 258 | 434 | 94.83 |
| Doctors | 230 | 570 | 105.21 |
| Non experts | 415 | 977 | 62.13 |
| Total | 903 | 1981 | 262.27 |
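
As a rough illustration of how the figures in the table relate to one another, the following Python sketch derives each category's share of the recordings and its mean recording length. The values are copied by hand from the table; the names `STATS` and `summarize` are hypothetical and are not part of the corpus distribution.

```python
# Figures copied by hand from the table above:
# category -> (sessions, recordings, duration in minutes).
STATS = {
    "Medical experts": (258, 434, 94.83),
    "Doctors": (230, 570, 105.21),
    "Non experts": (415, 977, 62.13),
}


def summarize(stats):
    """Return, per category, the share of recordings (%) and the
    mean recording length (seconds)."""
    total_recordings = sum(rec for _, rec, _ in stats.values())
    summary = {}
    for name, (_sessions, recordings, minutes) in stats.items():
        summary[name] = {
            "recording_share_pct": 100.0 * recordings / total_recordings,
            "mean_recording_s": 60.0 * minutes / recordings,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(STATS).items():
        print(
            f"{name}: {row['recording_share_pct']:.1f}% of recordings, "
            f"{row['mean_recording_s']:.1f} s per recording on average"
        )
```

These are plain ratios computed from the published table, not statistics taken from the corpus files themselves.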
## License
We hope that the community will benefit from the dataset, which is
distributed under a Creative Commons Attribution 4.0 International
(CC BY 4.0) licence.
## How to cite this corpus
If you use the corpus or need more details, please refer to the following
paper: A Spoken Drug Prescription Dataset in French for Spoken Language
Understanding
@InProceedings{Kocabiyikoglu2022,
  author = "Alican Kocabiyikoglu and Fran{\c c}ois Portet and Prudence Gibert and Hervé Blanchon and Jean-Marc Babouchkine and Gaëtan Gavazzi",
  title = "A Spoken Drug Prescription Dataset in French for Spoken Language Understanding",
  booktitle = "13th Language Resources and Evaluation Conference (LREC 2022)",
  year = "2022",
  location = "Marseille, France"
}
A more complete description of the corpus acquisition is available on arXiv:
@misc{kocabiyikoglu2023spoken,
  title={Spoken Dialogue System for Medical Prescription Acquisition on Smartphone: Development, Corpus and Evaluation},
  author={Ali Can Kocabiyikoglu and François Portet and Jean-Marc Babouchkine and Prudence Gibert and Hervé Blanchon and Gaëtan Gavazzi},
  year={2023},
  eprint={2311.03510},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
## Download
The corpus is available on Zenodo under the following reference and
link:
*PxCorpus : A Spoken Drug Prescription Dataset in French for Spoken
Language Understanding and Dialogue*
https://zenodo.org/doi/10.5281/zenodo.6482586
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email:[email protected]
www:http://membres-liglab.imag.fr/portet/