[Apertium-stuff] Fw: [Mt-list] Fwd: TweetMT @ SEPLN 2015

Mikel Forcada Thu, 19 Mar 2015 04:01:55 -0700


Begin forwarded message:


Date: Thu, 19 Mar 2015 08:55:09 +0100
From: Cristina <[email protected]>
To: mt-list <[email protected]>
Subject: [Mt-list] Fwd: TweetMT @ SEPLN 2015


Apologies for multiple postings

*************************************************************************

TweetMT 2015: Tweet Translation Workshop at SEPLN 2015

TweetMT is a workshop and shared task on machine translation (MT) applied
to tweets. It will take place in September 2015 in Alicante, co-located
with SEPLN 2015 (to be confirmed). The objective is to bring together
researchers interested in experimenting with and comparing different
approaches to tweet MT. The workshop follows up on two earlier workshops
also held at SEPLN: TweetNorm2013 and TweetLID2014.

The machine translation of tweets is a complex task that depends heavily
on the type of data involved. Translating tweets is very different from
translating well-formed texts, such as those published through a content
management system. Tweets are often written on mobile devices, which
further degrades spelling, and they contain errors, symbols and
diacritics. They also differ in structure, including tweet-specific
features such as hashtags, user mentions and retweets. Tweet translation
can be tackled either directly (tweet-to-tweet) or indirectly: normalizing
the tweet into standard text (Kaufmann & Kalita, 2011), translating that
text and, if needed, generating a tweet from the translation. Although the
direct approach looks attractive, the lack of parallel or comparable
tweets for the working languages (Petrovic et al., 2010) tends to push us
towards the indirect one. Some authors also try to retrieve similar tweets
in other languages through cross-lingual information retrieval (CLIR).

Work in this area is scarce in the literature, but interest is clearly
growing (Gotti et al., 2013). An important point of reference is the work
done to translate SMS messages sent during the Haiti earthquake (Munro,
2010).

The task will focus on translating tweets between languages of the
Iberian Peninsula (Basque, Catalan, Galician, Portuguese and Spanish), as
well as English. The organizing committee will release development data,
including parallel tweets, that participants can use to train their
systems. For the final evaluation, participants will have to submit
automatic translations of a number of tweet corpora within a short period
of time. Evaluation will be carried out with automatic metrics that
measure the distance between system output and the reference corpora.

These corpora are not meant to be representative of all the types of
messages found in informal communication. Instead, this is an initial
attempt that starts by addressing one of the simplest parts of the
problem. We plan to use more informal and varied corpora in future tasks
as we make progress on these initial issues.

The workshop aims to be a forum where researchers can compare their
methods, systems and results.

Important dates

   - March 1: Registration opened
   - April 17: Release of the development set
   - May 12: Registration deadline
   - May 19: Release of the test set
   - May 21: Result submission deadline
   - May 22-June 12: Manual evaluation and publication of results
   - July 3: Short paper submission deadline
   - July 31: Camera-ready paper deadline
   - September 14 or 15: Workshop

Organizing Committee

Iñaki Alegria (UPV/EHU)
Nora Aranberri (UPV/EHU)
Cristina España-Bonet (UPC)
Pablo Gamallo (USC)
Eva Martínez (UPC)
Hugo Oliveira (Universidade de Coimbra)
Iñaki San Vicente (Elhuyar)
Antonio Toral (DCU, Dublin)
Arkaitz Zubiaga (University of Warwick)

Proceedings

The workshop papers will be published in the proceedings of the “XXXI
Congreso de la Sociedad Española de Procesamiento de lenguaje natural”
(31st Conference of the Spanish Society for Natural Language Processing).
They will also be published in the CEUR Workshop Proceedings digital
publication service.

Information

http://komunitatea.elhuyar.org/tweetmt


-- 
Mikel L. Forcada <[email protected]>
http://www.dlsi.ua.es/~mlf/
+34 96 590 9776
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant (Spain)

_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff