Re: Fwd: LREC 2012 Workshop on Language Resource Merging - Extended Deadline to Feb. 22, 2012

Katrin Tomanek Mon, 27 Feb 2012 23:39:57 -0800

Hi all,

I can definitely do this. However, I would wait until I have the validated OANC data (MASK) from Nancy Ide; some code changes might be necessary for this.


Best,
Katrin



On 02/24/2012 04:01 AM, Jason Baldridge wrote:

+1

On Wed, Feb 22, 2012 at 4:21 PM, Joern Kottmann <[email protected]> wrote:

The code we could add to the formats package and ship
with OpenNLP directly.

+1 to do that

Jörn

On Wed, Feb 22, 2012 at 10:53 PM, Jason Baldridge <[email protected]> wrote:

The OANC is great, and I'm glad to hear that training those models on it
works well. It would certainly be a good idea to look at the MASK subset
and see how things work out with that. (Also, just to get a sense of the
size of it.)

If you are up for it, would you be interested in adding code and data to
train on OANC for the OpenNLP-Models git repo?

https://github.com/utcompling/OpenNLP-Models

-Jason

On Wed, Feb 22, 2012 at 3:22 AM, Katrin Tomanek <[email protected]> wrote:




Hi,

As for corpora we could use to train freely available models for
OpenNLP: I have tested OANC (the "open" section of the American
National Corpus, http://www.americannationalcorpus.org/OANC).

Although the OANC is automatically tagged, I obtain quite OK results
(sentence splitting, tokenization, POS Tagging, NP Chunking).

So, maybe we can provide models trained on OANC for download?

Moreover, Nancy Ide just told me that there is a subset of the OANC
(called MASK) which was manually validated and is also freely available.
This might be even better.

What do you think?

Best,
Katrin



On 02/20/2012 11:01 PM, Jason Baldridge wrote:

Might be some things we should look at wrt our goals of creating
annotated resources. -j

---------- Forwarded message ----------
From: ELRA ELDA Information<[email protected]>
Date: Wed, Feb 15, 2012 at 7:36 AM
Subject: LREC 2012 Workshop on Language Resource Merging - Extended
Deadline to Feb. 22, 2012

To:



Call for Papers
LREC 2012 Workshop on: Language Resource Merging
22 May 2012 – Afternoon Session

EXTENDED Submission deadline: 22 FEBRUARY

CONTEXT
The availability of adequate language resources has been a well-known
bottleneck for most high-level language technology applications, e.g.
Machine Translation, parsing, and Information Extraction, for at least
years, and the impact of the bottleneck is becoming all the more
apparent with the availability of higher computational power and
massive storage, since modern language technologies are capable of
using far more resources than the community produces. The present
landscape is characterized by the existence of numerous scattered
resources, many of which have differing levels of coverage, types of
information and granularity. Taken singularly, existing resources do
not have sufficient coverage, quality or richness for robust
large-scale applications, and yet they contain valuable information
(Monachini et al. 2004 and 2006; Soria et al. 2006; Molinero, Sagot and
Nicolas 2009; Necsulescu et al. 2011). Differing technology or
application requirements, ignorance of the existence of certain
resources, and difficulties in accessing and using them have led to the
proliferation of multiple, unconnected resources that, if merged, could
constitute a much richer repository of information augmenting either
coverage or granularity, or both, and consequently multiplying the
number of potential language technology applications. Merging,
combining and/or compiling larger resources from existing ones thus
appears to be a promising direction to take.
The re-use and merging of existing resources is not altogether unknown.
For example, WordNet (Fellbaum, 1998) has been successfully reused in a
variety of applications. But this is the exception rather than the
rule; in fact, merging and enhancing existing resources is uncommon,
probably because it is by no means a trivial task given the profound
differences in formats, formalisms, metadata, and linguistic
assumptions.
The language resource landscape is on the brink of a large change,
however. With the proliferation of accessible metadata catalogues and
resource repositories (such as the new META-SHARE infrastructure,
http://www.meta-net.eu/meta-share), a potentially large number of
existing resources will be more easily located, accessed and
downloaded. Also, with the advent of distributed platforms for the
automatic production of language resources, such as PANACEA
(http://www.panacea-lr.eu/), new language resources and linguistic
information capable of being integrated into those resources will be
produced more easily and at a lower cost. Thus, it is likely that
researchers and application developers will seek out resources already
available before developing new, costly ones, and will require methods
for merging/combining various resources and adapting them to their
specific needs.
Up to the present day, most resource merging has been done manually,
with only a small number of attempts reported in the literature towards
(semi-)automatic merging of resources (Crouch & King 2005; Pustejovsky
et al. 2005; Molinero, Sagot and Nicolas 2009; Necsulescu et al. 2011).
In order to take a further step towards the scenario depicted above, in
which resource merging and enhancing is a reliable and accessible first
step for researchers and application developers, experience and best
practices must be shared and discussed, as this will help the whole
community avoid any waste of time and resources.

AIMS OF THE WORKSHOP
This half-day workshop is meant to be part of a series of meetings
constituting an ongoing forum for sharing and evaluating the results of
different methods and systems for the automatic production of language
resources (the first one was the LREC 2010 Workshop on Methods for the
Automatic Production of Language Resources and their Evaluation
Methods). The main focus of this workshop is on (semi-)automatic means
of merging language resources, such as lexicons, corpora and grammars.
Merging makes it possible to re-use, adapt, and enhance existing
resources, alongside new, automatically created ones, with the goal of
reducing the manual intervention required in language resource
production, and thus ultimately production costs.

WORKSHOP TOPICS
The topics of the workshop are related to best practices, methods,
techniques and experimental results regarding the merging of various
types of language resources, such as lexicons and corpora, especially
in support of language technology applications. In particular, new
methods for automatic merging with a view towards reducing human
intervention will be most welcome.
Topics for submission include, but are not limited to:
Experiments on (semi-)automatic merging of automatically produced
  resources
Experiments on the merging of two or more existing resources containing
  the same or different levels of linguistic information
Studies or experiments on merging resources at different levels of
  granularity (corpora, lexicons, grammars)
Studies or experiments on unifying, mapping or converting encoding
  formats
Comparison between different resources and mapping algorithms to
  provide desired merging
Use of linguistic information from different sources in high-level
  language applications
Use of new, merged language resources in language technology
  applications


WORKSHOP WEBSITE:
http://panacea-lr.eu/en/news/project/2011/12/19/lrec-2012-merging-lr-workshop/


SUBMISSIONS
Interested participants must submit a preliminary paper of about 4-6
pages including references (between 2000 and 2500 words). For the
submission, please use the online form on START LREC Conference Manager
at:
https://www.softconf.com/lrec2012/MergingLR2012/

When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e.
also technologies, standards, evaluation kits, etc.) that have been
used for the work described in the paper or are a new result of your
research.
For further information on this new initiative, please refer to
http://www.lrec-conf.org/lrec2012/?LRE-Map-2012

Papers will be peer-reviewed by the workshop Program Committee.

IMPORTANT DATES
Deadline for paper submission: 22 February 2012 (23:59 CET +1) **EXTENDED**
Notification of acceptance: 15 March 2012
Submission of camera-ready version of papers: 31 March 2012
Workshop date: 22 May 2012 – Afternoon Session

ORGANIZING COMMITTEE
Núria Bel, UPF, Barcelona, Spain
Maria Gavrilidou, ILSP-“Athena”, Athens, Greece
Monica Monachini, CNR-ILC, Pisa, Italy
Valeria Quochi, CNR-ILC, Pisa, Italy
Laura Rimell, University of Cambridge, UK

Contacts
lrec12_workshop_merging@ilc.cnr.it


PROGRAMME COMMITTEE:
Victoria Arranz, ELDA, Paris, France
Paul Buitelaar, National University of Ireland, Galway, Ireland
Nicoletta Calzolari, CNR-ILC, Pisa, Italy
Olivier Hamon, ELDA, Paris, France
Aleš Horák, Masaryk University, Brno, Czech Republic
Nancy Ide, Vassar College, NY, USA
Bernardo Magnini, FBK, Trento, Italy
Paola Monachesi, Utrecht University, Utrecht, The Netherlands
Jan Odijk, Utrecht University, Utrecht, The Netherlands
Muntsa Padró, IULA, Barcelona, Spain
Karel Pala, Masaryk University, Brno, Czech Republic
Thierry Poibeau, University of Cambridge, UK and CNRS, Paris, France
Benoît Sagot, INRIA, Paris, France
Kiril Simov, Bulgarian Academy of Sciences, Sofia, Bulgaria
Claudia Soria, CNR-ILC, Pisa, Italy
Maurizio Tesconi, CNR-IIT, Pisa, Italy


--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg

Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: [email protected]

Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080




--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge



