This is an automated email from the ASF dual-hosted git repository.
joern pushed a change to branch replace_SE_chars
in repository https://gitbox.apache.org/repos/asf/opennlp-sandbox.git.
at bfd38be Use STX and ETX for start end seq chars in normalizer
This branch includes the following new commits:
new 1542aa1 OPENNLP-132 Added sandbox folder
new 7449178 OPENNLP-206 Created folder for corpus server
new de7270c OPENNLP-206 Initial corpus server check in
new 0cb1372 OPENNLP-209 Created wikinews-importer folder
new 64fa6a9 OPENNLP-209 Simple tool to upload wikinews xmi files
new 335ecf7 OPENNLP-206 Added support to parse CASes and a method to
create a corpus
new 4334479 OPENNLP-209 Added one more util to create a corpus in the
corpus server
new 0935bb4 OPENNLP-206 Added folder for Corpus Server Client
new e723d04 OPENNLP-206 Created a tools project for the corpus server
new 8e12c0f OPENNLP-206 Renamed from client to tools
new 61a6cfd OPENNLP-206 Added the usual suspects to svn:ignore
new d29c6cd OPENNLP-211 First version of wikinews importer, based on code
contributed by Olivier Grisel. Thanks.
new dbf560c OPENNLP-208 Added derby storage support
new 308f24a OPENNLP-235 Created project folder for Cas Editor plugin
new 7dd2e29 OPENNLP-235 Added initial project structure
new 8ac39b3 OPENNLP-235 Added basic pom based on a pom from an uima
eclipse project
new 29b99b9 OPENNLP-235 Added OpenNLP Plugin class
new 57d1407 OPENNLP-235 Added preference initializer
new 8c1c3b2 OPENNLP-235 Added preference page
new a3e8f36 OPENNLP-235 Added name finder view classes
new f7ef4a0 OPENNLP-235 Added columns to the table viewer and it shows
now a sample entity
new 5f1fd2d OPENNLP-235 Added the name finder to the content provider.
Now it runs once and provides the table with a list of names.
new 3932004 OPENNLP-235 Added confirmed column to the table, and a
confirmed field to Entity.
new 4475d44 OPENNLP-235 Added confirm action. Right now only creates an
annotation.
new 2a6260f OPENNLP-235 Name finder is now restricted to existing
annotations.
new d7139fc OPENNLP-235 Added entity merging, and name finder is
triggered on changes.
new 7c0c01c OPENNLP-235 searchEntities now searches for intersecting
entities, instead of entities with identical bounds.
new 1918c9c OPENNLP-235 Added a sorter to the table.
new f1735c1 OPENNLP-207 Initial lucas indexing support
new 025e479 OPENNLP-207 Initial lucas indexing support
new 45286a5 OPENNLP-207 Initial lucas indexing support
new 2f70982 OPENNLP-207 Improved lucas integration
new 74b3d69 OPENNLP-235 Fixed mistake in sequence validation
new 896c44f OPENNLP-235 Renamed project from opennlp-caseditor-plugin to
caseditor-opennlp-plugin
new e9bcd69 OPENNLP-249 Now indexes the full article text, and person and
organization annotations.
new 53cc528 OPENNLP-250 Now using prepared statement. All chars are
correctly escaped now.
new 08b2c9e OPENNLP-249 Changed searcher to reflect new mapping file.
new ac293f1 OPENNLP-249 Changed return type of search method.
new 36829e7 OPENNLP-249 Changed return type of search method.
new 23b2d61 OPENNLP-210 Initial check in of task queue interfaces
new ca69a45 OPENNLP-254 Index is now either created or appended.
new 0174dee OPENNLP-251 Added search tool
new f025676 OPENNLP-210 Initial task queue support
new 0330e72 OPENNLP-235 Fixed a bug in the restricted sequence validation
code
new 4cdd129 OPENNLP-252 Created project folder
new 1f12022 OPENNLP-252 Created source folder
new 0b8e527 OPENNLP-252 Copied pom.xml from caseditor opennlp plugin and
updated names
new 9e9eb70 OPENNLP-252 Changed a name
new 8ec349e OPENNLP-235 Added code which selects and reveals the text in
the Annotation Editor of the selected, not confirmed entity.
new 7c21336 OPENNLP-235 Now is forced to detect every confirmed entity
token as an entity. This dramatically boosts the recall of the name finder.
new 8756404 OPENNLP-276 Added test corpus server instance. Thanks to
Tommaso Teofili for providing this class as part of a patch he attached to
OPENNLP-261.
new ee51940 OPENNLP-261 Added project folder for the
corpus-server-connector
new f950dbd OPENNLP-261 Added empty source folder
new 4d0c083 OPENNLP-261 Added pom.xml file based on pom from
corpus-server-tools
new 0fa0a4d OPENNLP-261 Added empty test source folder
new 899c117 OPENNLP-261 Added empty test source folder
new a41c1eb OPENNLP-261 Added test descriptors. Thanks to Tommaso Teofili
for contributing this.
new d0b66f2 OPENNLP-261 Added collection reader, cas consumer and tests.
Thanks to Tommaso Teofili for contributing this.
new 39ec8b8 OPENNLP-281 First draft of a corpus backup tool
new 81c5bdc OPENNLP-281 Also created a tool to create a task queue
new 155b59e OPENNLP-281 Fixed path handling
new 5ed2738 OPENNLP-277 Fixed bug in id handling, a relative id was used
without re-basing it.
new ef3e240 OPENNLP-277 Removed unused set, which was added for debugging.
new ad37b65 OPENNLP-285 Added fix to load model from configured path
instead from classpath
new 00d97b3 OPENNLP-252 Added generated Activator class
new 13904df OPENNLP-252 Added dummy view, to test that it can be created.
new 50910c3 OPENNLP-252 Updated activator class name.
new 85d7f0e OPENNLP-252 Added corpus server editor input, there a
resource is identified by an URL and cas id
new a8ae2b5 OPENNLP-252 First experimental code to open an Annotation
Editor on a CAS from the Corpus Server.
new 0b23401 OPENNLP-277 Updated Lucas dependency to released version.
new 7f3a638 OPENNLP-252 Now it can pull the next CAS from the server.
new 8336a91 OPENNLP-252 Added support to save a CAS back to the server,
based on Cas Editor code.
new a71f959 OPENNLP-252 Updated document provider extension to work with
updated Cas Editor.
new a5aec01 OPENNLP-252 Fixed formatting.
new 3f44de9 OPENNLP-252 Fixed view ids.
new f7e4f78 OPENNLP-252 Added equals and hashCode methods.
new debcea8 OPENNLP-252 Settings are now saved in memory
new e79e1ea OPENNLP-252 Fixed equals.
new 5e3b9b6 OPENNLP-252 Updated implementation to work with changed
CasDocumentProvider.
new 965b491 OPENNLP-252 Fixed handling of editor annotation status.
new 1a5592c OPENNLP-252 Now only keeps n items in the history. History
view now contains IEditorInput elements and supports opening via the opening
listener.
new 62cfe02 OPENNLP-235 Stub for the sentence detector view
new 8d449bb OPENNLP-235 Moved settings for token and sentence type config
to OpenNLP pref page.
new 9dd3b3d OPENNLP-235 Moved confirm action to inner class.
new a8eb227 OPENNLP-235 Added sentence detector job.
new 76ddd68 OPENNLP-235 Now sets the preference store.
new 2fcc861 OPENNLP-235 Initial check in.
new 6be5754 OPENNLP-235 Instance variables are now local variables.
new 1920625 OPENNLP-235 Added preference field for paragraph annotation
type
new 4f0cb52 OPENNLP-235 Moved comparator to separate class
new ae55ee9 OPENNLP-235 Moved confirm action to separate class
new 46b99c0 OPENNLP-235 Declared methods as public
new 585077f OPENNLP-235 Moved inner classes to separate files
new 39e7653 OPENNLP-235 Added dummy tokenizer view.
new d569cc5 OPENNLP-235 Removed call to add FS to CAS, because that is
already done by the ICasDocument
new e9ecc3f OPENNLP-235 Improved inputChanged implementation, and added
some comments.
new 2a700fe OPENNLP-235 Added very basic sentence detector support.
new 47c24ac OPENNLP-235 Added wrong field.
new 74b52d6 OPENNLP-235 Pages are now grouped under OpenNLP.
new 8ec1e8a OPENNLP-235 Changed merge logic slightly
new 4e4d3cb OPENNLP-235 Changed merge logic slightly
new 68a4121 OPENNLP-235 Added setter for entity text
new 583d569 OPENNLP-235 Improved entity confirmation listener
new 47153bd OPENNLP-235 Removed unused imports
new 9b24a7b OPENNLP-235 Added dummy for tokenizer preference page.
new 0b9b097 OPENNLP-235 Added edit to entity list selection support based
on Cas Editor code.
new d37d393 OPENNLP-235 Entity List now provides AnnotationFS selection
also.
new 325e636 OPENNLP-235 Hooked up Quick Annotate Action command to the
confirm action, now a user can confirm a potential annotation by pressing Enter.
new 8dca767 OPENNLP-235 Sentence detection can now be restricted to
paragraph annotations.
new 6af2d1c OPENNLP-235 Added key short cut to confirm action.
new 3909d95 OPENNLP-235 Added tokenizer job.
new e6b9774 OPENNLP-235 Added tokenizer constants.
new 0876bf9 OPENNLP-235 Name Finder now supports multiple sentence types.
new c2eba79 OPENNLP-235 Extended the Span object to add a confidence
score.
new 31c9ebe OPENNLP-235 Moved name finder integration code to a separate
class.
new 4d616b3 OPENNLP-235 First support for multiple models.
new f2c4224 No jira, added a comment.
new 6a2af99 OPENNLP-299 Index mapping can now be defined per corpus.
new efe9c15 OPENNLP-299 Index mapping can now be defined per corpus.
new 62c100b OPENNLP-299 Added index mapping file.
new 469a654 OPENNLP-299 Index mapping can now be defined per corpus.
new fe5848b No jira, added comment.
new 6d8b69e OPENNLP-300 Added ability to import a single xmi file or a
folder of xmi files.
new 17b8a43 OPENNLP-299 Fixed handling of index mapping file.
new d046b57 OPENNLP-299 Fixed handling of index mapping file.
new 2e93edd OPENNLP-261 Converted the consumer to read from the Corpus
Server via its RESTful API instead.
new af708e9 OPENNLP-261 Added collection reader xml file which is derived
from Apache UIMAs example FileSystemCollectionReader.xml.
new 1683e39 OPENNLP-261 Adapted stub collection reader xml to work with
CScollectionReader class.
new 7a438d7 OPENNLP-261 Renamed collection reader
new 42c6209 OPENNLP-261 Inserted sample addresses
new b5efcea OPENNLP-261 Added analysis engine xml file which is derived
from Apache UIMAs example RegExAnnotator.xml.
new 672edd1 OPENNLP-261 Renamed CSCasConsumer to CSCasWriter
new 9bd2a95 OPENNLP-261 Adapted descriptor to work with CSCasWriter.
new 544f7d2 OPENNLP-302 Now the generated OSGi MANIFEST.MF is included in
the created jar file.
new ead91a6 OPENNLP-302 Fixed path to MANIFEST.MF file.
new e2ebebf OPENNLP-235 It is now possible to create annotations
depending on the entity type.
new 657ad30 OPENNLP-235 Enabled verified name restriction again.
new 0bc012d OPENNLP-235 Enabled recall boosting for already confirmed
names again.
new 2935b30 OPENNLP-235 Now also uses type to decide if two entities are
equal.
new 926963f OPENNLP-235 Improved matching of types, now also considers
the type.
new 3943759 OPENNLP-299 Added JSON configuration
new 85d3f66 OPENNLP-252 Added a Corpus Explorer View to browse the
contents of a corpus
new e8263b5 OPENNLP-310 First changes to use new preference store. Also
added a bit of error handling to avoid exceptions, but further work is needed.
new 9a1f343 OPENNLP-310 Moved action to show preference dialog to a
separate action. Added action to sentence detector view tool bar.
new 1c14df1 OPENNLP-312 Confirmed entities are now directly removed from
the list of new potential entities.
new 080a72a OPENNLP-310 Removed preferences pages
new ceb9da3 OPENNLP-314 Updated UIMA version to 2.4.0-SNAPSHOT from
2.3.2-SNAPSHOT and changed jar file name.
new 47c5f85 OPENNLP-314 Updated UIMA version to 2.4.0-SNAPSHOT from
2.3.2-SNAPSHOT.
new 8bd79ae OPENNLP-310 Fixed Sentence Detector Preference Page, it was
accidentally showing the Name Finder Preference Page.
new 46dfa33 OPENNLP-310 Removed unused method call to retrieve the
plugins preference store.
new 3f4ddbd OPENNLP-310 Updated content provider to be compatible with
latest Cas Editor changes in UIMA-2245.
new 2d40573 OPENNLP-310 Added call to document provider to save changed
settings!
new d5366df OPENNLP-315 Added log methods.
new 34369d7 OPENNLP-321 Added code to locally store type system
preferences.
new c472b55 OPENNLP-310 Updated to be provided with new session preference
store.
new bb5be7f OPENNLP-253 Created project for new relevance component.
new 240285e OPENNLP-253 Added source folders.
new 348eb99 OPENNLP-253 Initial check in of contribution from Boris
Galitsky. Thanks for contributing.
new a3e1798 OPENNLP-315 Added first error reporting to the name finder
view.
new a7643e9 OPENNLP-315 Added error reporting to the sentence detector
view.
new 50753d2 OPENNLP-312 Improved selection handling after a possible
entity is removed from the list through confirmation.
new 353cf1a OPENNLP-312 Fixed setting of jobs system status.
new f242d0f OPENNLP-303 Added a containing constraint. Class is taken from
OpenNLP UIMA Integration.
new 6ddcc83 OPENNLP-303 Now uses token annotations from input CAS instead
of simple tokenizer.
new 51a412b OPENNLP-315 Now reports an error if sentence annotation is
invalid.
new 18c7772 OPENNLP-315 Now correctly closes input stream during model
loading.
new 05ee426 OPENNLP-310 Added name space to preference keys.
new 5090e53 OPENNLP-310 Changing properties now triggers a name finder
run.
new 24f533a OPENNLP-310 Added an option to enable / disable confirmed
name recall boosting
new b1ac975 OPENNLP-326 Now multiple paragraph types can be configured.
new 44a0b17 OPENNLP-323 Fixed dependencies, and compatibility changed to
work with 1.5.2. Applied patch provided by Boris Galitsky. Thanks for providing
the patch.
new 280b41c OPENNLP-323 Added missing AL 2.0 headers. Thanks to Boris
Galitsky for providing a patch.
new fad10e2 OPENNLP-323 Replaced log4j with java.util.log. Thanks to
Boris Galitsky for providing a patch.
new 0b41238 OPENNLP-330 Fixed junit tests, so they work with opennlp
1.5.2. Thanks to Boris Galitsky for providing a patch.
new 1bd7c1b OPENNLP-330 Fixed junit tests, so they work with opennlp
1.5.2. Thanks to Boris Galitsky for providing a patch.
new c48473e OPENNLP-323 Replaced log4j with java.util.log. Thanks to
Boris Galitsky for providing a patch.
new b46c4c7 OPENNLP-331 Added functions substituting POS taggers by
Parser POS. Thanks to Boris Galitsky for providing a patch.
new 8431406 No jira, added eclipse files to svn:ignore.
new 670c219 OPENNLP-313 Added a button to do tokenization once.
new e0f4d5d OPENNLP-345 Now handles the case correctly where nothing is
selected in the entity list.
new 8190469 OPENNLP-339 If the query parameter is specified the queue
will be created or reset.
new bfdc73e OPENNLP-311 Initial http model loading support.
new 1240979 OPENNLP-348 Turned open button into an open listener
new 1543d0d OPENNLP-346 Now remembers last used server. Renamed Activator
to CorpusServerPlugin.
new 6f19af7 OPENNLP-347 Replaced text query field with combo query field
which even remembers the last used queries.
new 65527ef OPENNLP-350 Corpus Explorer should not query server in the UI
thread
new dcffa81 OPENNLP-351 Now enter or selection from the query combo
triggers a search.
new cdd1988 OPENNLP-328 Now considers sentences which are already in the
CAS. Sentence detection is triggered automatically.
new f6ff3b2 OPENNLP-353 New label provider shows begin and end of a
sentence.
new 1b60558 OPENNLP-354 Specified timeout and fixed error handling.
new 28854d9 No jira, added suppress warnings for raw types
new 36a49c3 OPENNLP-356 Now uses cas editor listener to listen for input
changes. If input is changed view is refreshed.
new 5467a41 OPENNLP-356 Updated to work with the latest version of the
editor listener.
new 1646bf8 OPENNLP-356 Updated change listener, to use common base class.
new c452da2 No jira, added suppress warnings.
new 4c0343e OPENNLP-356 Fixed confirm action to always use current
document.
new 9a5f3e6 OPENNLP-337 Moved Porter Stemmer to opennlp.tools.stemmer
package. Thanks to Boris Galitsky for providing a patch.
new bbebdc4 OPENNLP-356 Now uses new CasEditorView as a base which
handles input changes by re-creation of the view page.
new 16fcd30 OPENNLP-392 Now maintains the selection after confirmation.
new 2c2b434 OPENNLP-387 Demonstration on how similarity component
improves search accuracy. Thanks to Boris Galitsky for providing a patch.
new 31670cd OPENNLP-387 Added missing AL header, and formatted code.
new 1c4ede5 OPENNLP-392 Now maintains the selection after confirmation.
new e6b35e8 OPENNLP-392 Improved selection handling after one was skipped.
new 550e9db OPENNLP-401 Selection is only changed if the user confirms an
annotation from the name finder view.
new 4db48cf OPENNLP-408 Now only updates selection when sentence detector
view is active.
new d95b777 OPENNLP-405 Now ordering also depends on confidence.
new f1a265b OPENNLP-410 Refactored Entity class into PotentialAnnotation
and removed old Entity left overs.
new 90b7004 OPENNLP-410 Added javadoc comment.
new c78818d No jira, changed OpenNLP version to 1.5.2.
new 04d3744 OPENNLP-319 Added new type list field editor
new 328e935 OPENNLP-319 Minor layout improvements.
new 6cb4f21 OPENNLP-319 Added new field editor for UIMA types.
new b6024c1 OPENNLP-319 Added two new field editors to name finder
preference page to make configuration easier.
new e053af6 OPENNLP-324 Added configuration options and added initial
capital letter filter.
new fd7c143 Creating folder for new OpenNLP ML rewrite.
new 79692b2 OPENNLP-39 Branched opennlp-maxent for opennlp-ml
new 1c7bbb7 OPENNLP-39 Deleted incorrectly copied opennlp-maxent branch.
new ff93001 OPENNLP-39 Branched opennlp-maxent for opennlp-ml
new 558e359 OPENNLP-119 Moved classes to new location in new project
structure
new db43e73 OPENNLP-119 Fixed package declarations.
new bf93714 Updated test code to have org.apache.ml package prefix.
Updated POM. OPENNLP-416
new fe66ee6 OPENNLP-319 Now correctly used Modify event instead of
Selection event.
new ff11418 No jira, added missing AL header.
new 6f2a1a2 repaired appropriateness criterion for sentence inclusion
into generated content
new d4ff792 demonstration how sensitive syntactic match is compared to
bag-of-words approach Key: OPENNLP-413
new 5fc74ef OPENNLP-414.txt
new cbc2068 OPENNLP-277 Index is now reused and re-opened on index
change. Analyzer was changed from whitespace to standard.
new 17313bd OPENNLP-419 readme.txt + more code comments for similarity
component
new 81bcd7e Move opennlp to tlp as per INFRA-4456
new a95aaf7 OPENNLP-411 Updated UIMA dependency to 2.4.0.
new bee2db2 OPENNLP-411 Updated UIMA dependency to 2.4.0.
new 39198fc No jira, added missing mention of mapping file to help
message.
new 5495a91 OPENNLP-457 Type System is now resolved before it is sent to
the Corpus Server.
new 3416312 No jira, fixed formatting.
new fc41455 OPENNLP-458 Now checks that corpus exists instead of just
assuming that it exists.
new f318026 OPENNLP-458 Removed left over debug message.
new 3ecbacf OPENNLP-458 getCorpus now returns null if corpus does not
exist.
new 89b8cfc OPENNLP-459 Now searcher is correctly chosen for corpus.
new 56672af OPENNLP-459 Searcher is now closed on shutdown!
new ee18433 No jira, removed unused imports.
new b0bb0eb No jira, added wtp eclipse config file to svn ignore.
new 28cedc6 OPENNLP-460 Updated to fit implementation.
new 6882a66 OPENNLP-461 Fixed a bug in query remembering code.
new f962a49 OPENNLP-462 Added support to exclude annotation types from
intersecting with recommended sentences. Existing sentences are now handled via
the new exclude logic.
new eddbf23 OPENNLP-462 Fixed NPE when no exclusion types are specified.
new 5f8a96d OPENNLP-464 Model loading error is now reported to the user.
new 2a87904 OPENNLP-465 Now it is triggered by preferences changes as
well.
new 990215d OPENNLP-462 Fixed code to exclude sentences.
new 9b453bb OPENNLP-467 Now logs creation of a queue.
new 35e94cb OPENNLP-468 Added lowercase filter to better work with
standard analyzer
new 615a267 OPENNLP-468 Created a constant for the id field.
new 8af4465 No jira, fixed formatting.
new 2b87455 OPENNLP-340 Added support to remove a CAS from a corpus.
new b5bd305 OPENNLP-475 Changed CorpusSever.getTypeSystem to return a
byte array.
new cebd3d8 OPENNLP-472 Created new project for corpus-server
implementation.
new b10a4ec OPENNLP-472 Created folder structure.
new b854621 OPENNLP-472 Copying impl classes from corpus-server project.
new 40c67ce OPENNLP-472 Copying impl classes from corpus-server project.
new 03d6585 OPENNLP-472 Fixed package declaration.
new 84ad141 OPENNLP-472 Added pom.xml and Activator class. Fixed imports
in copied implementation classes.
new cdd4e70 OPENNLP-472 Moved implementation over to corpus-server-impl
and changed build to produce an OSGi bundle.
new ca10b92 OPENNLP-472 Added moved resources
new 0ae521a OPENNLP-476 Added OSGi classes
new 14b8d3d OPENNLP-480 Created tagging server project folder in sandbox.
new 6493ee1 OPENNLP-480 Created initial structure for tagging server.
new 0e04d71 OPENNLP-476 Added OSGi bundle and the application for the
rest servlet
new 61c0f91 OPENNLP-420 to speed up similarity computation, store parsing
results in a hash, so that if a sentence has been parsed, chunked and prepared
for matching once, we store it in a hash. when the Processor is instantiated,
hash is deserialized. When the processor is closed, this hash is serialized.
new f0e1f0f OPENNLP-420: cached parsing results for junits *.dat file
new 4390c27 OPENNLP-419 write a doc which will introduce potential users
to the Component
new 39e8132 OPENNLP-436 Auto Taxonomy Learner for Search Relevance
Improvement based on Similarity
new 149bb28 resources for OPENNLP-436 Auto Taxonomy Learner for Search
Relevance Improvement based on Similarity
new b87df8e test for OPENNLP-436 Auto Taxonomy Learner for Search
Relevance Improvement based on Similarity
new 48d89f2 OPENNLP-489 Now always uses end marker which comes first in
article. Thanks to Prokopis Prokopidis for providing a patch.
new eecf0ac OPENNLP-420: cached parsing results for junits *.dat file now
caches into CSV file instead of java object serialization
new b1ad93d OPENNLP-420: cached parsing results for junits *.dat file now
caches into CSV file instead of java object serialization added cache in CSV
format: sentence_parseObject.CSV
new 20d048b formatting fixed by applying template OPENNLP-419 write a doc
which will introduce potential users to the Component
new ec9aa61 OpenNLP OPENNLP-497 create maven script, release notes
new 1747178 OpenNLP OPENNLP-497 create maven script, release notes
new d692dce OPENNLP-480 First draft of POS Tagging service.
new e073aa2 OPENNLP-480 First draft of Name Finder service.
new ff786e1 OPENNLP-480 Added sample model bundle and feature xml for
easier installation.
new f58dbc4 OPENNLP-476 Added features.xml files to ease the installation
into an OSGi Runtime such as Apache Karaf.
new fe50c98 No jira, added argument validation.
new 5a80b5a OPENNLP-513 Added support to drop a corpus
new 8f2ee87 OPENNLP-472 Adjusted packages imports to work with OSGi
new a1c8635 OPENNLP-518 Models can now be loaded from file URLs as well.
new a96dac5 OPENNLP-480 Added initial support for tokenizer and sentence
detector, updated name finder and pos tagger.
new dc3b53a OPENNLP-480 Added initial support for tokenizer and sentence
detector, updated name finder and pos tagger.
new 3c6e8b2 OPENNLP-480 Updated and extended sample configuration.
new e3a22b8
new eb2a5f8 OPENNLP-480 Added first draft of simple web demo.
new 68c3bfb OPENNLP-480 Fixed CSS problems.
new 6237da8 OPENNLP-480 Fixed bug in offset handling.
new 46f4e5c No jira, fixed formatting.
new 12a5e2d OPENNLP-528 Added method to replace the type system of a
corpus.
new 15287ae OPENNLP-528 Added method to replace the type system of a
corpus.
new 9b2979c OPENNLP-528 Added support to resolve the replaced type system.
new 72302ac OPENNLP-531 Output directory must now be passed in as an
argument.
new f5a5143 OPENNLP-532 Added start script.
new 2801904 No jira, added annotation status feature structure.
new e5615a7 OPENNLP-532 Added start script.
new b71c207 OPENNLP-532 Renamed script.
new 1cd3d45 OPENNLP-261 Implemented CAS write support.
new 54f8ff6 OPENNLP-261 Added sample to perform sentence detection and
tokenization via a CPE.
new 5ba9519 OPENNLP-261 Added sample to train a person name finder.
new d7eef46 OPENNLP-537: make an access to generic search engines to
demonstrate search results re-ranking
new 2ce90b3 OPENNLP-538 Another illustration for similarity component:
converting natural language task into Java code
new ea5e96b OPENNLP-497 Fixed maven script to build distributions
new 82ec363 OPENNLP-497 Fixed maven script to build distributions
Re-built cache for runs w/o models
new 0d914c2 OPENNLP-497 Fixed maven script to build distributions better
handling of cases where models are unavailable
new 57b19b5 OPENNLP-540 SOLR request handler for search results
re-ranking based on 'Similarity'
new 7501fd8 OPENNLP-497 Fixed maven script to build distributions updated
parsing cache for junits
new 01f2232 OPENNLP-497 Fixed maven script to build distributions added
caching for search engine api calls
new 0cd102a OPENNLP-497 Fixed maven script to build distributions updated
thresh for tests
new 7bfed99 OPENNLP-497 Fixed maven script to build distributions only
use cached external web search engine results
new bab831a copied from Apache Tika project
new 2c4184e OPENNLP-497 Fixed maven script to build distributions fixed
[WARNING]
/opennlp-similarity/src/main/java/opennlp/tools/nl2code/NL2ObjCreateAssign.java:210:
warning: unmappable character for encoding UTF-8
new 4ae829c OPENNLP-497 Fixed maven script to build distributions
assembly.xml is copied from Tika
new be734cd OPENNLP-497 Fixed maven script to build distributions
assembly plugin is added
new 55ab905 OPENNLP-497 Fixed maven script to build distributions moved
license & notice files to the root
new fe4c1a2 OPENNLP-497 Fixed maven script to build distributions fixed
license & notice
new 1dfeaa3 OPENNLP-497 Fixed maven script to build distributions fixed
license & notice
new c518d24 <?xml version="1.0" encoding="UTF-8"?>
new aa6d29c <?xml version="1.0" encoding="UTF-8"?>
new 928cc2e pom.xml
new f6492bc pom.xml
new 7512e44
src/main/java/opennlp/tools/similarity/apps/WebSearchEngineResultsScraper.java
new b7952a1 [maven-release-plugin] prepare release
opennlp-similarity-0.0.1
new 3746842 bing api
new 34b5e71 OPENNLP-555 Now throws a Core Exception if document couldn't
be saved.
new 733e4a3 opennlp-548 bing api
new 48d6812 opennlp-548 bing api
new 409673f OPENNLP-575 Created the new coref project folder
new a6a17c1 OPENNLP-575 Created src folder and opennlp.tools package.
new c76719b OPENNLP-575 Copied coref component main code over to sandbox
project.
new 2b3ebb6 OPENNLP-575 Added cmdline package
new ca4db4f OPENNLP-575 Copied cmd line tools over to opennlp-coref
new bbde4e3 OPENNLP-575 Added initial pom file to build to coref
component.
new 3d70692 OPENNLP-575 Created lang folder.
new 8c84ec8 OPENNLP-575 Copied the old English coref command line tools
new 9375e8c OPENNLP-575 Created formats folder.
new 0358d6b
new adf992f OPENNLP-575 Copied coref over to sandbox.
new 5f18ee8 OPENNLP-575 Copied coref over to sandbox.
new 8c814c8 OPENNLP-585 Added a Brat NER tagging service.
new 95089e6 OPENNLP-585 Added a Brat NER tagging service.
new 698a60a Prototype of a tool to allow users to create models from
a set of known entities based on their own data in the form of sentences. See
the Example class in the .v2 package.
new 25d225d Prototype of a tool to allow users to create models from
a set of known entities based on their own data in the form of sentences. See
the Example class in the .v2 package.
new 55c5a17 Changed to use interface sig rather than impl
new 6ea8536 OPENNLP-607 Changed header template. Referenced new jira
ticket
new 961dab4 OPENNLP-611 POM with 1.7 build tags.
new 543b97a OPENNLP-614 Moved all GeoEntityLinker impl classes to
sandbox. Called this module addons as a place to consolidate useful addons to
the base opennlp modules.
new eb46984 OPENNLP-614 Moved all GeoEntityLinker impl classes to
sandbox. Called this module addons as a place to consolidate useful addons to
the base opennlp modules.
new c570240 OPENNLP-615 Added a scoring impl that utilizes a doccat model
to help with toponym resolution. The ModelBasedScorer also contains two static
methods for training the model based on the CountryContext information used by
the GeoEntityLinker.
new 8ed861d OPENNLP-615 Cleaned up javadocs and header info in
ModelBasedScorer
new 17332b2 OPENNLP-579 Added a SetupUtils class so users can get the
Lucene indexes and Country Doccat models built very easily. Also many other
small efficiencies.
new 4b726f5 OPENNLP-579 Fixed a bug in the GazateerIndexer. Refined the
SetupUtils.
new a682acf OPENNLP-579 Added simple caching to improve performance.
new ea98ffc OPENNLP-607 Cleaned up comments and fixed a bug that was
giving the output model the wrong type in some cases
new 822b6bc OPENNLP-621 Fixed errors and changed all appropriate imports
to opennlp.tools.ml. Builds but no testing done yet.
new f838a60 OPENNLP-614 Fixed a bug in the GeoEntityLinker. No gaz lookup
was being performed if no country context was found.
new 9c65f04 OPENNLP-607 Fixed many issues. Added default file-based impls
for all interfaces, and created a util class wrapper to allow for easy use of
the default implementations.
new eea1e78 OPENNLP-607 Fixed many issues. Added default file-based impls
for all interfaces, and created a util class wrapper to allow for easy use of
the default implementations.
new b866ac1 OPENNLP-626 Integrated Arabic, Russian, Thai, and Farsi
analyzer usage to GazateerIndexer. Still need to add support for query time
analyzer usage via a language code overload or language detector...
new 3bc5f31 OPENNLP-628
new 747ff23 OPENNLP-626 renamed packages for consistency in addons, also
made small efficiencies
new 574a936 OPENNLP-626 renamed packages for consistency in addons, also
made small efficiencies
new 26e7d94 OPENNLP-607 renamed packages for consistency in addons, also
made the framework generic with file based implementations
new e66257d OPENNLP-607 renamed packages for consistency in addons, also
made the framework generic with file based implementations
new 8ddfe01 renamed directory
new b2d22cb renamed directory
new a13ab82 OPENNLP-574 Moved from addons to sandbox to mature there.
new a7a34fd OPENNLP-636 Updated Trainer constructor usage. Init
parameters are now passed in via the init method and not via the constructor.
new 8747699 OPENNLP-661 The OpenNLP machine learning code was integrated
into the opennlp-tools project. The opennlp-ml project should be removed from
the sandbox.
new af92a7d OPENNLP-657 Initial pull of the nlp-utils provided by Tommaso
Teofili. Thanks for contributing.
new 5daae29 OPENNLP-666 Support for strict CFGs non terminal rules
expansion. Thanks to Tommaso Teofili for providing a patch.
new 627d985 OPENNLP-666 Added RuleTest. Thanks to Tommaso Teofili for
providing a patch.
new 531ccb8 OPENNLP-713 - fixed some javadocs, using generics in ngrams
utils, added more tests to cfg and language modeling packages
new 20ad591 OPENNLP-723 - pcfg support in sandbox (nlp-utils)
new 113a657 OPENNLP-723 - fixed cky method, minor fixes to formatting
new 8f6c185 OPENNLP-752 Added the summarizer contribution. Thanks to Ram
Soma for contributing it.
new 11f9145 Added initial version of the wsd component. Thanks to Anthony
Beylerian and Mondher Bouazizi for the contribution.
new c5d321f OPENNLP-758 Added a pom to make it build with maven
new c717f6b OPENNLP-758 Formatted the code according to OpenNLP code
conventions
new b276f56 OPENNLP-757 Added some headers and fixed some issues. Thanks
to Mondher Bouazizi for providing a patch.
new 2b2d892 OPENNLP-758 Applied clean up patch. Thanks to Anthony
Beylerian for providing a patch.
new 7b94e5f No jira, set eol-style property to native.
new 3f541e2 OPENNLP-757 Applying bulk patch. Thanks to Mondher Bouazizi
for providing a patch!
new 500915b OPENNLP-790 First iteration of the evaluator, testing on
basic lesk, will need to validate and check the different performances. Thanks
to Anthony Beylerian for providing a patch.
new 359c3a5 OPENNLP-790 Removed unused variables. Changed the output
format to : [Source SenseKey Score] each WSDisambiguator is assumed to have at
least [Source SenseKey] as output for each disambiguation. In the case of Lesk
and other unsupervised approaches with scores, the score can be provided as
extra output. For now only the highest scoring disambiguated sense is
considered in evaluation.
new ffafc92 OPENNLP-790 - Fix for the IMS approach to support SemCor3.0
data - The output format is now [Source SenseKey] so it corresponds to that of
Lesk. - Removed some unused variables. - Added Some parameters to let the user
select the source of data he wants to use. - Implemented the IMS Evaluator. -
Added and clarified some parts of the documentation.
new 77f56ce OPENNLP-802 The WSDisambiguator needs a baseline to compare
the implemented approaches with. Lesk presents a good baseline, however
Senseval and Semeval workshops demonstrated that MFS presents a better and more
challenging baseline.
new ce14617 OPENNLP-758 Updated Lesk with new data readers and added MFS
in case no overlaps are found (similar to the simplified version). Thanks to
Anthony Beylerian for providing a patch.
new 4b4bd99 OPENNLP-791 Reads the mentioned clustering files, could also
switch to objectstream. Thanks to Anthony Beylerian for providing a patch.
new 9d4b861 OPENNLP-802
new da26bd6 OPENNLP-758 fixes for parameters
new d8abd31 OPENNLP-804 Updated opennlp-tools from 1.6.0-SNAPSHOT to
1.6.0.
new ff5d685 OPENNLP-801
new 689952b OPENNLP-794
new 329b0df OPENNLP-801 Also includes some more cleanups. Thanks to
Anthony Beylerian for providing a patch!
new 729117f OPENNLP-801 1- IMS now no longer does the pre-processing
steps (The user will have to introduce them). Thanks to Mondher Bouazizi for
providing a patch!
new 6637500 OPENNLP-807 We have worked on the integration of the existing
approaches.
new 61943f6 OPENNLP-796 The two readers now return
ObjectStream<WSDSample>. Thanks to Mondher Bouazizi for providing a patch.
new afda4bf Removed classes marked for removal
new 600c541 Removed classes marked for removal
new d8aa2a1 Added missing commons lang dependency
new 806c27d Fixed code formatting
new 5580818 Commented junit Assert call to make it compile with maven
new d81166b OPENNLP-791 WordNet based clusters patch, uses ME for now
will have to modify for other classifiers. Thanks to Anthony Beylerian for
providing a patch!
new 172e891 OPENNLP-792 Added class javadoc. Thanks to Anthony Beylerian
for providing a patch.
new 98eb743 OPENNLP-713 - slightly enhanced some tests
new 74d93c5 OPENNLP-713 - slightly enhanced some tests, made Hypothesis
immutable
new 93ddafb OPENNLP-713 - pcfg#toString should result in same parser CLI
output
new fbbf803 OPENNLP-817 - added a CFG runner (with samples), added pcfg
parse rules / cfg capabilities
new 8faad08 OPENNLP-817 - switch to j7, added missing AL header, added
runner test, tweaked parse rules method to adjust probs
new c0197a5 The geocoder was moved to the addons area quite some time back
new 1bb45db OPENNLP-821 Moved mallet addon from my github repository to
here
new ce88c13 OPENNLP-821 Now builds and runs with 1.6.0
new 092bff6 added unit tests, corrected some mistakes, need more unit
tests
new 77c5552 removed useless classes/folder
new d167ee7 fixed method name
new 9c1a75c moved MFS and Lesk into main package moved IMS and OSCC into
main package as contextGenerators
new 6ce823b
new 1b75157 updated tests
new 0498fe3 OPENNLP-850 Add ner brat annotation service
new 206ef95 OPENNLP-850 Update dependencies to work with the uber jar
new 4e121df OPENNLP-850 Fix type in tokenizer init error message
new 7009f23 OPENNLP-843 - moved contextgen implementations to top dir,
need to make a common model and params for supervised approaches
new 0f08de2 OPENNLP-843 - grouped the two supervised techniques into a
common one with different context generators, the default context generator is
from the IMS approach, updated the unit tests, need to remove the useless
classes.
new f40736d OPENNLP-843 - removed the unnecessary files
new bf255a3 OPENNLP-827 fix for evaluator to check for non empty
instances from senseval data
new 552afea OPENNLP-864 Rename name finder annotator classes
new 67e5ed4 OPENNLP-866 Add optional argument for server port
new dce84c0 OPENNLP-860 Add .gitignore file
new 4350f64 Move brat annotator to opennlp.git
new ad4195b Whitespace test commit
new 9aa270c merge from bgalitsky's own git repo
new 1f97041 merge from bgalitsky's own git repo
new 2707f66 removed stanford nlp refs
new 96c088b Add first draft of dl name finder
new a63ec16 OPENNLP-1009 - added initial RNN and StackedRNN impls from
Yay lab, minor fixes
new 6bfb15f OPENNLP-1009 - minor improvements / fixes
new fe2b1d9 fixed adagrad update for (s)rnn, added rmsprop to srnn
new 6f0659f removed useless state update, minor fixes
new a80f29b text sequence classification using Glove and RNN/LSTMs
new a1c8692 OPENNLP-1106: Make it compile with 1.6.0, update java to 8
and checkstyle fixes
new 7f2076c Refactored and implemented DocCat API
new e5c4676 Removed test CLI parameters for Main method
new cba153e OPENNLP-1111: Adding initial EC2 scripts for testing.
new 7c6bb48 Merge pull request #3 from thammegowda/glove-rnn-classifier
new a0fa9d0 OPENNLP-1111: Improving the CloudFormation template for
OpenNLP testing on AWS.
new 9c78236 OPENNLP-1111: Making tests on EC2 automated.
new f764dbd OPENNLP-1009 - minor updates to (s)rnn parameters, rnn now
using rmsprop
new 8e234db OPENNLP-1009 - wrong test file
new be049da Update DL4J/ND4J to 0.9.1
new c7fcaa3 OPENNLP-1009 - less epochs for (s)RNNs tests
new 84bf608 OPENNLP-1009 - added NeuralDocCatTest, currently fails at
loading model
new 4bde702 OPENNLP-1009 - switch to opennlp-tools 1.8.3 release
new a08a73e added tensorflow NER prediction PoC
new 87a75a7 Merge pull request #10 from thygesen/tfnerpoc
new 23c0ebb added files for test
new c0a14f6 Merge pull request #11 from thygesen/tfnerpoc
new 9a14494 Add TF training code for name finder
new efc1051 Replace hard coded paths with args
new 23c95bd Add AL 2.0 header to Java source files
new 60eb80e Remove incorrectly placed space in tag name
new 788e73a Map chars to indices 0..n instead of using ord(c)
new 6294dfa Write mapping dicts to disk
new f8db193 Name placeholders and variables for use from Java API
new 12b2c65 Fix loading of dicts by removing GZIP decompressor
new bd30e1a Adjust operation names to namefinder.py
new 0005382 Write model to disk after training
new 2abc214 Write correct dict into char_dict.txt
new e483e9f Adjust encoding to match BioCodec (Java)
new 19d046d Implement the TokenNameFinder interface
new 77a39ad Disable dropout for inference
new a1899bb Add missing return parameter to fix compile error
new 5df185c Rename module to match folder name
new c126158 Rename packages to org.apache.opennlp.namefinder
new 8dc9495 added vector size to NameFinder + only save model if improved
+ stop training if not improved for 5 iterations
new c136c85 Adjust settings to match namefinder.py trainer
new 5e9fc9b Add constructor to load all resources from Input Streams
new faee815 Call close on Tensor objects to release memory
new 8f24fc6 Add first version of namecat poc
new 5e401cc Move namefinder.py to namefinder folder
new 04da946 Add Java API for namecat and more Randomize training data,
add dropout, add test eval
new f5f9377 Add split.py to split training data into pieces
new 7f33b3f Compute ntags based on label dict size
new cb36083 Extract vector size from embeddings file
new 8b09a57 OPENNLP-1009 - upgrade to dl4j 1.0.0-beta2
new fd42cb4 Merge pull request #20 from tteofili/OPENNLP-1009a
new 6f38bee Add first draft of normalizer trainer
new 30067a7 Add first draft of normalizer Java API
new 9c02da7 Make batch size for normalizer inference dynamic
new d187edf Remove end marker from output seq
new a850904 Remove hard coded seq length
new 00a8fdf Write model and dictionaries into zip package
new f746c57 Add train dropout to normalizer
new 5e8d0da Name Finder Trainer now writes zip package with vocab files
inside
new fa9de88 Namecat Trainer now writes zip package with vocab files inside
new 6804801 Fix error when empty String array is passed in
new 2c0121d Add a script to generate date normalization data
new 92a30e8 Add year only dates to date generator
new af594df Add char dropout to handle unknown chars to normalizer
new 199f756 Replace hard coded train, dev and test file names with args
new bfd38be Use STX and ETX for start end seq chars in normalizer
The 506 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.