Github user helenahm commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/93#discussion_r130533347
--- Diff: core/pom.xml ---
@@ -103,6 +103,12 @@
<version>${guava.version}</version>
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>opennlp</groupId>
+ <artifactId>maxent</artifactId>
+ <version>3.0.0</version>
--- End diff --
In general I totally agree. I think it would be good to perform the move to
another version of maxent in a few steps.
1. The code I have re-used is that of GISTrainer. It more or less just updates
the weights in a matrix, where the matrix is Hivemall's matrix. Everything
else simply follows your class structure. I have checked that the resulting
models are the same, and I have also confirmed that the resulting model makes
sense on my own data. So the resulting weights must be correct. Can we say that
training is correct and accept the current version as the correct and
functioning one?
2. After that there are a few options:
a. We could try to re-write the code so that it accepts the newest version
of opennlp maxent and all the following versions. I guess that would require
changes in opennlp maxent too, but perhaps that is better than manually
altering GISTrainer every time you update something, and both projects will
benefit from such collaboration.
b. If not, perhaps for Hivemall as a project, we may consider re-writing
iterative scaling from scratch to make it efficient for Hivemall, perhaps using
the tricks OpenNLP uses to make the code more efficient, and making sure that
the resulting weights are comparable, but without aiming to be able to plug in
a new OpenNLP jar each time a new version appears.
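To make the second option above (re-writing iterative scaling from scratch)
a bit more concrete, here is a rough sketch of what a minimal generalized
iterative scaling loop might look like. It is not Hivemall or OpenNLP code:
the class name GisSketch, the plain double[][] standing in for Hivemall's
matrix, and the comparable() helper are all hypothetical names made up for
illustration. It also assumes that every event has the same number of active
binary predicates, so no correction feature is needed, and it leaves out the
efficiency tricks mentioned above.

```java
/** Illustrative sketch only; not taken from Hivemall or OpenNLP. */
final class GisSketch {

    /**
     * Trains weights[predicate][outcome] with generalized iterative scaling.
     * contexts[i] lists the active binary predicates of event i and
     * outcomes[i] is its observed outcome.
     * Assumption: all events have the same number of active predicates.
     */
    static double[][] train(int[][] contexts, int[] outcomes,
                            int numPreds, int numOutcomes, int iterations) {
        // Correction constant C: number of active predicates per event.
        int c = 1;
        for (int[] ctx : contexts) {
            c = Math.max(c, ctx.length);
        }

        // Empirical count of every (predicate, outcome) pair.
        double[][] observed = new double[numPreds][numOutcomes];
        for (int i = 0; i < contexts.length; i++) {
            for (int p : contexts[i]) {
                observed[p][outcomes[i]] += 1.0;
            }
        }

        double[][] weights = new double[numPreds][numOutcomes];
        for (int it = 0; it < iterations; it++) {
            // Counts the current model expects for every pair.
            double[][] expected = new double[numPreds][numOutcomes];
            for (int[] ctx : contexts) {
                double[] probs = predict(weights, ctx, numOutcomes);
                for (int p : ctx) {
                    for (int o = 0; o < numOutcomes; o++) {
                        expected[p][o] += probs[o];
                    }
                }
            }
            // GIS update: w += log(observed / expected) / C.
            // Simplification: pairs never seen in training keep weight 0.
            for (int p = 0; p < numPreds; p++) {
                for (int o = 0; o < numOutcomes; o++) {
                    if (observed[p][o] > 0.0) {
                        weights[p][o] += Math.log(observed[p][o] / expected[p][o]) / c;
                    }
                }
            }
        }
        return weights;
    }

    /** Returns p(outcome | context) under the given weights. */
    static double[] predict(double[][] weights, int[] ctx, int numOutcomes) {
        double[] probs = new double[numOutcomes];
        double sum = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            double score = 0.0;
            for (int p : ctx) {
                score += weights[p][o];
            }
            probs[o] = Math.exp(score);
            sum += probs[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            probs[o] /= sum;
        }
        return probs;
    }

    /** True if two weight matrices agree element-wise within tol. */
    static boolean comparable(double[][] a, double[][] b, double tol) {
        if (a.length != b.length) {
            return false;
        }
        for (int p = 0; p < a.length; p++) {
            if (a[p].length != b[p].length) {
                return false;
            }
            for (int o = 0; o < a[p].length; o++) {
                if (Math.abs(a[p][o] - b[p][o]) > tol) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A check like comparable() is one way the "resulting weights are comparable"
requirement could be tested: train the same events with the re-written loop
and with the reference trainer, then compare the two weight matrices within a
small tolerance.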
What do you think?
Regards,
Elena.