On 03/28/2012 11:02 PM, Aliaksandr Autayeu wrote:
> One small note on "b) improve caching. Now it is implemented via java
> object serialization; make it via CSV files".
> If you use some library for CSV, you might as well think about Google
> Protocol Buffers. They are pretty fast.

The code reads test data from this file, and only during the unit tests.
Java object serialization doesn't really work for this because the serialized
files depend on the VM version.
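For illustration, here is a minimal sketch of a plain-text replacement for the serialized cache. It assumes each test record can be flattened into a row of string fields that contain no line breaks. The class and method names (CsvTestDataCache, write, read) are hypothetical and not part of the similarity component. It uses only the JDK, so the files stay human-readable and do not depend on Java's serialization format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: store test rows as CSV text instead of Java-serialized objects.
// Assumption: fields never contain line breaks (one row per line).
public class CsvTestDataCache {

    // Quote a field if it contains a comma or a quote; double embedded quotes (RFC 4180 style).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join one row into a single CSV line.
    static String toLine(List<String> row) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : row) joiner.add(escape(field));
        return joiner.toString();
    }

    // Split one CSV line back into fields, undoing the quoting done by escape().
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote ""
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    static void write(Path file, List<List<String>> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) lines.add(toLine(row));
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    static List<List<String>> read(Path file) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            rows.add(parseLine(line));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("similarity-test-data", ".csv");
        List<List<String>> rows = List.of(
            List.of("sentence one", "NP, VP", "0.75"),
            List.of("say \"hi\"", "INTJ", "0.5"));
        write(file, rows);
        System.out.println(read(file).equals(rows)); // prints true
        Files.delete(file);
    }
}
```

A hand-rolled reader keeps the test fixtures free of extra dependencies; a CSV library or Protocol Buffers would handle more edge cases (for example embedded newlines) at the cost of a new dependency.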

The best solution would be to just read in Parse trees, but this is currently
not possible because the file contains both a Parse and a shallow parse.
To fix that, the current alignment code would need to output a Parse object again.
Another advantage of having only a Parse object is that it makes the interface to the
similarity component simpler.
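Reading Parse trees directly would mean parsing the bracketed, Penn Treebank-style notation that parsers print. A self-contained sketch of such a reader follows, assuming one bracketed tree per string. The Node class is a hypothetical stand-in for OpenNLP's Parse so that the example runs without the library; all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read a bracketed tree such as
// "(TOP (S (NP (DT The) (NN cat)) (VP (VBD sat))))" into a simple node structure.
public class BracketedTreeReader {

    // Minimal tree node: a label plus either child nodes or a single word (preterminals).
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        String word; // non-null only for preterminals like (NN cat)

        Node(String label) { this.label = label; }

        // Print the node back in the same bracketed notation.
        @Override
        public String toString() {
            if (word != null) return "(" + label + " " + word + ")";
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BracketedTreeReader(String text) {
        // Tokenize into "(", ")" and whitespace-separated atoms.
        String spaced = text.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : spaced.split("\\s+")) if (!t.isEmpty()) tokens.add(t);
    }

    static Node parse(String text) {
        BracketedTreeReader reader = new BracketedTreeReader(text);
        Node root = reader.readNode();
        if (reader.pos != reader.tokens.size()) throw new IllegalArgumentException("Trailing input");
        return root;
    }

    // Recursive descent: "(" LABEL (WORD | node+) ")"
    private Node readNode() {
        expect("(");
        Node node = new Node(next());
        if (!peek().equals("(")) {
            node.word = next(); // preterminal: (TAG word)
        } else {
            while (peek().equals("(")) node.children.add(readNode());
        }
        expect(")");
        return node;
    }

    private String peek() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos);
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    private void expect(String t) {
        if (!next().equals(t)) throw new IllegalArgumentException("Expected " + t);
    }

    public static void main(String[] args) {
        Node tree = parse("(TOP (S (NP (DT The) (NN cat)) (VP (VBD sat))))");
        System.out.println(tree); // prints the same bracketed string
    }
}
```

Because toString() emits the same notation it reads, a tree survives a write/read round trip, which is the property a text-based test fixture needs.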

Anyway, everyone who is interested in the similarity component should have a look
at the documentation Boris created.

Any comments and suggestions are very welcome!

Jörn
