On 01/29/2014 07:33 PM, Richard Eckart de Castilho wrote:
> If I understand the SequenceClassificationModel interface correctly,
> the input data to be classified is passed as an array T[].
> What about data that is very large? I think it would be nice if
> the new interface supported sequence classification on streams,
> e.g. by passing an Iterator<T> or an actual stream to the classifier.
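For concreteness, the suggested stream-based variant might look roughly like this; the interface and method names below are purely illustrative, not part of the actual OpenNLP API:

```java
import java.util.Iterator;

// Hypothetical sketch of a stream-based classification interface;
// names and signatures are illustrative only.
interface StreamingSequenceClassifier<T> {
  // Consumes the input lazily and yields one outcome per element.
  Iterator<String> classify(Iterator<T> sequence);
}
```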
Exactly, the sequence to be classified is passed in as an array.
The current interface supports passing in quite long sequences (limited
only by memory), probably easily tens of thousands of elements.
Do you have a use case where this would not be good enough? In the
OpenNLP components the sequences are usually sentences, but even a
really long document should work.
Supporting streams would add quite some complexity, e.g. for looking
back or forward in the sequence during feature generation, and I wonder
if it is really necessary.
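To illustrate that complexity: feature generation typically inspects a window around the current element, so a stream-based classifier would have to buffer look-back and look-ahead context itself. A rough sketch of such a buffer (the class and its methods are hypothetical, not OpenNLP code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not OpenNLP code: a stream-based classifier would
// need bookkeeping like this so feature generation can still look a fixed
// number of elements back and forward while consuming an Iterator<T>.
class WindowedIterator<T> {
  private final Iterator<T> source;
  private final int window;                           // max look-back/look-ahead
  private final Deque<T> back = new ArrayDeque<>();   // elements behind us
  private final Deque<T> ahead = new ArrayDeque<>();  // elements ahead of us
  private T current;                                  // assumes no null elements

  WindowedIterator(Iterator<T> source, int window) {
    this.source = source;
    this.window = window;
    fill();
  }

  // Keep the look-ahead buffer topped up from the underlying stream.
  private void fill() {
    while (ahead.size() < window && source.hasNext()) {
      ahead.addLast(source.next());
    }
  }

  boolean hasNext() {
    return !ahead.isEmpty() || source.hasNext();
  }

  // Advance one element, retaining at most `window` elements behind us.
  T next() {
    if (current != null) {
      back.addLast(current);
      if (back.size() > window) {
        back.removeFirst();
      }
    }
    current = ahead.isEmpty() ? source.next() : ahead.removeFirst();
    fill();
    return current;
  }

  List<T> lookBack() { return new ArrayList<>(back); }   // oldest first
  List<T> lookAhead() { return new ArrayList<>(ahead); } // nearest first
}
```

With the array-based interface none of this bookkeeping is needed, since a feature generator can simply index anywhere into the T[].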
Thanks,
Jörn