Hello,

I've been working with the Doccat module and I am wondering if we could
improve its data structure for the 1.6.0 release.

Today the DocumentSample has the following attributes:

- String category
- List<String> text

I would suggest adding an attribute to hold metadata or additional
context information. What do you think?
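
To make the idea concrete, here is a rough sketch of what I have in mind.
It is only an illustration: the class name DocumentSampleSketch, the
extraInformation field and its Map<String, Object> type are placeholders I
made up for this mail, not existing OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; this is NOT the actual OpenNLP DocumentSample class.
final class DocumentSampleSketch {

    // The two attributes the sample has today.
    private final String category;
    private final List<String> text;

    // Proposed addition (assumed name and type): free-form metadata or
    // context information that a feature generator could look at.
    private final Map<String, Object> extraInformation;

    // Keeps the current two-argument shape working; metadata stays empty.
    DocumentSampleSketch(String category, List<String> text) {
        this(category, text, null);
    }

    DocumentSampleSketch(String category, List<String> text,
                         Map<String, Object> extraInformation) {
        if (category == null || text == null) {
            throw new IllegalArgumentException("category and text must not be null");
        }
        this.category = category;
        // Defensive copies so the sample cannot be changed after creation.
        this.text = Collections.unmodifiableList(new ArrayList<>(text));
        this.extraInformation = extraInformation == null
                ? Collections.<String, Object>emptyMap()
                : Collections.unmodifiableMap(new HashMap<>(extraInformation));
    }

    String getCategory() {
        return category;
    }

    List<String> getText() {
        return text;
    }

    Map<String, Object> getExtraInformation() {
        return extraInformation;
    }
}
```

A caller that does not care about metadata would keep using the
two-argument constructor, so existing code would not need to change.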

Also, what do you think of including sentence and paragraph information? I
don't know whether a feature generator could extract anything from it to
improve the classification.

Thank you,
William
