Hello, I've been working with the Doccat module and I am wondering if we could improve its data structure for the 1.6.0 release.
Currently, DocumentSample has the following attributes:
- String category
- List<String> text

I would suggest adding an attribute to hold metadata, or additional context information. What do you think?

Also, what do you think of including sentence and paragraph information? I don't know if there is anything a feature generator could extract from it to improve the classification.

Thank you,
William
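The metadata proposal could look roughly like the sketch below. It assumes a hypothetical class `DocumentSampleWithMetadata` (not part of the OpenNLP API) that stores the extra context in a `Map<String, Object>`. The `metadataFeatures` helper is also hypothetical. It only shows one way a feature generator might turn metadata entries into additional features.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * Hypothetical sketch (NOT the OpenNLP DocumentSample API): a document
 * sample holding a category, its tokens, and optional metadata.
 */
final class DocumentSampleWithMetadata {

    private final String category;
    private final List<String> text;
    private final Map<String, Object> metadata;

    DocumentSampleWithMetadata(String category, List<String> text,
                               Map<String, Object> metadata) {
        this.category = Objects.requireNonNull(category, "category must not be null");
        this.text = Collections.unmodifiableList(
                new ArrayList<>(Objects.requireNonNull(text, "text must not be null")));
        // A null map is treated as "no metadata", so existing callers stay simple.
        this.metadata = metadata == null
                ? Collections.<String, Object>emptyMap()
                : Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    // Equivalent of today's sample: category plus text, no extra context.
    DocumentSampleWithMetadata(String category, List<String> text) {
        this(category, text, null);
    }

    String getCategory() { return category; }

    List<String> getText() { return text; }

    Map<String, Object> getMetadata() { return metadata; }

    // Hypothetical helper: turns each metadata entry into a feature string
    // such as "meta:source=newswire", sorted by key so output is deterministic.
    static List<String> metadataFeatures(DocumentSampleWithMetadata sample) {
        List<String> features = new ArrayList<>();
        for (Map.Entry<String, Object> e : new TreeMap<>(sample.getMetadata()).entrySet()) {
            features.add("meta:" + e.getKey() + "=" + e.getValue());
        }
        return features;
    }
}
```

A map keeps the change backward compatible. The two-argument constructor behaves like the current category-plus-text sample, and an empty map simply means there is no extra context for a feature generator to use.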