Aleksey Zinoviev created IGNITE-7328:
----------------------------------------
Summary: Improve Labeled Dataset loading from txt file
Key: IGNITE-7328
URL: https://issues.apache.org/jira/browse/IGNITE-7328
Project: Ignite
Issue Type: New Feature
Components: ml
Reporter: Aleksey Zinoviev
Assignee: Aleksey Zinoviev
1. Wouldn't it be better to parse rows in place, rather than saving them as
strings first? In the current implementation we need to keep the dataset in
memory twice, which might be a problem for big datasets.
2. What about the case when a dataset contains non-numerical data as well? Do we
handle this case, or will some other "DatasetLoader" be used for such purposes?
3. Just an idea: in case we don't want to fail on bad data (99% of cases), it
would be great to understand the quality of the loaded dataset, e.g. the number
of skipped or missing rows/values.
4. Should a row that doesn't contain the required number of columns be treated
as "bad data" instead of breaking parsing with an
IndexOutOfBoundsException?
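
To make points 1, 3 and 4 concrete, here is a minimal sketch of what such a loader could look like. It is not the existing Ignite ML code: the class LabeledRowLoaderSketch, its Result holder and the load method are hypothetical names used only for illustration. It assumes a purely numerical dataset (point 2 is not addressed), parses each line as it is read instead of buffering the raw strings, and counts bad rows instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of the Ignite ML API.
class LabeledRowLoaderSketch {
    /** Parsed rows plus simple data-quality counters. */
    static final class Result {
        final List<double[]> rows = new ArrayList<>();
        int shortRows;      // rows with fewer columns than expected
        int malformedRows;  // rows containing a non-numeric value
    }

    /**
     * Reads the input line by line and converts each line to doubles
     * immediately, so the raw strings are never held for the whole dataset.
     */
    static Result load(BufferedReader reader, String separator, int expectedCols)
        throws IOException {
        Result res = new Result();
        String line;

        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty())
                continue; // blank lines are ignored and not counted

            // Quote the separator so that characters like '|' are taken literally.
            String[] tokens = line.split(Pattern.quote(separator), -1);

            // Too few columns: count the row as bad data instead of failing
            // later with an IndexOutOfBoundsException.
            if (tokens.length < expectedCols) {
                res.shortRows++;
                continue;
            }

            double[] row = new double[expectedCols];
            try {
                // Extra trailing columns, if any, are ignored in this sketch.
                for (int i = 0; i < expectedCols; i++)
                    row[i] = Double.parseDouble(tokens[i].trim());
            }
            catch (NumberFormatException e) {
                res.malformedRows++;
                continue;
            }

            res.rows.add(row);
        }

        return res;
    }
}
```

After loading, the caller can inspect shortRows and malformedRows to decide whether the dataset is good enough to use. Whether bad rows should be skipped silently or reported via an exception could be made configurable.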
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)