[
https://issues.apache.org/jira/browse/IGNITE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Zinoviev updated IGNITE-7328:
-------------------------------------
Priority: Trivial (was: Minor)
> Improve Labeled Dataset loading from txt file
> ---------------------------------------------
>
> Key: IGNITE-7328
> URL: https://issues.apache.org/jira/browse/IGNITE-7328
> Project: Ignite
> Issue Type: New Feature
> Components: ml
> Reporter: Aleksey Zinoviev
> Assignee: Aleksey Zinoviev
> Priority: Trivial
>
> 1. Wouldn't it be better to parse rows in place instead of first saving them
> as strings? The current implementation has to keep the dataset in memory
> twice, which might be a problem for big datasets.
> 2. What about a dataset that contains more than just numerical data? Do we
> handle this case, or will some other "DatasetLoader" be used for such
> purposes?
> 3. Just an idea: if we don't want to fail on bad data (99% of cases), it
> would be great to report the quality of the loaded dataset, such as the
> number of missed rows/values.
> 4. Should a row that doesn't contain the required number of columns be
> treated as "bad data" instead of breaking parsing with
> IndexOutOfBoundsException?
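
For discussion, points 1, 3 and 4 could be combined roughly as in the sketch below. It is only an illustration under stated assumptions, not the existing Ignite loader: the class InPlaceRowParser, its Result holder, and the load method with its parameters are hypothetical names. Each line is converted to a double[] as soon as it is read, so the raw strings are never accumulated, and rows that are too short or contain a non-numeric value are counted and skipped instead of stopping the load.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the Ignite ML API. */
public class InPlaceRowParser {
    /** Parsed rows plus simple data-quality counters. */
    public static final class Result {
        public final List<double[]> rows = new ArrayList<>();
        public int shortRows;      // rows with fewer columns than expected
        public int malformedRows;  // rows containing a value that is not a number
    }

    /**
     * Reads the input line by line and parses each row immediately.
     * Extra columns beyond expectedCols are ignored (an assumption of this sketch).
     */
    public static Result load(BufferedReader reader, String separator, int expectedCols)
            throws IOException {
        Result res = new Result();
        String sepRegex = Pattern.quote(separator); // treat the separator literally
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty())
                continue; // blank lines are not counted as bad data
            String[] tokens = line.split(sepRegex, -1);
            if (tokens.length < expectedCols) {
                res.shortRows++; // counted and skipped, no index error
                continue;
            }
            double[] row = new double[expectedCols];
            boolean ok = true;
            for (int i = 0; i < expectedCols; i++) {
                try {
                    row[i] = Double.parseDouble(tokens[i].trim());
                } catch (NumberFormatException e) {
                    ok = false; // non-numeric or empty value
                    break;
                }
            }
            if (ok)
                res.rows.add(row);
            else
                res.malformedRows++;
        }
        return res;
    }

    public static void main(String[] args) throws IOException {
        String data = "1.0,2.0,3.0\n4.0,5.0\n7.0,abc,9.0\n10,11,12\n";
        Result res = load(new BufferedReader(new StringReader(data)), ",", 3);
        System.out.println("loaded=" + res.rows.size()
            + " short=" + res.shortRows + " malformed=" + res.malformedRows);
    }
}
```

Whether bad rows should be skipped silently, reported through counters as above, or rejected with an exception depending on a flag is a design choice the ticket leaves open.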