Github user kaspersorensen commented on the pull request:
https://github.com/apache/metamodel/pull/17#issuecomment-104927693
My worry is still there, I am afraid. With the change in CsvDataSet there is
now a call to the cast(...) method, which casts/narrows the object type on a
value-by-value basis. That means I can very easily end up with values of
different types within the same column.
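
To illustrate the concern, here is a minimal sketch of what value-by-value
narrowing does to a column. The class and method names are hypothetical
stand-ins, not the actual cast(...) code in the pull request, and it assumes
the narrowing simply tries to parse an integer and otherwise keeps the String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-value cast; not the actual MetaModel code.
class PerValueCastSketch {

    // Narrow a single raw CSV value without any knowledge of its column.
    static Object castPerValue(String raw) {
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return raw; // not an integer, so the value stays a String
        }
    }

    // Cast every value of one column independently of the others.
    static List<Object> castColumn(List<String> rawValues) {
        List<Object> result = new ArrayList<>();
        for (String raw : rawValues) {
            result.add(castPerValue(raw));
        }
        return result;
    }
}
```

For a column holding "1", "foo bar" and "3", this yields an Integer, a String
and an Integer, so a consumer cannot rely on a single type for the column.
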
In my opinion it would be much better if, during the analysis of the data, we
set a ColumnType on the Column definition and then _only_ do the casting as
per that column type. That way we ensure that it is "always or never" that we
change the objects we read.
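
A rough sketch of that "always or never" idea follows. Everything in it is an
assumption for illustration: SketchColumnType is a hypothetical enum rather
than MetaModel's ColumnType, the analysis is a plain scan over a sample of
non-null values, and only integers are detected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of column-level typing; not the MetaModel API.
class ColumnLevelCastSketch {

    enum SketchColumnType { INTEGER, STRING }

    // Decide one type for the whole column from a sample of raw values.
    // A single non-numeric value (or an empty sample) makes it STRING.
    static SketchColumnType inferColumnType(List<String> sample) {
        if (sample.isEmpty()) {
            return SketchColumnType.STRING;
        }
        for (String raw : sample) {
            try {
                Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                return SketchColumnType.STRING;
            }
        }
        return SketchColumnType.INTEGER;
    }

    // Cast according to the column type only, never per value.
    // If a value outside the sample does not parse, this throws a
    // NumberFormatException instead of silently changing the type.
    static List<Object> castColumn(List<String> rawValues, SketchColumnType type) {
        List<Object> result = new ArrayList<>();
        for (String raw : rawValues) {
            if (type == SketchColumnType.INTEGER) {
                result.add(Integer.valueOf(raw.trim()));
            } else {
                result.add(raw);
            }
        }
        return result;
    }
}
```

With this shape, a column containing "1", "foo bar" and "3" is typed STRING
as a whole, and every value in it is returned unchanged.
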
Another aspect that I'd like to request is that this feature can be turned
on or off in CsvConfiguration. For the application that I am working on, CSV is
one of the most important data source types, and we have some cases of
processing billions of CSV records. If there is just a single "foo bar" value
in the middle of such a file that otherwise consists entirely of numbers, then
we need a way to turn off the casting so that the consumer of the data at least
does not get inconsistent value types (or lose data, in case the cast would
omit the "foo bar" value).
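
As a sketch of the kind of switch I have in mind: the SketchCsvConfig class
and its convertTypes flag below are hypothetical and only stand in for
whatever option CsvConfiguration might eventually expose. With the flag off,
the reader hands back the raw strings untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configuration holder; not the real CsvConfiguration.
class SketchCsvConfig {
    private final boolean convertTypes;

    SketchCsvConfig(boolean convertTypes) {
        this.convertTypes = convertTypes;
    }

    boolean isConvertTypes() {
        return convertTypes;
    }
}

class ToggleableCastSketch {

    // Returns the column as raw Strings when conversion is disabled.
    // When enabled, converts to Long only if every value parses; otherwise
    // the whole column falls back to Strings so no value is dropped.
    static List<Object> readColumn(List<String> rawValues, SketchCsvConfig config) {
        if (!config.isConvertTypes()) {
            return new ArrayList<Object>(rawValues);
        }
        List<Object> converted = new ArrayList<>();
        for (String raw : rawValues) {
            try {
                converted.add(Long.valueOf(raw.trim()));
            } catch (NumberFormatException e) {
                return new ArrayList<Object>(rawValues);
            }
        }
        return converted;
    }
}
```

Either way the consumer sees one value type per column, and the "foo bar"
value is never lost.
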