GitHub user kaspersorensen commented on the pull request:

    https://github.com/apache/metamodel/pull/17#issuecomment-104927693
  
    I'm afraid my concern still stands. The change in CsvDataSet adds a call to 
the cast(...) method, which casts/narrows the object type on a value-by-value 
basis. That means I can very easily end up with values of different types 
within the same column.
    
    In my opinion it would be much better if, during the analysis of the data, 
we set a ColumnType on the Column definition and then _only_ cast according to 
that column type. That way casting is "always or never" for a given column: 
either every value we read is converted or none is.
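
    A minimal sketch of that idea, not the actual MetaModel code: the enum 
SketchType and the methods inferType and cast are hypothetical names made up 
for illustration, and a plain list of strings stands in for one CSV column. 
The type is decided once per column, and the read step casts only according 
to that decision.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnTypedCastSketch {

    /** Hypothetical stand-in for a column type; NOT MetaModel's ColumnType. */
    public enum SketchType { INTEGER, STRING }

    /** Analysis step: choose ONE type for the whole column. */
    public static SketchType inferType(List<String> columnValues) {
        if (columnValues.isEmpty()) {
            return SketchType.STRING; // nothing to analyze, keep raw strings
        }
        for (String value : columnValues) {
            if (!isInteger(value)) {
                return SketchType.STRING; // one non-numeric value decides the column
            }
        }
        return SketchType.INTEGER;
    }

    /** True if the value parses as a long after trimming whitespace. */
    public static boolean isInteger(String value) {
        if (value == null) {
            return false;
        }
        try {
            Long.parseLong(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Read step: cast strictly per the column's type, never per value. */
    public static Object cast(String raw, SketchType type) {
        if (type == SketchType.INTEGER) {
            return Long.valueOf(raw.trim());
        }
        return raw;
    }

    public static void main(String[] args) {
        List<String> clean = Arrays.asList("1", "2", "3");
        List<String> dirty = Arrays.asList("1", "foo bar", "3");
        System.out.println(inferType(clean));
        System.out.println(inferType(dirty));
        // Even the numeric-looking "1" stays a String in the dirty column.
        System.out.println(cast("1", inferType(dirty)).getClass().getSimpleName());
    }
}
```

    With this shape, a column is either entirely converted or entirely left 
as strings, so a consumer never sees mixed types within one column.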
    
    Another thing I'd like to request is that this feature can be turned on or 
off in CsvConfiguration. For the application I am working on, CSV is one of the 
most important data source types, and in some cases we process billions of CSV 
records. If there is just a single "foo bar" value in the middle of such a file 
that otherwise contains only numbers, we need a way to turn off the casting so 
that the consumer of the data at least does not get inconsistent value types 
(or lose data, in case the cast would omit the "foo bar" value).
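
    A rough sketch of what such a switch could look like. SketchCsvConfig and 
its castValues flag are hypothetical and are not part of the real 
CsvConfiguration API; readColumn is likewise an illustrative helper. With the 
flag off, every value is passed through untouched as a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CastToggleSketch {

    /** Hypothetical configuration holder; NOT MetaModel's CsvConfiguration. */
    public static final class SketchCsvConfig {
        private final boolean castValues;

        public SketchCsvConfig(boolean castValues) {
            this.castValues = castValues;
        }

        public boolean isCastValues() {
            return castValues;
        }
    }

    /** True only if the column is non-empty and every value parses as a long. */
    public static boolean allIntegers(List<String> rawValues) {
        if (rawValues.isEmpty()) {
            return false;
        }
        for (String raw : rawValues) {
            try {
                Long.parseLong(raw.trim());
            } catch (NumberFormatException | NullPointerException e) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the column's values. Casting happens only when the flag is on
     * AND the whole column is numeric; otherwise the raw strings are kept,
     * so no value is dropped and no column ends up with mixed types.
     */
    public static List<Object> readColumn(List<String> rawValues, SketchCsvConfig config) {
        boolean castColumn = config.isCastValues() && allIntegers(rawValues);
        List<Object> result = new ArrayList<>(rawValues.size());
        for (String raw : rawValues) {
            result.add(castColumn ? (Object) Long.valueOf(raw.trim()) : raw);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("1", "foo bar", "3");
        System.out.println(readColumn(column, new SketchCsvConfig(true)));
        System.out.println(readColumn(column, new SketchCsvConfig(false)));
    }
}
```

    Whether the flag should default to on or off is a separate design 
question; the sketch only shows that the "foo bar" value survives either way.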

