Hi, I was looking for something simple (no macros), relying just on Nim's
standard library, to dynamically infer a column's type and assign it to the
proper seq type. I know that there are pretty advanced packages like
`datamancer` out there, but I wish to build my own small toy, just to better
understand the proper way to do it (I'm more interested in the concept than in
the performance). I know that Nim is strictly typed and that it encourages
programmers to model their own types. I devised something like this:
import strutils, parsecsv, sequtils, streams

type
  nimSeriesKind = enum
    seriesInt,
    seriesFloat,
    seriesString
  nimSeries = ref object
    case kind: nimSeriesKind
    of seriesInt: intValues: seq[int]
    of seriesFloat: floatValues: seq[float]
    of seriesString: stringValues: seq[string]

# CSV file parsing and matrix inversion omitted.
# After that, I have a seq of string columns that
# I wish to check whether they can be parsed as int or float.
proc setSeries(x: seq[string]): nimSeries =
  try:
    result = nimSeries(kind: seriesInt, intValues: x.map(parseInt))
  except ValueError:
    try:
      result = nimSeries(kind: seriesFloat, floatValues: x.map(parseFloat))
    except ValueError:
      result = nimSeries(kind: seriesString, stringValues: x)
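For context, here is how downstream code would consume such a variant: every access dispatches on `kind`, so the compiler checks that only the active branch's field is touched. A minimal sketch (the `len` proc is just an illustrative helper, not part of my actual code):

```nim
type
  nimSeriesKind = enum
    seriesInt,
    seriesFloat,
    seriesString
  nimSeries = ref object
    case kind: nimSeriesKind
    of seriesInt: intValues: seq[int]
    of seriesFloat: floatValues: seq[float]
    of seriesString: stringValues: seq[string]

proc len(s: nimSeries): int =
  ## Dispatch on the discriminator to reach the active field;
  ## accessing an inactive field would raise a FieldDefect at runtime.
  case s.kind
  of seriesInt: s.intValues.len
  of seriesFloat: s.floatValues.len
  of seriesString: s.stringValues.len
```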
Are object variants the right way to do this? The try/except approach feels a
bit inelegant (are there any relevant drawbacks?); maybe there is a more
idiomatic / efficient way to go? Thank you.
P.S. It's evident that if a late row forces the type change (e.g. all
integers, but the last row is NaN), the whole sequence gets re-mapped from
scratch... but hopefully typical datasets do not conspire toward the worst
case.