The write-up of the error model for Microsoft's Midori project (2009-2013) <http://joeduffyblog.com/2016/02/07/the-error-model/> is a nice link.
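For anyone reading this in the archives later, here is a rough sketch of the same "parse failure is a value, not an exception" idea. It assumes Julia 1.x, where `tryparse` returns `nothing` on failure instead of a `Nullable`; the code quoted below targets Julia 0.5. The names `FieldParseError` and `parsefile_checked` are made up for illustration and are not from the thread or from Base.

```julia
# Sketch only, assuming Julia 1.x (tryparse returns `nothing` on failure).
# `FieldParseError` and `parsefile_checked` are hypothetical names.

struct FieldParseError <: Exception
    text::String    # the field that could not be parsed
    target::Type    # the type requested by the schema
    line::Int
    column::Int
end

# Controls how the error is printed at the REPL.
Base.showerror(io::IO, e::FieldParseError) =
    print(io, "Failed to parse \"", e.text, "\" as ", e.target,
          " on line ", e.line, ", column ", e.column, ".")

function parsefile_checked(io::IO, schema)
    rows = []  # collected here for illustration; streaming code would consume each row instead
    for (line, text) in enumerate(eachline(io))
        row = []
        for (column, (T, s)) in enumerate(zip(schema, split(text, ';')))
            value = tryparse(T, s)
            # The loop that knows line and column decides what a failure means.
            value === nothing && throw(FieldParseError(String(s), T, line, column))
            push!(row, value)
        end
        push!(rows, row)
    end
    return rows
end
```

Calling `parsefile_checked(IOBuffer(test_file), fill(Int, 3))` on the test file from the message below should throw a `FieldParseError` carrying line 3 and column 2. The parse step never throws, so no catch-and-rethrow chain is needed; only the caller, which has the position, raises the error.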
On Friday, November 4, 2016 at 5:45:39 AM UTC-4, Tamas Papp wrote:
>
> I figured it out (posting the solution for the archives, and possibly
> for comments). Reading the Julia issues about exceptions, I came across
> a blog post about the Midori error model [1], and also some discussions
> on how exceptions are not the way to handle errors which are not
> bugs. So I realized I need a version of parse that returns a
> Nullable, then found that it already exists (tryparse).
>
> So here is my solution (for the self-contained stylized example, the
> actual code is much more complex):
>
> parsefield{T <: Real}(::Type{T}, string) = tryparse(T, string)
>
> function parsefile(io, schema)
>     line = 1
>     while !eof(io)
>         strings = split(chomp(readline(io)), ';')
>         values = parsefield.(schema, strings)
>         function checked(column, value)
>             if isnull(value)
>                 error("could not parse \"$(strings[column])\" as " *
>                       "$(schema[column]) in line $(line), column $(column)")
>             else
>                 value
>             end
>         end
>         # do something with this
>         [checked(column,value) for (column, value) in enumerate(values)]
>         line += 1
>     end
> end
>
> test_file = """
> 1;2;3
> 4;5;6
> 7;error;9
> """
>
> parsefile(IOBuffer(test_file), fill(Int, 3))
>
> I still need to figure out type stability etc, but I think I am on the
> right track.
>
> [1] http://joeduffyblog.com/2016/02/07/the-error-model/
>
> On Thu, Nov 03 2016, Tamas Papp wrote:
>
> > Unfortunately, the data is too large to fit in memory -- I must process
> > it in a stream.
> >
> > I will look at some libraries, hoping to find an idiomatic solution. I
> > am sure that I am not the first one encountering this pattern.
> >
> > On Thu, Nov 03 2016, Jeffrey Sarnoff wrote:
> >
> >> or split the string into rows of strings and rows into individual
> >> value-keeper strings and put that into a matrix of strings and process the
> >> matrix, tracking row and col and checking for "error"
> >>
> >> On Thursday, November 3, 2016 at 5:15:06 AM UTC-4, Jeffrey Sarnoff wrote:
> >>>
> >>> Or, redefine the question :>
> >>>
> >>> If you are not tied to string processing, reading the test_file as a
> >>> string (if it is) and then splitting the string
> >>> ```julia
> >>> # need the map to avoid SubString results, if it matters
> >>> rowstrings = map(String, split(test_file, '\n'))
> >>> # then split the rows on ';' and convert to ?Float64 with NaN for error
> >>> # or ?Nullable Ints
> >>> # and put the values in a matrix, processing the matrix you have the
> >>> # rows and cols
> >>> ```
> >>>
> >>>
> >>> On Thursday, November 3, 2016 at 4:34:53 AM UTC-4, Tamas Papp wrote:
> >>>>
> >>>> Jeffrey,
> >>>>
> >>>> Thanks, but my question was about how to have line and column in the
> >>>> error message. So I would like to have an error message like this:
> >>>>
> >>>> ERROR: Failed to parse "error" as type Int64 in column 2, line 3.
> >>>>
> >>>> My best idea so far: catch the error at each level, and add i and line
> >>>> number. But this requires two try-catch-end blocks with rethrow.
> >>>>
> >>>> Extremely convoluted mess with rethrow here:
> >>>> https://gist.github.com/tpapp/6f67ff36a228f47a1792e011d9b0fc13
> >>>>
> >>>> It does what I want, but it is ugly. A simpler solution would be
> >>>> appreciated. I am sure I am missing something.
> >>>>
> >>>> Best,
> >>>>
> >>>> Tamas
> >>>>
> >>>> On Thu, Nov 03 2016, Jeffrey Sarnoff wrote:
> >>>>
> >>>> > Tamas,
> >>>> >
> >>>> > running this
> >>>> >
> >>>> > typealias AkoString Union{String, SubString{String}}
> >>>> >
> >>>> > function parsefield{T <: Real, S <: AkoString}(::Type{T}, str::S)
> >>>> >     result = T(0)
> >>>> >     try
> >>>> >         result = parse(T, str)
> >>>> >     catch ArgumentError
> >>>> >         errormsg = string("Failed to parse \"",str,"\" as type ", T)
> >>>> >         throw(ErrorException(errormsg))
> >>>> >     end
> >>>> >     return result
> >>>> > end
> >>>> >
> >>>> > function parserow(schema, strings)
> >>>> >     # keep i for reporting column, currently not used
> >>>> >     [parsefield(T, string) for (i, (T, string)) in enumerate(zip(schema, strings))]
> >>>> > end
> >>>> >
> >>>> > function parsefile(io, schema)
> >>>> >     line = 1
> >>>> >     while !eof(io)
> >>>> >         strings = split(chomp(readline(io)), ';')
> >>>> >         parserow(schema, strings)
> >>>> >         line += 1 # currently not used, use for error reporting
> >>>> >     end
> >>>> > end
> >>>> >
> >>>> > test_file = """
> >>>> > 1;2;3
> >>>> > 4;5;6
> >>>> > 7;8;error
> >>>> > """
> >>>> >
> >>>> > parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> >
> >>>> > by evaluating parsefile(...), results in
> >>>> >
> >>>> > julia> parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> > ERROR: Failed to parse "error" as type Int64
> >>>> >  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[2]:7
> >>>> >  in (::##1#2)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at ./<missing>:0
> >>>> >  in collect_to!(::Array{Int64,1}, ::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2}, ::Int64, ::Tuple{Int64,Tuple{Int64,Int64}}) at ./array.jl:340
> >>>> >  in
> >>>> >  collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2}) at ./array.jl:308
> >>>> >  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Array{DataType,1}) at ./REPL[4]:5
> >>>> >
> >>>> > On Wednesday, November 2, 2016 at 1:01:30 PM UTC-4, Tamas Papp wrote:
> >>>> >>
> >>>> >> This is a conceptual question. Consider the following (extremely
> >>>> >> stylized, but self-contained) code
> >>>> >>
> >>>> >> parsefield{T <: Real}(::Type{T}, string) = parse(T, string)
> >>>> >>
> >>>> >> function parserow(schema, strings)
> >>>> >>     # keep i for reporting column, currently not used
> >>>> >>     [parsefield(T, string) for (i, (T, string)) in enumerate(zip(schema, strings))]
> >>>> >> end
> >>>> >>
> >>>> >> function parsefile(io, schema)
> >>>> >>     line = 1
> >>>> >>     while !eof(io)
> >>>> >>         strings = split(chomp(readline(io)), ';')
> >>>> >>         parserow(schema, strings)
> >>>> >>         line += 1 # currently not used, use for error reporting
> >>>> >>     end
> >>>> >> end
> >>>> >>
> >>>> >> test_file = """
> >>>> >> 1;2;3
> >>>> >> 4;5;6
> >>>> >> 7;8;error
> >>>> >> """
> >>>> >>
> >>>> >> parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> >>
> >>>> >> This will fail with an error message
> >>>> >>
> >>>> >> ERROR: ArgumentError: invalid base 10 digit 'e' in "error"
> >>>> >>  in tryparse_internal(::Type{Int64}, ::SubString{String}, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:88
> >>>> >>  in parse(::Type{Int64}, ::SubString{String}) at ./parse.jl:152
> >>>> >>  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[152]:1
> >>>> >>  in (::##5#6)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at ./<missing>:0
> >>>> >>  in collect_to!(::Array{Int64,1}, ::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##5#6}, ::Int64,
> >>>> >>  ::Tuple{Int64,Tuple{Int64,Int64}}) at ./array.jl:340
> >>>> >>  in collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##5#6}) at ./array.jl:308
> >>>> >>  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Array{DataType,1}) at ./REPL[154]:5
> >>>> >>
> >>>> >> Instead, I would like to report something like this:
> >>>> >>
> >>>> >> ERROR: Failed to parse "error" as Int on line 3, column 3.
> >>>> >>
> >>>> >> What's the idiomatic way of doing this in Julia? My problem is that
> >>>> >> parsefield fails without knowing line or column (i in parserow). I could
> >>>> >> catch and rethrow, constructing an error object gradually. Or I could
> >>>> >> pass line and column numbers to parserow and parsefield for error
> >>>> >> reporting, but that seems somehow inelegant (I have seen it in code
> >>>> >> though).
> >>>> >>
> >>>> >> Best,
> >>>> >>
> >>>> >> Tamas
> >>>> >>
> >>>>
> >>>
>