Re: [julia-users] BoundsError() in findcorruption @ io.jl and readtable @ io.jl

Milan Bouchet-Valat Sat, 07 Feb 2015 11:28:55 -0800

On Saturday, 07 February 2015 at 11:23 -0800, Keith Kee wrote:
> Hi Milan,
> 
> 
> I used the validate function of the tool csvfix 1.6 by Neil Butterworth
> on the csv file (only for csv, not for wsv), discovered two corrupted
> lines, and then it worked.
> 
> 
> The link to the manual:
> http://neilb.bitbucket.org/csvfix/manual/csvfix16/csvfix.html
> 
> 
> As a feature request, it would be very useful if either the readtable
> or the findcorruption function returned the corrupted line number
> and/or the corrupted line itself for the csv file.
> If performance is the design priority, provide another dedicated
> findcorruption function to help with cleaning up the csv files should
> readtable fail.
Good to hear you found the problem. Indeed, please file an issue so that
the error message can be improved. If you could create a small example
to reproduce the issue, it would be very useful.
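
In the meantime, a rough way to locate malformed rows without an external tool is to count the delimiter-separated fields on each line and report the lines that differ from the expected count. The following is only a sketch in plain Julia (assuming a recent 1.x release), not part of DataFrames: `count_fields` and `find_bad_lines` are hypothetical helper names, the default of 4 expected columns comes from the message quoted further down, and the quote handling is deliberately naive.

```julia
# Hypothetical helpers, not part of DataFrames.
# Count fields on one line, ignoring delimiters inside double quotes.
function count_fields(line::AbstractString, delim::Char=',')
    n = 1
    inquote = false
    for c in line
        if c == '"'
            inquote = !inquote          # naive: toggle on every double quote
        elseif c == delim && !inquote
            n += 1
        end
    end
    return n
end

# Return (line number, field count, line text) for every line whose
# field count differs from `expected`. Blank lines count as one field,
# so they are reported too.
function find_bad_lines(io::IO; expected::Int=4, delim::Char=',')
    bad = Tuple{Int,Int,String}[]
    for (lineno, line) in enumerate(eachline(io))
        n = count_fields(line, delim)
        n == expected || push!(bad, (lineno, n, line))
    end
    return bad
end

# Convenience method taking a file path instead of an open stream.
find_bad_lines(path::AbstractString; kwargs...) =
    open(io -> find_bad_lines(io; kwargs...), path)

# Small in-memory example: the third line has 6 fields instead of 4.
sample = IOBuffer("a,b,c,d\n1,2,3,4\n1,2,3,4,5,6\n")
println(find_bad_lines(sample))   # [(3, 6, "1,2,3,4,5,6")]
```

On a real file this would be called as `find_bad_lines("EURUSD.CSV")`. It does not handle quoted fields that span several lines, so a reported line number is only a starting point for inspection.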



Regards


> Keith 
> 
> 
> 
> 
> 
> On Saturday, 7 February 2015 08:17:51 UTC-8, Milan Bouchet-Valat
> wrote:
>         On Friday, 06 February 2015 at 15:01 -0800, Keith Kee wrote:
>         > Hi Milan, 
>         > 
>         > 
>         > Thanks for your advice. 
>         > 
>         > 
>         > I spotted one corruption in a smaller sample of 3000 lines and
>         > then it worked.
>         >
>         >
>         > Then I tried a larger number of 10000 lines and it gave the
>         > following:
>         > Saw 10000 rows, 4 columns (correct) and 40022 fields*Line 1 has 6
>         > columns (not sure where "line 1" starts but line 1 was ok as per
>         > using only the 3000-line file)
>         >
>         >
>         > How do I find the corruptions using the above message? Clearly it
>         > detected 6 columns in some "Line 1", but it is not the first line.
>         >
>         >
>         > Are there any julia functions or packages I can use to clean up
>         > the data or that will highlight corrupted lines in the data?
>         >
>         >
>         > I did try loading the 15,000 line csv file into Excel and it
>         > worked fine there.
>         > 
>         > 
>         > Looking forward to your expert advice. 
>         Sorry, I'm not really an expert on that function. Can't you
>         identify the problematic line by continuing to split the file
>         into halves?
>
>         Anyway, you should file a bug against the DataFrames package on
>         GitHub; people there will be more knowledgeable, and there's
>         apparently a bug at least in the line number that is being
>         reported.
>         
>         
>         Regards 
>         
>         > Thanks. 
>         > 
>         > 
>         > Keith   
>         > 
>         > On Friday, 6 February 2015 12:19:55 UTC-8, Milan Bouchet-Valat
>         > wrote:
>         >         On Friday, 06 February 2015 at 11:12 -0800, Keith Kee
>         >         wrote:
>         >         > Hi 
>         >         > 
>         >         > 
>         >         > Using DataFrames ( v"0.6.0" ) and Win32 julia 0.3.5
>         >         > 
>         >         > 
>         >         > ds = readtable("EURUSD.CSV", header=false) 
>         >         > 
>         >         > 
>         >         > 
>         >         > results in 
>         >         > 
>         >         > 
>         >         > 
>         >         > BoundsError() 
>         >         > in findcorruption at io.jl:698 
>         >         > in readtable! at io.jl:779 
>         >         > in readtable at io.jl:893 
>         >         > 
>         >         > 
>         >         > The original file has 15000 lines, works when I cut it
>         >         > down to 10 lines.
>         >         >
>         >         >
>         >         > Please advise as to whether there are limits to
>         >         > readtable on win32 setups?
>         >         15000 sounds quite small even for 32-bit. More likely, the
>         >         file contains something readtable() doesn't like, and which
>         >         does not appear in the first 10 lines. You could try
>         >         removing half of the file, see if it works, and go on like
>         >         that until you (possibly) find out which line creates a bug.
>         >         
>         >         
>         >         Regards 
>         
>         
>
