On Friday, 6 February 2015 at 15:01 -0800, Keith Kee wrote:
> Hi Milan,
> 
> 
> Thanks for your advice.
> 
> 
> I spotted one corruption in a smaller sample of 3000 lines and then it
> worked.
> 
> 
> Then I tried a larger number of 10000 lines and it gave the following:
> Saw 10000 rows, 4 columns (correct) and 40022 fields
>  * Line 1 has 6 columns
> (not sure where "line 1" starts, but line 1 was OK when using only the
> 3000-line file)
> 
> 
> How do I find the corruptions using the above message? Clearly it
> detected 6 columns in some "Line 1", but it is not the first line.
> 
> 
> Are there any Julia functions or packages I can use to clean up the
> data, or that will highlight corrupted lines in the data?
> 
> 
> I did try loading the 15,000-line CSV file into Excel and it worked
> fine there.
> 
> 
> Looking forward to your expert advice.
Sorry, I'm not really an expert on that function. Can't you identify the
problematic line by repeatedly splitting the file into halves?
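
In case it helps, here is a rough sketch of that halving search. It is
written for a current Julia release rather than 0.3.5, and the names
`first_bad_line` and `parses_ok` are made up for illustration. It also
assumes that once a bad line is included, every longer prefix of the file
fails too. In your case `parses_ok` would write the given lines to a
temporary file and call readtable() on it inside try/catch; the demo below
uses a simple field count instead so that it runs on its own.

```julia
# Return the 1-based index of the first line whose inclusion makes
# `parses_ok` fail, or 0 if all lines parse.
# `parses_ok` is a hypothetical predicate: vector of lines -> Bool.
function first_bad_line(lines, parses_ok)
    parses_ok(lines) && return 0
    # Invariant: the first `lo` lines parse, the first `hi` lines do not.
    lo, hi = 0, length(lines)
    while hi - lo > 1
        mid = div(lo + hi, 2)
        if parses_ok(lines[1:mid])
            lo = mid
        else
            hi = mid
        end
    end
    return hi
end

# Stand-in predicate for the demo: every line has exactly 2 fields.
two_fields(ls) = all(l -> length(split(l, ',')) == 2, ls)

sample = ["a,b", "c,d", "e,f,g", "h,i"]
println(first_bad_line(sample, two_fields))
```

Each round halves the search range, so a 15000-line file needs about 14
parse attempts instead of one per line.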

Anyway, you should file a bug against the DataFrames package on GitHub:
people there will be more knowledgeable, and there is apparently a bug at
least in the line number being reported.
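
Since the reported line number cannot be trusted, a check that does not go
through readtable() at all may be quicker. The following is only a sketch:
`bad_lines` is a made-up name, it targets a current Julia release rather
than 0.3.5, and it splits naively on commas, so it will miscount any line
with a quoted field that contains a comma.

```julia
# Collect (line number, field count) for every line whose field count
# differs from `expected`. Naive split: quoting is ignored.
function bad_lines(io::IO, expected::Integer; delim::Char=',')
    bad = Tuple{Int,Int}[]
    for (i, line) in enumerate(eachline(io))
        n = length(split(line, delim))
        n == expected || push!(bad, (i, n))
    end
    return bad
end

# Small in-memory demo: the second line has 6 fields instead of 4.
demo = IOBuffer("1,2,3,4\n1,2,3,4,5,6\n1,2,3,4\n")
println(bad_lines(demo, 4))
```

On the real file this would be called as
`open(io -> bad_lines(io, 4), "EURUSD.CSV")`, with 4 being the column count
you expect.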


Regards

> Thanks.
> 
> 
> Keith  
> 
> On Friday, 6 February 2015 12:19:55 UTC-8, Milan Bouchet-Valat wrote:
>         On Friday, 6 February 2015 at 11:12 -0800, Keith Kee wrote:
>         > Hi 
>         > 
>         > 
>         > Using DataFrames ( v"0.6.0" ) and Win32 julia 0.3.5 
>         > 
>         > 
>         > ds = readtable("EURUSD.CSV", header=false) 
>         > 
>         > 
>         > 
>         > results in 
>         > 
>         > 
>         > 
>         > BoundsError() 
>         > in findcorruption at io.jl:698 
>         > in readtable! at io.jl:779 
>         > in readtable at io.jl:893 
>         > 
>         > 
>         > The original file has 15000 lines, works when I cut it down to 10
>         > lines.
>         >
>         >
>         > Please advise as to whether there are limits to readtable on win32
>         > setups?
>         15000 sounds quite small even for 32-bit. More likely, the file
>         contains something readtable() doesn't like, and which does not
>         appear in the first 10 lines. You could try removing half of the
>         file, see if it works, and go on like that until you (possibly) find
>         out which line creates a bug.
>         
>         
>         Regards 


