On Saturday 07 February 2015 at 11:23 -0800, Keith Kee wrote:
> Hi Milan,
>
>
> I used the validate function of the csvfix 1.6 tool by Neil Butterworth
> on the csv file (only for csv, not for wsv) and discovered two corrupted
> lines and then it worked.
>
>
> The link to the manual:
> http://neilb.bitbucket.org/csvfix/manual/csvfix16/csvfix.html
>
>
> As a feature request, it would be very useful if readtable's
> findcorruption function returned the corrupted line number and/or the
> corrupted line itself for the csv file.
> If performance is the design priority, provide another dedicated
> findcorruption function to help with cleaning up the csv files should
> readtable fail.
Good to hear you found the problem. Indeed, please file an issue so that
the error message can be improved. If you could create a small example
to reproduce the issue, it would be very useful.
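
In the meantime, the kind of check you describe can be sketched in a few
lines of Julia. This is only a rough illustration: find_bad_lines is a
hypothetical helper, not part of DataFrames, it assumes a comma separator
and the 4 columns mentioned in this thread, and it is written for a
recent Julia (1.x) rather than 0.3.5. It splits naively on the separator,
so quoted fields that contain commas would be miscounted.

```julia
# Hypothetical helper, not part of DataFrames: report every line whose
# field count differs from the expected number of columns.
function find_bad_lines(io::IO; expected::Int=4, sep::AbstractString=",")
    bad = Tuple{Int,Int}[]   # (line number, number of fields found)
    for (lineno, line) in enumerate(eachline(io))
        # Naive split: does not understand quoted fields.
        nfound = length(split(line, sep))
        if nfound != expected
            push!(bad, (lineno, nfound))
        end
    end
    return bad
end

# Convenience method that opens the file by name.
find_bad_lines(path::AbstractString; kwargs...) =
    open(io -> find_bad_lines(io; kwargs...), path)
```

Calling find_bad_lines("EURUSD.CSV"; expected=4) should then return the
line numbers to inspect, together with the number of fields seen on each.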
Regards
> Keith
>
>
>
>
>
> On Saturday, 7 February 2015 08:17:51 UTC-8, Milan Bouchet-Valat
> wrote:
> On Friday 06 February 2015 at 15:01 -0800, Keith Kee wrote:
> > Hi Milan,
> >
> >
> > Thanks for your advice.
> >
> >
> > I spotted one corruption in a smaller sample of 3000 lines and then it
> > worked.
> >
> >
> > Then I tried a larger number of 10000 lines and it gave the following:
> > Saw 10000 rows, 4 columns (correct) and 40022 fields
> >  * Line 1 has 6 columns
> > (not sure where "line 1" starts, but line 1 was ok when using only the
> > 3000 lines file)
> >
> >
> > How do I find the corruptions using the above message? Clearly it
> > detected 6 columns in some "Line 1", but it is not the first line.
> >
> >
> > Are there any julia functions or packages I can use to clean up the
> > data or that will highlight corrupted lines in the data?
> >
> >
> > I did try loading the 15,000 line csv file into excel and it worked
> > fine there.
> >
> >
> > Looking forward to your expert advice.
> Sorry, I'm not really an expert on that function. Can't you identify the
> problematic line by continuing to split the file into halves?
>
> Anyway, you should file a bug against the DataFrames package on GitHub,
> where people will be more knowledgeable, and there's apparently a bug at
> least in the line number that is being reported.
>
>
> Regards
>
> > Thanks.
> >
> >
> > Keith
> >
> > On Friday, 6 February 2015 12:19:55 UTC-8, Milan Bouchet-Valat wrote:
> > On Friday 06 February 2015 at 11:12 -0800, Keith Kee wrote:
> > > Hi
> > >
> > >
> > > Using DataFrames ( v"0.6.0" ) and Win32 julia 0.3.5
> > >
> > >
> > > ds = readtable("EURUSD.CSV", header=false)
> > >
> > >
> > >
> > > results in
> > >
> > >
> > >
> > > BoundsError()
> > > in findcorruption at io.jl:698
> > > in readtable! at io.jl:779
> > > in readtable at io.jl:893
> > >
> > >
> > > The original file has 15000 lines; it works when I cut it down to 10
> > > lines.
> > >
> > >
> > > Please advise as to whether there are limits to readtable on win32
> > > setups?
> > 15000 sounds quite small even for 32-bit. More likely, the file contains
> > something readtable() doesn't like, and which does not appear in the
> > first 10 lines. You could try removing half of the file, see if it
> > works, and go on like that until you (possibly) find out which line
> > creates a bug.
> >
> >
> > Regards
>
>
>