On 18.07.2013 08:45, Pushkar Raj Pande wrote:
> Both loadtxt and genfromtxt read the entire data into memory, which is
> not desirable. Is there a way to achieve streaming writes?
>
> Thanks,
> Pushkar
>
> On Wed, Jul 17, 2013 at 7:04 PM, Pushkar Raj Pande
> <topgun...@gmail.com> wrote:
>
>     Thanks Antonio and Anthony. I will give this a try.
>
>     -Pushkar
>
> Date: Wed, 17 Jul 2013 16:59:16 -0500
> From: Anthony Scopatz <scop...@gmail.com>
> Subject: Re: [Pytables-users] Pytables bulk loading data
> To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
>
> Hi Pushkar,
>
> I agree with Antonio. You should load your data with NumPy functions
> and then write back out to PyTables. This is the fastest way to do
> things.
>
> Be Well
> Anthony
>
> On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino
> <antonio.valent...@tiscali.it> wrote:
>
> > Hi Pushkar,
> >
> > On 17/07/2013 19:28, Pushkar Raj Pande wrote:
> > > Hi all,
> > >
> > > I am trying to figure out the best way to bulk load data into
> > > PyTables. This question may have been answered already, but I
> > > couldn't find what I was looking for.
> > >
> > > The source data is in the form of CSV, which may require parsing,
> > > type checking, and setting default values if a field does not
> > > conform to the type of its column. There are over 100 columns in a
> > > record.
> > > Doing this in a loop in Python for each row of the record is very
> > > slow compared to just fetching the rows from one PyTables file and
> > > writing them to another; the difference is almost a factor of ~50.
> > >
> > > I believe that if I load the data using a C procedure that does
> > > the parsing and builds the records to write to PyTables, I can get
> > > close to the speed of just copying and writing the rows from one
> > > PyTables file to another. But maybe there is something simple and
> > > better that already exists. Can someone please advise? And if a C
> > > procedure is what I should write, can someone point me to some
> > > examples or snippets I can refer to in order to put this together?
> > >
> > > Thanks,
> > > Pushkar
> >
> > NumPy has some tools for loading data from CSV files, like loadtxt
> > [1], genfromtxt [2], and other variants.
> >
> > Does none of them work for you?
> >
> > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
> > [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
> >
> > cheers
> >
> > --
> > Antonio Valentino
> >
> > _______________________________________________
> > Pytables-users mailing list
> > Pytables-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/pytables-users
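For reference, the NumPy route suggested above might look something like the minimal sketch below. The three-column schema, the file name `bulk.h5`, and the default value of 0 are made up for illustration (the real data has 100+ columns); genfromtxt's `filling_values` supplies a default when a field does not conform, and a single bulk `append` replaces the per-row Python loop.

```python
import io

import numpy as np
import tables

# Hypothetical 3-column schema; the real data has 100+ columns.
dtype = np.dtype([("id", "i8"), ("price", "f8"), ("qty", "i4")])

# In-memory stand-in for the CSV file; the second row is missing its price.
csv = io.StringIO("1,10.5,3\n2,,7\n")

# genfromtxt does the parsing and type checking; filling_values supplies
# a default for any field that is missing or fails conversion.
data = np.genfromtxt(csv, delimiter=",", dtype=dtype, filling_values=0)

with tables.open_file("bulk.h5", mode="w") as h5:
    # PyTables accepts a NumPy dtype directly as the table description.
    table = h5.create_table("/", "records", description=dtype)
    table.append(data)  # one bulk append instead of a per-row Python loop
    table.flush()
```

Since `table.append` takes the structured array in one call, the row-by-row work stays in compiled code, which is what makes this competitive with a table-to-table copy.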
>
> End of Pytables-users Digest, Vol 86, Issue 8
> *********************************************
You could use pandas_ and the read_table function. It has nrows and
skiprows parameters, with which you can easily do your own 'streaming'.

.. _pandas: http://pandas.pydata.org/

--
Andreas
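As a postscript, here is a rough sketch of that kind of chunked 'streaming' into PyTables. Instead of managing nrows/skiprows by hand, it uses read_table's chunksize parameter, which returns an iterator of DataFrames so only one chunk is in memory at a time; the two-column schema and the file name `stream.h5` are hypothetical.

```python
import io

import numpy as np
import pandas as pd
import tables

# Hypothetical 2-column schema and an in-memory stand-in for the CSV file.
dtype = np.dtype([("id", "i8"), ("price", "f8")])
csv = io.StringIO("id,price\n1,10.5\n2,20.0\n3,30.25\n")

with tables.open_file("stream.h5", mode="w") as h5:
    table = h5.create_table("/", "records", description=dtype)
    # chunksize makes read_table yield DataFrames piece by piece, so the
    # whole CSV never has to fit in memory at once.
    for chunk in pd.read_table(csv, sep=",", chunksize=2):
        # Convert each chunk to a structured array and append it in bulk.
        table.append(chunk.to_records(index=False).astype(dtype))
    table.flush()
```

Type checking and defaults could then be done per chunk with ordinary vectorized pandas operations before the append, rather than in a per-row loop.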