Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8
Both loadtxt and genfromtxt read the entire data into memory, which is not desirable. Is there a way to achieve streaming writes?

Thanks,
Pushkar

On Wed, Jul 17, 2013 at 7:04 PM, Pushkar Raj Pande topgun...@gmail.com wrote:
> Thanks Antonio and Anthony. I will give this a try.
>
> -Pushkar
>
> On Wed, Jul 17, 2013 at 2:59 PM, pytables-users-requ...@lists.sourceforge.net wrote:
>> Date: Wed, 17 Jul 2013 16:59:16 -0500
>> From: Anthony Scopatz scop...@gmail.com
>> Subject: Re: [Pytables-users] Pytables bulk loading data
>> To: Discussion list for PyTables pytables-users@lists.sourceforge.net
>> Message-ID: capk-6t4ht9+ncdd_1oojrbn4u_6+ouekobklmokeufjojjk...@mail.gmail.com
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> Hi Pushkar,
>>
>> I agree with Antonio. You should load your data with NumPy functions and then write back out to PyTables. This is the fastest way to do things.
>>
>> Be Well
>> Anthony
>>
>> On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino antonio.valent...@tiscali.it wrote:
>>> Hi Pushkar,
>>>
>>> On 17/07/2013 19:28, Pushkar Raj Pande wrote:
>>>> Hi all,
>>>>
>>>> I am trying to figure out the best way to bulk load data into pytables. This question may have been already answered but I couldn't find what I was looking for.
>>>>
>>>> The source data is in the form of CSV, which may require parsing, type checking and setting default values if it doesn't conform to the type of the column. There are over 100 columns in a record. Doing this in a loop in Python for each row of the record is very slow compared to just fetching the rows from one pytable file and writing them to another. The difference is a factor of about 50.
>>>>
>>>> I believe if I load the data using a C procedure that does the parsing and builds the records to write in pytables I can get close to the speed of just copying and writing the rows from one pytable to another. But maybe there is something simple and better that already exists. Can someone please advise? But if it is a C procedure that I should write, can someone point me to some examples or snippets that I can refer to in order to put this together?
>>>>
>>>> Thanks,
>>>> Pushkar
>>>
>>> numpy has some tools for loading data from csv files like loadtxt [1], genfromtxt [2] and other variants.
>>>
>>> None of them is OK for you?
>>>
>>> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
>>> [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
>>>
>>> cheers
>>>
>>> --
>>> Antonio Valentino

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8
On 18.07.2013 08:45, Pushkar Raj Pande wrote:
> Both loadtxt and genfromtxt read the entire data into memory, which is not desirable. Is there a way to achieve streaming writes?
>
> Thanks,
> Pushkar
Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8
Hi Pushkar,

On 18/07/2013 08:45, Pushkar Raj Pande wrote:
> Both loadtxt and genfromtxt read the entire data into memory, which is not desirable. Is there a way to achieve streaming writes?

OK, probably fromfile [1] can help you to cook up something that works without loading the entire file into memory (and without too many iterations over the file).

Anyway, I strongly recommend that you do not perform read/write cycles on single lines; rather, define a reasonable data block size (number of rows) and process the file in chunks.

If you find a reasonably simple solution, it would be nice to include it in our documentation as an example or a recipe [2].

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html#numpy.fromfile
[2] http://pytables.github.io/latest/cookbook/index.html

best regards

antonio

> Thanks,
> Pushkar

--
Antonio Valentino
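
The chunked approach recommended in this message could look roughly like the sketch below. It is only an illustration under several assumptions: the CSV has no header row, every column is numeric, and the column layout is known in advance as a NumPy dtype. It parses blocks of lines with genfromtxt rather than fromfile, which is a substitution and not what the message proposes. The names iter_csv_chunks, append_csv_to_table and csv_to_hdf5, the node name "records" and the default block size of 10000 rows are all made up for this example. open_file and create_table are the PyTables 3.x spellings.

```python
import itertools

import numpy as np


def iter_csv_chunks(csv_path, dtype, chunk_rows=10000, delimiter=","):
    """Yield structured arrays holding at most `chunk_rows` rows each.

    Assumes a headerless file whose columns match `dtype` (an assumption
    of this sketch, not something stated in the thread).
    """
    with open(csv_path, "r") as fh:
        while True:
            # islice pulls the next block of lines without reading the rest.
            raw = list(itertools.islice(fh, chunk_rows))
            if not raw:
                break
            lines = [ln for ln in raw if ln.strip()]  # drop blank lines
            if not lines:
                continue
            block = np.genfromtxt(lines, dtype=dtype, delimiter=delimiter)
            # A single-row block comes back 0-d; make it 1-d before yielding.
            yield np.atleast_1d(block)


def append_csv_to_table(csv_path, table, dtype, chunk_rows=10000):
    """Append the CSV to `table` block by block; return the rows written.

    `table` only needs an append() method that accepts a structured
    array, which is what a PyTables Table provides.
    """
    total = 0
    for block in iter_csv_chunks(csv_path, dtype, chunk_rows):
        table.append(block)  # one bulk append per block, not per row
        total += len(block)
    return total


def csv_to_hdf5(csv_path, h5_path, dtype, chunk_rows=10000):
    """Stream a CSV file into a new HDF5 table named "records" (hypothetical)."""
    # Imported here so the parsing helpers above work without PyTables.
    import tables

    with tables.open_file(h5_path, mode="w") as h5:
        table = h5.create_table("/", "records", description=np.dtype(dtype))
        total = append_csv_to_table(csv_path, table, dtype, chunk_rows)
        table.flush()
    return total
```

A call such as csv_to_hdf5("data.csv", "data.h5", [("a", "i8"), ("b", "f8")]) would then hold at most one block of rows in memory at a time. The block size trades memory for fewer append calls, so it is worth tuning against the real data.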