Re: [Pytables-users] Pytables bulk loading data

2013-07-18 Thread Pushkar Raj Pande
Thanks. I will try it out and post any findings.

Pushkar

On Thu, Jul 18, 2013 at 12:36 AM, Andreas Hilboll li...@hilboll.de wrote:

 

 You could use pandas_ and the read_table function. There, you have nrows
 and skiprows parameters with which you can easily do your own 'streaming'.

 .. _pandas: http://pandas.pydata.org/
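
A minimal sketch of this chunked read_table approach (the file name, column
layout, and block size below are assumptions, and pandas' chunksize parameter
is used here to handle the skiprows/nrows bookkeeping):

    import numpy as np
    import pandas as pd
    import tables

    BLOCK = 100000
    # Column layout is an assumption -- adapt it to the real ~100 columns.
    dtype = np.dtype([('a', 'f8'), ('b', 'i4')])

    with tables.open_file('out.h5', 'w') as h5:
        table = h5.create_table('/', 'data', dtype)
        reader = pd.read_table('data.csv', sep=',', header=None,
                               names=dtype.names, chunksize=BLOCK)
        for chunk in reader:
            # Cast to the table's dtype so each block can be appended in one go.
            table.append(chunk.to_records(index=False).astype(dtype))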



On Thu, Jul 18, 2013 at 1:00 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 Hi Pushkar,

 On 18/07/2013 08:45, Pushkar Raj Pande wrote:
  Both loadtxt and genfromtxt read all of the data into memory, which is
  not desirable. Is there a way to achieve streaming writes?
 

 OK, probably fromfile [1] can help you cook up something that works
 without loading the entire file into memory (and without too many
 iterations over the file).

 In any case, I strongly recommend that you not perform read/write
 cycles on single lines; rather, define a reasonable data block size
 (number of rows) and process the file in chunks.

 If you find a reasonably simple solution, it would be nice to include
 it in our documentation as an example or a recipe [2].

 [1]

 http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html#numpy.fromfile
 [2] http://pytables.github.io/latest/cookbook/index.html

 best regards

 antonio
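
A rough sketch of the chunk-by-chunk approach suggested above; it uses
genfromtxt on blocks of lines rather than fromfile, so that filling_values can
supply defaults for missing fields (the file name, dtype, and block size are
assumptions):

    from itertools import islice

    import numpy as np
    import tables

    BLOCK = 50000
    dtype = np.dtype([('a', 'f8'), ('b', 'i4')])   # describe the real columns here

    with tables.open_file('out.h5', 'w') as h5, open('data.csv') as csvfile:
        table = h5.create_table('/', 'data', dtype)
        while True:
            lines = list(islice(csvfile, BLOCK))   # at most BLOCK raw lines
            if not lines:
                break
            block = np.genfromtxt(lines, delimiter=',', dtype=dtype,
                                  filling_values=0)   # defaults for missing fields
            table.append(np.atleast_1d(block))        # append a whole block at once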


--
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831iu=/4140/ostg.clktrk___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Pytables bulk loading data

2013-07-17 Thread Antonio Valentino
Hi Pushkar,

On 17/07/2013 19:28, Pushkar Raj Pande wrote:
 Hi all,
 
 I am trying to figure out the best way to bulk load data into pytables.
 This question may have been already answered but I couldn't find what I was
 looking for.
 
 The source data is in the form of CSV, which may require parsing, type
 checking, and setting default values when a field doesn't conform to the
 type of the column. There are over 100 columns in a record. Doing this in
 a Python loop for each row of the record is very slow compared to just
 fetching the rows from one PyTables file and writing them to another; the
 difference is almost a factor of ~50.
 
 I believe that if I load the data using a C procedure that does the
 parsing and builds the records to write into PyTables, I can get close to
 the speed of just copying rows from one PyTables file to another. But
 maybe something simpler and better already exists. Can someone please
 advise? If a C procedure is what I should write, can someone point me to
 examples or snippets that I can refer to in putting this together?
 
 Thanks,
 Pushkar
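
For reference, a rough sketch of the row-by-row loading pattern described
above (the slow path; column names, types, and defaults are assumptions):

    import csv
    import tables

    with tables.open_file('out.h5', 'w') as h5, open('data.csv') as f:
        table = h5.create_table('/', 'data',
                                {'a': tables.Float64Col(), 'b': tables.Int32Col()})
        row = table.row
        for fields in csv.reader(f):
            # Per-field type checking with a default on failure -- this
            # pure-Python inner loop is what makes the import ~50x slower.
            try:
                row['a'] = float(fields[0])
            except (ValueError, IndexError):
                row['a'] = 0.0
            try:
                row['b'] = int(fields[1])
            except (ValueError, IndexError):
                row['b'] = 0
            row.append()
        table.flush()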
 

NumPy has some tools for loading data from CSV files, like loadtxt [1],
genfromtxt [2], and other variants.

Does none of them work for you?

[1]
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
[2]
http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
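
A minimal sketch along those lines, assuming the whole file fits in memory
(the file name, dtype, and default values are assumptions):

    import numpy as np
    import tables

    dtype = np.dtype([('a', 'f8'), ('b', 'i4')])   # the real ~100 columns go here
    data = np.genfromtxt('data.csv', delimiter=',', dtype=dtype,
                         filling_values=0)   # defaults for missing fields

    with tables.open_file('out.h5', 'w') as h5:
        table = h5.create_table('/', 'data', dtype)
        table.append(np.atleast_1d(data))   # single bulk write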


cheers

-- 
Antonio Valentino

--
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831iu=/4140/ostg.clktrk
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users