Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8

2013-07-18 Thread Pushkar Raj Pande
Both loadtxt and genfromtxt read the entire file into memory, which is not
desirable. Is there a way to achieve streaming writes?

Thanks,
Pushkar


On Wed, Jul 17, 2013 at 7:04 PM, Pushkar Raj Pande topgun...@gmail.com wrote:

 Thanks Antonio and Anthony. I will give this a try.

 -Pushkar


 On Wed, Jul 17, 2013 at 2:59 PM, 
 pytables-users-requ...@lists.sourceforge.net wrote:

 Date: Wed, 17 Jul 2013 16:59:16 -0500
 From: Anthony Scopatz scop...@gmail.com
 Subject: Re: [Pytables-users] Pytables bulk loading data
 To: Discussion list for PyTables
 pytables-users@lists.sourceforge.net
 Message-ID:
 
 capk-6t4ht9+ncdd_1oojrbn4u_6+ouekobklmokeufjojjk...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 Hi Pushkar,

 I agree with Antonio.  You should load your data with NumPy functions and
 then write back out to PyTables.  This is the fastest way to do things.

 Be Well
 Anthony


 On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino 
 antonio.valent...@tiscali.it wrote:

  Hi Pushkar,
 
  On 17/07/2013 19:28, Pushkar Raj Pande wrote:
   Hi all,
 
   I am trying to figure out the best way to bulk load data into
   pytables. This question may have been answered already, but I
   couldn't find what I was looking for.
 
   The source data is in CSV form, which may require parsing, type
   checking, and setting default values when a field doesn't conform
   to the type of the column. There are over 100 columns in a record.
   Doing this in a Python loop for each row is very slow compared to
   just fetching the rows from one pytable file and writing them to
   another -- a difference of almost a factor of ~50.
 
   I believe that if I load the data using a C procedure that does the
   parsing and builds the records to write to pytables, I can get
   close to the speed of just copying and writing the rows from one
   pytable to another. But maybe something simpler and better already
   exists. Can someone please advise? If a C procedure is what I
   should write, can someone point me to some examples or snippets I
   can refer to in order to put this together?
 
   Thanks,
   Pushkar
 
  numpy has some tools for loading data from csv files, like loadtxt
  [1], genfromtxt [2] and other variants.
 
  None of them works for you?
 
  [1]
  http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
  [2]
  http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
 
  cheers
 
  --
  Antonio Valentino
 
 
 
 --
  See everything from the browser to the database with AppDynamics
  Get end-to-end visibility with application monitoring from AppDynamics
  Isolate bottlenecks and diagnose root cause in seconds.
  Start your free trial of AppDynamics Pro today!
 
 http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
  ___
  Pytables-users mailing list
  Pytables-users@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/pytables-users
 



 End of Pytables-users Digest, Vol 86, Issue 8
 *





Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8

2013-07-18 Thread Andreas Hilboll
On 18.07.2013 08:45, Pushkar Raj Pande wrote:
 Both loadtxt and genfromtxt read the entire data into memory which is
 not desirable. Is there a way to achieve streaming writes?
 
 Thanks,
 Pushkar
 
 

Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8

2013-07-18 Thread Antonio Valentino
Hi Pushkar,

On 18/07/2013 08:45, Pushkar Raj Pande wrote:
 Both loadtxt and genfromtxt read the entire data into memory which is not
 desirable. Is there a way to achieve streaming writes?
 

OK, probably fromfile [1] can help you cook something that works
without loading the entire file into memory (and without too many
iterations over the file).

In any case, I strongly recommend not performing read/write cycles on
single lines; rather, define a reasonable data block size (number of
rows) and process the file in chunks.
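A chunked loop of that kind might be sketched as follows (illustrative only: the chunk size, the column layout and the PyTables append call are assumptions, not part of any existing recipe):

```python
# Hedged sketch: read a CSV file a block of rows at a time with NumPy,
# so the whole file never sits in memory at once.
import io
import itertools
import numpy as np

def iter_chunks(fileobj, dtype, chunksize):
    """Yield structured arrays of at most `chunksize` rows from a CSV file."""
    while True:
        lines = list(itertools.islice(fileobj, chunksize))
        if not lines:
            break
        yield np.atleast_1d(np.genfromtxt(lines, delimiter=",", dtype=dtype))

# Stand-in for a real CSV file on disk.
csv = io.StringIO("1,1.5\n2,2.5\n3,3.5\n4,4.5\n5,5.5\n")
dtype = [("id", "i4"), ("value", "f8")]

total_rows = 0
for chunk in iter_chunks(csv, dtype, chunksize=2):
    total_rows += len(chunk)
    # With PyTables this is where you would do: table.append(chunk)

print(total_rows)  # 5
```

`np.atleast_1d` is there because a one-line chunk comes back from genfromtxt as a 0-d array.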

If you find a reasonably simple solution, it would be nice to include
it in our documentation as an example or a recipe [2].
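Putting the pieces together, such a recipe might look like the sketch below. The file name and schema are invented, and the HDF5 part is guarded because PyTables (the `tables` package) may not be installed:

```python
# Hedged end-to-end sketch: parse a CSV block with NumPy, then append the
# whole block to a PyTables table in one call (not row by row).
import io
import numpy as np

dtype = [("id", "i4"), ("value", "f8")]
csv = io.StringIO("1,1.5\n2,2.5\n3,3.5\n")
block = np.genfromtxt(csv, delimiter=",", dtype=dtype)

try:
    import tables
    with tables.open_file("bulk.h5", mode="w") as h5:
        # create_table accepts a NumPy dtype as the table description.
        table = h5.create_table("/", "data", np.dtype(dtype))
        table.append(block)   # one block at a time
        nrows = table.nrows
except ImportError:
    nrows = len(block)        # fall back: just count the parsed rows

print(nrows)  # 3
```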

[1]
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html#numpy.fromfile
[2] http://pytables.github.io/latest/cookbook/index.html
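For the type-checking and default-values part of the original question, genfromtxt's `filling_values` argument already substitutes a default for missing fields; a minimal sketch (column names and the default value are invented):

```python
# Hedged sketch: fill missing CSV fields with a default via genfromtxt.
import io
import numpy as np

csv = io.StringIO("1,2.5,\n2,,7.0\n")  # some fields are empty
arr = np.genfromtxt(
    csv,
    delimiter=",",
    dtype=[("id", "i4"), ("a", "f8"), ("b", "f8")],
    filling_values=-1.0,  # default used wherever a field is missing
)
print(arr["a"].tolist())  # [2.5, -1.0]
```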

best regards

antonio


-- 
Antonio Valentino
