Speed is the problem. I am looking for the fastest possible way to do this. I
was thinking of using Pandas and was able to achieve fair performance with
that library. It just seemed like I was using Pandas as a middleman, and it
introduces some issues with the data types. Could it be faster to pull the
data into a NumPy array in chunks and write it out?
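For reference, a minimal sketch of the chunked approach I am asking about. It only assumes the source offers nrows and read(start, stop) and the destination offers append(rows) and flush(), which PyTables Table objects do. The names chunk_bounds, append_in_chunks and ListTable, and the default chunk size, are made up for illustration; ListTable is an in-memory stand-in so the sketch runs without any HDF5 files. Whether this beats the other options would need to be measured.

```python
def chunk_bounds(nrows, chunk_rows):
    """Yield (start, stop) pairs that cover range(nrows) in order."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    for start in range(0, nrows, chunk_rows):
        yield start, min(start + chunk_rows, nrows)


def append_in_chunks(src, dst, chunk_rows=100000):
    """Copy all rows of src into dst, holding one chunk in memory at a time.

    src needs .nrows and .read(start, stop); dst needs .append(rows) and
    .flush(). The default chunk size is an arbitrary assumption.
    """
    copied = 0
    for start, stop in chunk_bounds(src.nrows, chunk_rows):
        dst.append(src.read(start, stop))
        copied += stop - start
    dst.flush()
    return copied


class ListTable(object):
    """Hypothetical in-memory stand-in for a table, used only for the demo."""

    def __init__(self, rows=()):
        self.rows = list(rows)

    @property
    def nrows(self):
        return len(self.rows)

    def read(self, start, stop):
        return self.rows[start:stop]

    def append(self, rows):
        self.rows.extend(rows)

    def flush(self):
        pass  # nothing is buffered in the stand-in


# Demo: append 10 source rows to a destination that already holds 1 row.
source = ListTable([(i, i * 0.5) for i in range(10)])
dest = ListTable([(-1, 0.0)])
copied = append_in_chunks(source, dest, chunk_rows=4)
```

With real files, src would be a node from the source file and dst the table of the same name in the destination file, so only chunk_rows rows are held in memory at once.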
From: Anthony Scopatz [mailto:scop...@gmail.com]
Sent: Friday, May 03, 2013 2:14 PM
To: Discussion list for PyTables
Subject: Re: [Pytables-users] Row.append()
On Fri, May 3, 2013 at 1:15 PM, Jim Knoll
<jim.kn...@spottradingllc.com> wrote:
I am trying to make this better / faster...
Data comes in faster than I can store it on one box, so my thought was to
have many boxes, each storing its own part in its own table.
Later I would concatenate the tables together with something like this:
import tables as pt

dest_h5f = pt.openFile(path + 'big_mater.h5', 'a')
for source_path in source_h5_path_list:
    h5f = pt.openFile(source_path, 'r')
    for node in h5f.root:
        # look up the table with the same name in the destination file
        dest_table = dest_h5f.getNode('/', name=node.name)
        print(node.nrows)
        # found I needed to limit the max size or I would crash
        if node.nrows > 0 and node.nrows < 1000000:
            dest_table.append(node.read())
            dest_table.flush()
    h5f.close()
dest_h5f.close()
I could add logic to iterate over the source data in chunks to avoid the
crash, but I suspect there is a better way.
Hi Jim,
You can just iterate over each row in the table (i.e. "for row in node"). This
is slow, but it would solve the problem.
I want to take a table in one h5 file and append it to a table in another h5
file. It looked like Table.copy() would do the trick, but I don't see how to
get it to append to an existing table.
You could append directly by using the whereAppend() method (append_where()
in PyTables 3.x) with the condition "'True'" to append the whole table. This
will automatically do the chunking for you.
Be Well
Anthony
My h5 files have 4 rec arrays all stored in root.
Any suggestions?
________________________________
Jim Knoll
DBA/Developer II
Spot Trading L.L.C
440 South LaSalle St., Suite 2800
Chicago, IL 60605
Office: 312.362.4550
Direct: 312-362-4798
Fax: 312.362.4551
jim.kn...@spottradingllc.com
www.spottradingllc.com
________________________________
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users