On 2014-11-26 at 13:20:47 +0100, kmx wrote:
I have also released https://metacpan.org/pod/PDL::IO::CSV (using the same
approach, utilizing Text::CSV_XS)
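For anyone skimming the thread, a minimal usage sketch of the module just announced, assuming its documented rcsv2D function (the filename is made up):

```perl
use PDL;
use PDL::IO::CSV ':all';

# rcsv2D reads a whole CSV file into a 2D piddle in one call;
# 'data.csv' is a hypothetical file.
my $pdl = rcsv2D('data.csv');
```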
Very nice! I'm going to see how I can use this together with my data
frame implementation. Since the data frames are just a wrapper around
kmx++
On Wed, Nov 26, 2014 at 7:20 AM, kmx k...@atlas.cz wrote:
I have also released https://metacpan.org/pod/PDL::IO::CSV (using the
same approach, utilizing Text::CSV_XS)
--
kmx
On 24.11.2014 23:29, kmx wrote:
I have released my solution as https://metacpan.org/pod/PDL::IO::DBI
--
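A minimal usage sketch of the announced module, assuming its documented rdbi2D function (the DSN and query here are made up for illustration):

```perl
use PDL;
use PDL::IO::DBI ':all';

# rdbi2D runs the query and returns the result set as a 2D piddle,
# bypassing the per-row Perl data structures of selectall_arrayref.
# DSN and query are hypothetical.
my $pdl = rdbi2D('dbi:SQLite:dbname=mydb.db',
                 'SELECT col1, col2, col3 FROM mytable');
```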
I have tried pg_getcopydata; however, I was not able to make it faster than
my old approach. After many tries it was still 15-20% slower.
My guess is that pg_getcopydata(..) might be significantly faster when
dumping the whole table (which I was not able to test as the table in
question was
Hi kmx: What if you make a temporary table by selecting the subset of
the table you want and then use pg_getcopydata to dump this entire temp
table?
Just a thought...
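The temp-table idea could look roughly like this with DBD::Pg; the table name, column names, and WHERE clause are all made up for illustration:

```perl
use DBI;

# Materialize the subset once, then COPY the whole temp table out,
# so pg_getcopydata always dumps a complete table.
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });
$dbh->do('CREATE TEMP TABLE subset AS
          SELECT col1, col2 FROM mytable WHERE col3 > 0');
$dbh->do('COPY subset TO STDOUT (FORMAT csv)');

my $line;
while ($dbh->pg_getcopydata($line) >= 0) {
    # parse $line exactly as with a full-table COPY
}
```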
Regards,
Doug Hunt
dh...@ucar.edu
Software Engineer
UCAR - COSMIC, Tel. (303) 497-2611
On Fri, 14 Nov 2014, kmx wrote:
Hi,
I want to ask what others use when they need to load data from a database
into a piddle.
Of course I know about simple approach like this:
use PDL;
use DBI;
my $dbh = DBI->connect($dsn);
my $pdl = pdl($dbh->selectall_arrayref($sql_query));
But it does not scale well for very large data
Hi,
if you can, I'd suggest storing the pdl as binary data in the
database, for best performance.
DBI converts everything else into Perl strings, which you probably want
to avoid. How well does your approach scale?
I've been thinking about that problem - but no more - for some time, and
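The binary-storage suggestion could be sketched roughly like this; the table layout and names are made up, and the raw bytes are native-endian doubles, so writer and reader must share the same architecture:

```perl
use PDL;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=cache.db', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE IF NOT EXISTS pdl_cache
          (name TEXT PRIMARY KEY, dims TEXT, data BLOB)');

# store: the piddle's raw buffer goes straight into a BLOB column
my $pdl = sequence(double, 7, 1000);
my $sth = $dbh->prepare('INSERT OR REPLACE INTO pdl_cache VALUES (?, ?, ?)');
$sth->bind_param(1, 'mydata');
$sth->bind_param(2, join(',', $pdl->dims));
$sth->bind_param(3, ${ $pdl->get_dataref }, DBI::SQL_BLOB);
$sth->execute;

# load: recreate the piddle and overwrite its buffer in place
my ($dims, $blob) = $dbh->selectrow_array(
    'SELECT dims, data FROM pdl_cache WHERE name = ?', undef, 'mydata');
my $out = zeroes(double, split /,/, $dims);
${ $out->get_dataref } = $blob;
$out->upd_data;
```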
On 11/12/2014 07:43 AM, kmx wrote:
my $dbh = DBI->connect($dsn);
my $pdl = pdl($dbh->selectall_arrayref($sql_query));
But it does not scale well for very large data (millions of rows).
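One way to soften the memory cost of the simple approach, sketched here under assumed $dsn and $sql_query values, is to fetch in fixed-size chunks so only one chunk of Perl data is alive at a time:

```perl
use PDL;
use DBI;

my $dsn       = 'dbi:Pg:dbname=mydb';          # hypothetical
my $sql_query = 'SELECT col1, col2 FROM mytable';  # hypothetical

my $dbh = DBI->connect($dsn, '', '', { RaiseError => 1 });
my $sth = $dbh->prepare($sql_query);
$sth->execute;

# fetchall_arrayref with a row limit drains the result set in slices
my @chunks;
while (my $rows = $sth->fetchall_arrayref(undef, 100_000)) {
    last unless @$rows;
    push @chunks, pdl(double, $rows);   # one piddle per chunk
}

# glue the per-chunk piddles along the row dimension
my $pdl = @chunks > 1
    ? $chunks[0]->glue(1, @chunks[1 .. $#chunks])
    : $chunks[0];
```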
Hi KMX
If you're using PostgreSQL you should use DBD::Pg's pg_getcopydata
using the COPY mytable to
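A minimal sketch of that COPY-based path; the table name and CSV parsing are hypothetical. pg_getcopydata hands back one COPY line per call and returns -1 when the COPY is finished:

```perl
use PDL;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });
$dbh->do('COPY mytable TO STDOUT (FORMAT csv)');

my @rows;
my $line;
while ($dbh->pg_getcopydata($line) >= 0) {
    chomp $line;
    push @rows, [ split /,/, $line ];   # naive CSV split, for illustration
}
my $pdl = pdl(\@rows);
```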
Hi,
as I am not producing the DB data via PDL (in fact I am not producing that data
at all), it is not possible to store it as PDL binary data.
As for the performance:
- loading 3.4 million rows, 7 columns each, pdl type: double
- 41s SQLite (from SSD disk)
- 34s Postgres 9.2 (at localhost,