On Fri, Apr 5, 2013 at 7:45 PM, Rajeev Prasad <rp.ne...@yahoo.com> wrote:

> hello,
>
> I have a table with around 2,000,000 records (15 columns). I have to sync
> this from an outside source once everyday. not all records are
> changed/removed /new-added everyday. so what is the best way to update only
> those which have changed/added/or deleted?
>
> i can use update_or_create but that will update (re-write the row) even if
> nothing has changed in the row/record. wont that be an overhead? how can i
> escape that? what would be the fastest and least resources consuming way to
> do this table update?
>
> I also have another table with 500,000 rows and i wish to implement the
> same solution to that too.
>

Geesh, when your data loads are hitting 6 and 7 digits rows figures, you
should seriously consider the tools that come with the RDBMS, like
replication.  Is your outside source a modern RDBMS that features
master/slave replication?

Otherwise, without some sort of field indicator on the source side saying
that this was a modified row, you can't tell if it's been changed or not.
 That means DBIC, or whatever other ETL middleware you choose, will have to
compare every single row to see if it exists or not.  With that many rows,
it's better off to use the "UPSERT" SQL statements for mass updating.  In
MySQL, it's commands like "REPLACE" and "INSERT ... ON DUPLICATE KEY UPDATE
...".  In Oracle, it's MERGE.

You also mentioned "removed".  So you want an exact copy of the source
table?  Yeah, screw all of the tip-toeing around the existing data and just
TRUNCATE the table, followed by the best BULK INSERT command your RDBMS
recommends.  (For example, MySQL recommends LOAD DATA INFILE.)  And make
sure you don't do something crazy, like reading the entire source table
into memory via Perl variables.

Keep in mind that almost none of this recommends using DBIC.  DBIC is for
making changes in smaller quantities than what you're asking.  YMMV, so
consult your DBA folks first before you start dumping massive quantities of
data into their servers.

-- 
Brendan Byrd <p...@resonatorsoft.org>
Brendan Byrd <bb...@cpan.org>
_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@lists.scsys.co.uk

Reply via email to