Would interfacing with SQL via C or C++ make it faster to parse and load data in bulk? My files are only a few MB of text each, but they can take hours to load because of the amount of parsing I do and the number of database entries that each item in a file creates.
On Mon, Nov 28, 2011 at 3:28 AM, Anler Hernandez Peral <[email protected]> wrote:
> Hi, this is probably not your case, but in case it is, here is my story:
> Creating a script to import CSV files is the best solution as long as there
> are only a few of them, but in my case the problem was that I needed to
> import nearly 40 VERY BIG CSV files, each one mapping to a database table,
> and I needed to do it quickly. I thought that the best way was to use MySQL's
> "load data in local..." functionality, since it works very fast and I could
> write a single function to import all the files. The problem was that my CSV
> files were pretty big, and my database server was eating large amounts of
> memory and crashing my site, so I ended up slicing each file into smaller
> chunks.
> Again, this is a very specific need, but in case you find yourself in such a
> situation, here's my base code, which you can extend ;)
>
> https://gist.github.com/1dc28cd496d52ad67b29
> --
> anler
>
> On Sun, Nov 27, 2011 at 7:56 PM, Andre Terra <[email protected]> wrote:
>>
>> This should be run asynchronously (e.g. with celery) when importing large
>> files.
>> If you have a lot of categories/subcategories, you will need to bulk
>> insert them instead of looping through the data and just using
>> get_or_create. A single, long transaction will definitely bring great
>> improvements in speed.
>> One tool for this is DSE, which I've mentioned before.
>> Good luck!
>>
>> Cheers,
>> AT
>>
>> On Sat, Nov 26, 2011 at 8:44 PM, Petr Přikryl <[email protected]> wrote:
>>>
>>> import csv
>>>
>>> data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
>>> for row in data:
>>>     # get_or_create returns an (object, created) tuple
>>>     category, _ = Category.objects.get_or_create(name=row[0])
>>>     sub_category, _ = SubCategory.objects.get_or_create(
>>>         name=row[1], defaults={'parent_category': category})
>>>     product, _ = Product.objects.get_or_create(
>>>         name=row[2], defaults={'sub_category': sub_category})
>>>
>>> There are a few potential problems with the csv module as used here.
>>>
>>> Firstly, the file should be opened in binary mode. On Unix-based
>>> systems, binary mode is technically similar to text mode.
>>> However, you may one day run into problems when you move
>>> the code to another environment (Windows).
>>>
>>> Secondly, the opened file should always be closed, especially
>>> when building a (web) application that may run for a long time.
>>> You can do it like this:
>>>
>>> ...
>>> f = open('/path/to/csv', 'rb')
>>> data = csv.reader(f, delimiter=';')
>>> for ...
>>> ...
>>> f.close()
>>>
>>> Or you can use the new Python construct "with".
>>>
>>> P.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Django users" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/django-users?hl=en.

--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics
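
For reference, the advice quoted above (bulk inserts inside a single transaction instead of one get_or_create per row) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard sqlite3 module rather than the Django ORM so that it runs on its own, and the table layout, the load_products and demo names, and the sample rows are all hypothetical, not taken from anyone's code in this thread.

```python
import csv
import io
import sqlite3

# Hypothetical schema that mirrors the Category/SubCategory/Product example.
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE sub_category (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                           category_id INTEGER REFERENCES category(id));
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                      sub_category_id INTEGER REFERENCES sub_category(id));
"""


def load_products(conn, csv_file):
    """Load 'category;sub_category;product' rows with one commit in total."""
    rows = [r for r in csv.reader(csv_file, delimiter=';') if len(r) >= 3]

    # Used as a context manager, the connection commits once when the block
    # ends (or rolls back on an exception), so all inserts share a transaction.
    with conn:
        # INSERT OR IGNORE skips names that already exist (UNIQUE columns),
        # which stands in for get_or_create without a query per row.
        conn.executemany(
            "INSERT OR IGNORE INTO category (name) VALUES (?)",
            sorted({(r[0],) for r in rows}),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO sub_category (name, category_id) "
            "SELECT ?, id FROM category WHERE name = ?",
            sorted({(r[1], r[0]) for r in rows}),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO product (name, sub_category_id) "
            "SELECT ?, id FROM sub_category WHERE name = ?",
            sorted({(r[2], r[1]) for r in rows}),
        )
    return len(rows)


def demo():
    """Run the loader against an in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    sample = io.StringIO(
        "Fruit;Citrus;Lemon\n"
        "Fruit;Citrus;Lime\n"
        "Fruit;Berries;Strawberry\n"
    )
    loaded = load_products(conn, sample)
    counts = tuple(
        conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]
        for table in ("category", "sub_category", "product")
    )
    conn.close()
    return loaded, counts


if __name__ == "__main__":
    print(demo())
```

With a real file, the "with" construct mentioned above closes the file automatically, for example: with open('/path/to/csv', newline='') as f: load_products(conn, f). Note that this sketch targets Python 3, where the csv documentation recommends text mode with newline='' instead of the 'rb' mode used with Python 2. Whether an approach like this removes enough of the load time to make a C or C++ rewrite unnecessary depends on where the hours are actually spent, so profiling the parsing step separately is probably worthwhile.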

