This is a second iteration of a previous thread that didn't resolve a few weeks ago. I made some more modifications to the code to make it compatible with the current COPY FROM code, and it should be more agreeable this time.
The main premise of the new code is that it improves text-data parsing speed by about 4-5x, resulting in total data-import improvements of between 15% and 95% (the higher gains occur on large data rows without many columns, which means more parsing and less converting to the internal format). This is done by replacing char-at-a-time parsing with buffered parsing, using fast scan routines, and doing a minimal amount of loading/appending into the line and attribute buffers.

The new code passes both COPY regression tests (copy, copy2) and doesn't break any of the others. It also supports encoding conversions (thanks Peter and Tatsuo for your feedback) and the 3 line-end types. Having said that, using COPY with different encodings was only minimally tested. We are looking into creating new tests and hope to add them to the postgres regression suite one day, if the community desires it.

This new code improves parsing of the delimited data format. BINARY and CSV will stay the same and will be executed separately for now (therefore there is some code duplication). In the future I plan to write improvements to the CSV path too, so that it can be executed without duplication of code.

I am still missing support for data that uses COPY_OLD_FE (question: what are the use cases? When will it be used? Please advise).

I'll send out the patch soon. It's basically there to show that there is a way to load data faster. Future releases of the patch will be more complete and elegant.

I'll appreciate any comments/advice.

Alon.