[h2] CSVREAD - New JDBC CSV Driver to read large CSV file (greater than 1Go) without performance issue

Guillaume de GENTILE Thu, 17 Jan 2019 05:46:53 -0800

Hi all,

First of all, I have been an H2 fan for years. 
Its built-in CSVREAD function is fantastic for reading CSV or flat files. 
However, it has some limitations:


   - Poor performance with large CSV files (more than 1 GB). 
      - I have already tried tuning the optimization parameters, but the 
      problem remains.
   - Poor performance when sorting large data sets on multiple columns in 
   "ORDER BY". 
      - Yes, creating indexes can improve performance.
   - No support for wildcard expressions in the filename pattern (for 
   example, to load all CSV files from an existing folder). 
   - No cache management (CSVREAD should not be re-evaluated when the 
   underlying CSV file has not changed).
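
With plain H2, the missing wildcard support can be approximated on the client side by expanding the pattern in Java and combining one CSVREAD call per file with UNION ALL. The following is only a minimal sketch under that assumption: the class `CsvGlobQuery` and its methods are hypothetical names, not part of H2, and it assumes all matching files share the same columns.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of H2: expands a glob on the client side
// and builds one query that reads every matching file with CSVREAD.
final class CsvGlobQuery {

    // Returns the regular files in dir whose names match the glob,
    // sorted so the generated query is stable between runs.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    // Builds "SELECT * FROM CSVREAD('a') UNION ALL SELECT * FROM CSVREAD('b') ...".
    // Single quotes in a path are doubled so they stay inside the SQL string literal.
    static String buildQuery(List<Path> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no files matched");
        }
        return files.stream()
                .map(p -> "SELECT * FROM CSVREAD('" + p.toString().replace("'", "''") + "')")
                .collect(Collectors.joining(" UNION ALL "));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files = expand(dir, "*.csv");
        if (files.isEmpty()) {
            System.out.println("No CSV files found in " + dir);
        } else {
            System.out.println(buildQuery(files));
        }
    }
}
```

The generated string can be passed to a normal JDBC `Statement` on an H2 connection. This only addresses the wildcard limitation; each file is still parsed by CSVREAD on every query, so it does nothing for performance on large files.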


To address these limitations, I have developed my own CSV JDBC driver, 
which uses an in-memory column store database in the background.
It uses the same syntax as H2 (select * from CSVREAD(...)).

*Benefits are:*

   - Outstanding performance with large data sets (more than 1 GB)
   - Outstanding performance when sorting large files (more than 1 GB) 
   on multiple columns
   - Support for wildcard expressions in CSVREAD (all files contained in a 
   specific folder can be read in a single call)
   - Embedded cache management: the system uses the cache if the 
   underlying file has not changed
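
The cache behaviour described in the last item can be illustrated with a small sketch. This is not the driver's actual implementation, which is not shown in this post. It only assumes that "not changed" is detected by comparing the file's last-modified time and size, and the class `CsvFileCache` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: caches the lines of a CSV file and reloads them
// only when the file's last-modified time or size has changed.
final class CsvFileCache {

    // Snapshot of the file state at load time, plus the loaded content.
    private static final class Entry {
        final FileTime modified;
        final long size;
        final List<String> lines;

        Entry(FileTime modified, long size, List<String> lines) {
            this.modified = modified;
            this.size = size;
            this.lines = lines;
        }
    }

    private final Map<Path, Entry> cache = new HashMap<>();
    private int loads = 0; // counts real file reads, for illustration only

    // Returns the cached lines if the file looks unchanged, otherwise re-reads it.
    synchronized List<String> read(Path file) throws IOException {
        Path key = file.toAbsolutePath().normalize();
        BasicFileAttributes attrs = Files.readAttributes(key, BasicFileAttributes.class);
        Entry cached = cache.get(key);
        if (cached != null
                && cached.modified.equals(attrs.lastModifiedTime())
                && cached.size == attrs.size()) {
            return cached.lines; // cache hit: file not modified since last load
        }
        List<String> lines = Collections.unmodifiableList(Files.readAllLines(key));
        loads++;
        cache.put(key, new Entry(attrs.lastModifiedTime(), attrs.size(), lines));
        return lines;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

Comparing timestamp and size is cheap but can miss an edit that keeps the same size within the file system's timestamp resolution; a content hash would be stricter at the cost of reading the file again.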


I had some cases where I was not able to use H2 to read large CSV files 
because performance was very poor.
This driver uses an in-memory column store database in the background, 
which is much more efficient for storing and manipulating large data sets 
(for example, sorting on multiple columns has minimal impact on performance).

Regards,
Guillaume
