Hi all,
First of all, I have been an H2 fan for years.
Its embedded CSVREAD function is fantastic for reading CSV or flat files;
however, it has some limitations:
- Poor performance when dealing with large CSV files (more than 1 GB)
  - I have already played with the optimization parameters, but the
    problem remains.
- Poor performance when ordering large data sets by multiple columns in
  "ORDER BY"
  - Yes, it is possible to create some indexes to improve
    performance.
- No support for wildcard expressions in the filename pattern (in case we
  need to load all CSV files from an existing folder)
- No cache management (CSVREAD is re-evaluated even if the underlying
  CSV file has not changed)
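
To illustrate the wildcard limitation, here is a minimal sketch of how one could work around it in plain H2 by expanding a glob pattern on the client side and chaining one CSVREAD call per file with UNION ALL. The class CsvGlob and its methods are hypothetical names made up for this example; they are not part of H2 or of my driver, and the sketch assumes all matched files share the same columns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only (not an H2 API).
final class CsvGlob {

    // Lists the regular files in dir whose names match the glob, e.g. "*.csv".
    static List<Path> expand(Path dir, String glob) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result); // deterministic order
        return result;
    }

    // Builds one SELECT per file and joins them with UNION ALL.
    static String unionAllQuery(List<Path> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no files matched");
        }
        StringBuilder sql = new StringBuilder();
        for (Path f : files) {
            if (sql.length() > 0) {
                sql.append(" UNION ALL ");
            }
            // Double any single quote so the path stays a valid SQL string literal.
            String escaped = f.toString().replace("'", "''");
            sql.append("SELECT * FROM CSVREAD('").append(escaped).append("')");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "*.csv";
        System.out.println(unionAllQuery(expand(dir, glob)));
    }
}
```

The generated statement can then be run through any JDBC connection to H2. It still re-reads every file on each query, so it only addresses the wildcard point, not the performance or caching ones.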
Given these limitations, I have developed my own CSV JDBC driver
that uses an in-memory column store database in the background.
It uses the same syntax as H2 (select * from CSVREAD(...)).
*Benefits are:*
- Outstanding performance when dealing with large data (more than 1 GB)
- Outstanding performance when ordering by multiple columns on large files
  (more than 1 GB)
- Support for wildcard expressions in CSVREAD (it is possible to read
  all files contained in a specific folder in one go)
- Embedded cache management: the system uses the cache if the
  underlying file has not changed
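
To make the cache idea concrete, here is a minimal sketch of a cache that reuses a parsed result as long as the file's "stamp" is unchanged. It is only an illustration under my own assumptions, not the driver's actual code: the class FileStampCache and its methods are hypothetical, and it uses the last-modified time as the stamp.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical cache for illustration only (not taken from H2 or the driver).
final class FileStampCache<V> {

    private static final class Entry<T> {
        final long stamp;
        final T value;

        Entry(long stamp, T value) {
            this.stamp = stamp;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();

    // Returns the cached value when the stamp is unchanged; otherwise calls
    // the loader and stores the fresh result under the new stamp.
    synchronized V get(String key, long stamp, Supplier<V> loader) {
        Entry<V> e = entries.get(key);
        if (e == null || e.stamp != stamp) {
            e = new Entry<>(stamp, loader.get());
            entries.put(key, e);
        }
        return e.value;
    }

    // Convenience overload: uses the file's last-modified time as the stamp.
    V getForFile(Path file, Function<Path, V> parser) {
        try {
            long stamp = Files.getLastModifiedTime(file).toMillis();
            return get(file.toAbsolutePath().toString(), stamp, () -> parser.apply(file));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

The last-modified time has limited resolution on some file systems, so a more careful version might combine it with the file size or a checksum.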
In some cases I was not able to use H2 to read large CSV files because
performance was very poor.
This driver uses an in-memory column store database in the background,
which is much more efficient for storing large data and also for manipulating
it (e.g. ordering by multiple columns has minimal impact on performance).
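
As a rough illustration of why ordering by several columns is cheap in a column layout, the sketch below keeps each column in its own array and sorts only a list of row indices, reading just the key columns. It is a simplified example under my own assumptions (numeric columns only, a hypothetical class name ColumnSort) and does not reflect the internals of the driver.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example for illustration only.
final class ColumnSort {

    // Returns the row indices ordered by the key columns, in priority order.
    // Only the key columns are read; the other columns are never touched.
    static Integer[] orderBy(long[]... keyColumns) {
        if (keyColumns.length == 0) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        int rows = keyColumns[0].length;
        Integer[] order = new Integer[rows];
        for (int i = 0; i < rows; i++) {
            order[i] = i;
        }
        Comparator<Integer> byKeys = (a, b) -> {
            for (long[] col : keyColumns) {
                int c = Long.compare(col[a], col[b]);
                if (c != 0) {
                    return c; // first differing key decides
                }
            }
            return 0;
        };
        Arrays.sort(order, byKeys); // stable sort for object arrays
        return order;
    }

    public static void main(String[] args) {
        long[] region = {2, 1, 2, 1};
        long[] amount = {9, 5, 3, 7};
        // ORDER BY region, amount
        System.out.println(Arrays.toString(orderBy(region, amount))); // prints [1, 3, 2, 0]
    }
}
```

The resulting index array can then be used to read any other column in sorted order without moving the data itself.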
Regards,
Guillaume