Hi H2 team,
first I would like to thank you for the excellent database you provide. I
have been a big fan of H2 for years and I have always promoted H2 in my
different projects (it was easy to convince people to use H2, by the way ;).
The feature I like the most is the CSVREAD function which is amazing and
very useful.
Being able to manipulate agnostic data (files) as if it were structured data
using SQL is great and very flexible.
*I have 2 comments:*
- H2 has some limitations:
  - Bad performance when dealing with large CSV files (more than 1 GB)
    - I have already played with the optimization parameters, but the
      problem remains.
  - Bad performance when ordering large data using multiple columns
    in “ORDER BY”
    - Yes, it is possible to create some indexes to improve
      performance.
  - No support for wildcard expressions in the filename pattern (in case
    we need to load all CSV files from an existing folder)
  - No cache management (do not re-evaluate the CSVREAD if the
    underlying CSV file has not been modified)
- The requirement to read agnostic data is not limited to CSV and
  also applies to other file formats (Excel, JSON, PDF, XML, Properties, ...).
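To illustrate the wildcard and cache points above, here is a minimal sketch using only the standard Java library. It is not H2 code and not the code of my drivers: the class `CsvFileTracker` and its methods `expand` and `needsReload` are hypothetical names, and using the file modification time to detect a change is an assumption (a checksum would be another option).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of H2): expands a filename pattern and
// remembers which files have already been loaded.
final class CsvFileTracker {
    // Modification time observed for each file at its last load.
    private final Map<Path, FileTime> loaded = new HashMap<>();

    // Returns the entries of 'folder' whose names match the glob (e.g. "*.csv"),
    // sorted so that the load order is stable.
    static List<Path> expand(Path folder, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    // Returns true if the file was never loaded or if its modification time
    // differs from the one recorded at the previous call; records the new time.
    boolean needsReload(Path file) throws IOException {
        FileTime current = Files.getLastModifiedTime(file);
        FileTime previous = loaded.put(file, current);
        return !current.equals(previous);
    }
}
```

Each path returned by `expand` could then be passed to CSVREAD, and only when `needsReload` returns true.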
Based on the above statements, I have developed my own JDBC drivers:
- CSV JDBC Driver
- Excel JDBC Driver to read both XLSX and XLS files
- Excel XML SpreadSheet JDBC Driver (a schema is used to do the binding
between XPATH and Columns)
- JSON JDBC Driver
- PDF JDBC Driver
- Properties JDBC Driver
- SSH JDBC Driver: an SSH connection is very similar to a database connection
  (user, password, host, port and folder instead of database)
  - This driver can be used to read all files (file attributes: name,
    relative path, full path, checksum, size, rights, ...) in folders (both
    filename and folder support patterns)
- SWIFT JDBC Driver
- XML JDBC Driver
All of the above drivers (except the CSV driver) use an H2 database in the
background to store data (because H2 is great).
I had to develop my own CSV driver in order to solve the performance issue
I get with H2 when dealing with large data.
I have used an in-memory column-store database in the background and I get
outstanding performance when dealing with large data (I love H2, but it is
no match when dealing with large data or if you need to read multiple CSV
files in a row).
I might not be the first to have these requirements, and I would like to
know if you would be interested in discussing this topic with me, or if you
know other people who might be interested?
These drivers are key features for data integration, automation,
inter-system communication, non-regression testing and more...
For your information, these drivers are not prototypes but are used in
production.
Best regards,
Guillaume de GENTILE