Hi H2 team,

First, I would like to thank you for the excellent database you provide. I 
have been a big fan of H2 for years and have always promoted it in my 
projects (it was easy to convince people to use H2, by the way ;).
The feature I like the most is the CSVREAD function, which is amazing and 
very useful.
Being able to manipulate agnostic data (files) with SQL, as if it were 
structured data, is great and very flexible.

*I have 2 comments:*

   - H2 has some limitations: 
      - Bad performance when dealing with large CSV files (more than 1 GB) 
         - I have already played with the optimization parameters, but the 
         problem remains.
      - Bad performance when ordering large data sets using multiple columns 
      in "ORDER BY" 
         - Yes, it is possible to create some indexes to improve 
         performance.
      - No support for wildcard expressions in the filename pattern (in case 
      we need to load all CSV files from an existing folder) 
      - No cache management (do not re-evaluate the CSVREAD if the 
      underlying CSV file has not been modified)
   - The requirement to read agnostic data is not limited to CSV and 
   also applies to other file formats (Excel, JSON, PDF, XML, Properties, ...). 
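
To illustrate the wildcard point, here is a minimal sketch of a client-side workaround: expand the pattern in Java and combine one CSVREAD call per file with UNION ALL. The class and method names (`CsvReadUnion`, `buildUnionQuery`) are hypothetical, and the sketch assumes all matching files share the same columns. It only builds the SQL text and does not touch H2 itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of H2: expands a glob into one query.
public class CsvReadUnion {

    // Returns "SELECT * FROM CSVREAD('f1') UNION ALL SELECT * FROM CSVREAD('f2') ..."
    // for every file in dir matching the glob, or an empty string if none match.
    static String buildUnionQuery(Path dir, String glob) throws IOException {
        List<String> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                files.add(p.toAbsolutePath().toString());
            }
        }
        // Directory iteration order is unspecified, so sort for a stable query.
        Collections.sort(files);

        List<String> selects = new ArrayList<>();
        for (String f : files) {
            // Double any single quote so the path is a valid SQL string literal.
            selects.add("SELECT * FROM CSVREAD('" + f.replace("'", "''") + "')");
        }
        return String.join(" UNION ALL ", selects);
    }

    public static void main(String[] args) throws IOException {
        // Demo data in a temporary folder (assumed layout, for illustration only).
        Path dir = Files.createTempDirectory("csvdemo");
        Files.write(dir.resolve("a.csv"), "ID,NAME\n1,x\n".getBytes());
        Files.write(dir.resolve("b.csv"), "ID,NAME\n2,y\n".getBytes());
        System.out.println(buildUnionQuery(dir, "*.csv"));
    }
}
```

The resulting string could then be passed to a regular JDBC `Statement`. This does not address the performance points above, since each file is still parsed by CSVREAD on every execution.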


Based on the above statements, I have developed my own JDBC drivers:

   - CSV JDBC Driver 
   - Excel JDBC Driver to read both XLSX and XLS files
   - Excel XML SpreadSheet JDBC Driver (a schema is used to do the binding 
   between XPATH and Columns)
   - JSON JDBC Driver 
   - PDF JDBC Driver 
   - Properties JDBC Driver 
   - SSH JDBC Driver: an SSH connection is very similar to a database 
   connection (user, password, host, port, and folder instead of database) 
      - This driver can be used to read all files (file attributes: name, 
      relative path, full path, checksum, size, rights, ...) in folders (both 
      filename and folder support patterns)
   - SWIFT JDBC Driver 
   - XML JDBC Driver 

All of the above drivers (except the CSV driver) use an H2 database in the 
background to store data (because H2 is great).
I had to develop my own CSV driver in order to solve the performance issue 
I ran into with H2 when dealing with large data.
I have used an in-memory column-store database in the background and I get 
outstanding performance when dealing with large data (I love H2, but it is 
no match when dealing with large data or if you need to read multiple CSV 
files in a row).

I might not be the first to have these requirements, and I would like to 
know if you would be interested in discussing this topic with me, or if you 
know other people who might be interested.
These drivers are key features for data integration, automation, 
inter-system communication, non-regression testing and more...

For your information, these drivers are not prototypes but are used in 
production.

Best regards,
Guillaume de GENTILE
