Well, astonishing!!! I have reproduced my test on my PC with OptimizeIt :)

Here are the results: I loaded 18,000 lines into a table of 60 columns (char(5) not null), always with the same value 'AAAAA'
=> it takes over 300Mb
I've checked the place where the majority of the Strings were created:
=> over 21,000 lines for only 52Mb!!!

The only change I've made is in the class Tokenizer#getString(char quoteChar), at line 910:

    return new String(chBuffer, 0, j).intern();

that is, the "intern()" method... I really don't know what to think about this modification: is it a patch or not? Personally, I think I'll run more tests on Monday at my office :)

Loic

PS: I've just seen fredt's comment:

    // fredt - strings are constructed from new char[] objects to avoid slack
    // because these strings might end up as part of internal data structures
    // or table elements.
--> // we may consider using pools to avoid recreating the strings

-----Original Message-----
From: wondersonic [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 21:40
To: fredt
Subject: RE: [Hsqldb-developers] Volunteer

Okay, I had imagined the data were stored in something like:

    Object[] columns

where columns[i] is of type Object[] if the column is nullable, or simply short[] or int[] or <scalar-type>[] or String[] when the column is not nullable. That would have used less memory (at least less than by using the wrapper).

I'll take a look at Index.java. And I'll try to change the cache_scale property; yes, maybe it could help :-)

My first piece of advice about hsqldb concerns the batch buildXXX.bat scripts (although I know there is an ant script). I suggest you surround the classpath parameter with "" like:

    javac -O -nowarn -d ../classes -classpath "%classpath%;../classes;../lib/servlet.jar;." ./*.java ...

I've encountered some errors with my own classpath (because of spaces).

-----Original Message-----
From: fredt [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 21:23
To: wondersonic
Subject: Re: [Hsqldb-developers] Volunteer

Thanks Loic,

That explains it. The data for each row is stored as an Object[] with the length equal to the number of columns.
Null fields have null values in the array, others have a reference to an Integer, String, Date, or other Object. The rows are linked together by indexes. Each table has at least one index (the user or system-defined primary key). Look at Index.java to see how the engine searches for a given row in a table.

I think your success will depend on the field data in your table(s) being repetitive and able to benefit from the value pool.

RE cached tables, when you have a lot of memory, you can increase the cache_scale to the maximum (or even modify the source to allow a value beyond the maximum).

Fred

----- Original Message -----
From: "wondersonic" <[EMAIL PROTECTED]>
To: "fredt" <[EMAIL PROTECTED]>
Sent: 04 July 2003 20:08
Subject: RE: [Hsqldb-developers] Volunteer

Sounds great too! I'll really be pleased to help you by testing the 1.7.2 alpha release with our environment. I'm currently reading the source code of HSQLDB and, well, perhaps I could make some little remarks to improve some (very little) things?

About my remark x10 to x50...

The new consolidation system of the bank I'm working on receives CSV files, from a few lines to 1,000,000 and over. Once a file has been loaded, treatments are applied to the lines. One of these treatments uses other data to generate new lines from one of the original lines, say from 10 to 50 lines for one original line. After that, we need to aggregate those newly created lines plus the original ones according to certain columns/keys. The aggregate treatment needs all the lines to be loaded into memory at the same time => a huge amount of memory is required.

One of my ideas was to use cached tables, but then the time needed for the whole process grows... from 5 minutes to 23 minutes for 800,000 lines, and you have to know that the treatments must take at most 2 hours!
Could you confirm that the data of a table are stored by column, and that each column is managed with a Vector or an object that implements List or extends Vector, ArrayList or anything else?

-----Original Message-----
From: fredt [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 20:50
To: wondersonic
Subject: Re: [Hsqldb-developers] Volunteer

Thanks Loic,

This sounds great. We are gradually approaching these requirements. The plan for 1.7.2 includes configurable object caches which will reduce memory use in all modes (MEMORY, CACHED, TEXT). Apart from that, the main goal for 1.7.2 that I set at the beginning of the development cycle was reliability. I think this has been mostly achieved by rewriting the persistence mechanisms and by the time 1.7.2 is released we should be able to iron out the details. Regarding speed, both the prepared statement work and the object pool implementation will help.

The release plan is to get the ALPHA_N out in the next few days. The Release Candidate should follow soon and the final release after a few weeks.

You can help a lot now in the areas of testing and quality assurance using your specific dataset, and if there is anything that can be done within this limited timeframe to improve the engine, we will do that.

BTW, I don't know what you mean by 'tables .... that can produce more data x10 to x50'; please explain further.

Regards
Fred Toussi

----- Original Message -----
From: "wondersonic" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 04 July 2003 19:18
Subject: [Hsqldb-developers] Volunteer

Hello,

Since last month, I have been working on a new project for the 1st French bank (BNP PARIBAS) that needs to manage large amounts of data (tables with 50-70 columns with over 1,000,000 rows that can produce more data x10 to x50; the servers will have 4Gb of memory but must support parallel processing: at least the pre-defined amount of data plus 500Mb for other little processing).
The currently selected solution to handle the treatments is the HSQLDB database.

I've got a background of 4 years in Java and I've got some time.

So here are my needs:
- Handle data with the minimum amount of memory
- Have a rock-solid database
- And fast if possible ;-)

What are yours?

Thanks for your answer(s),
Loic Lefevre

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers
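
Fred's description of row storage in this thread (each row an Object[] as long as the number of columns, null for null fields, references to Integer, String, Date or other objects otherwise, with repetitive data benefiting from a value pool) can be sketched roughly as follows. This is only an illustration under stated assumptions: RowStoreSketch, pooled, addRow, row and pooledValueCount are hypothetical names, and the HashMap-based pool is a stand-in, not the pool implementation planned for HSQLDB 1.7.2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HSQLDB code: rows held as Object[] with shared values.
class RowStoreSketch {
    // Assumed stand-in for a value pool: one canonical instance per distinct value.
    private final Map<Object, Object> pool = new HashMap<>();
    private final List<Object[]> rows = new ArrayList<>();
    private final int columnCount;

    RowStoreSketch(int columnCount) {
        this.columnCount = columnCount;
    }

    // Returns the canonical instance for a value; null fields stay null.
    Object pooled(Object value) {
        if (value == null) {
            return null;
        }
        Object existing = pool.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    // Stores one row as an Object[] whose length equals the number of columns.
    void addRow(Object... fields) {
        if (fields.length != columnCount) {
            throw new IllegalArgumentException("expected " + columnCount + " fields");
        }
        Object[] row = new Object[columnCount];
        for (int i = 0; i < columnCount; i++) {
            row[i] = pooled(fields[i]);
        }
        rows.add(row);
    }

    Object[] row(int index) {
        return rows.get(index);
    }

    int pooledValueCount() {
        return pool.size();
    }
}
```

With data as repetitive as the 'AAAAA' test table, every cell of every row would end up pointing at a single shared String, which is the situation Fred says the value pool is meant to exploit.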
TestHSQLDB.java
Description: Binary data
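
The effect Loic reports for the intern() change can be illustrated with a small standalone sketch. It is not the attached TestHSQLDB.java and not HSQLDB's Tokenizer: InternSketch, build and distinctObjects are hypothetical names, and the sketch only counts distinct String objects rather than measuring memory. It assumes, as in the test above, that the same five characters are read over and over.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: compares new String(...) with new String(...).intern().
class InternSketch {
    // Builds n strings from the same characters, as a tokenizer would
    // when it reads the same field value repeatedly.
    static String[] build(int n, boolean intern) {
        char[] chBuffer = {'A', 'A', 'A', 'A', 'A'};
        String[] cells = new String[n];
        for (int i = 0; i < n; i++) {
            String s = new String(chBuffer, 0, chBuffer.length);
            cells[i] = intern ? s.intern() : s;
        }
        return cells;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(String[] cells) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(seen, cells);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctObjects(build(1000, false)));
        System.out.println("with intern:    " + distinctObjects(build(1000, true)));
    }
}
```

Without intern() every cell keeps its own String object; with intern() all equal cells share one canonical instance, which is consistent with the drop from over 300Mb to 52Mb reported above for a table filled with 'AAAAA'. Whether interning is the right fix for the engine, as opposed to a dedicated pool, is the open question raised in the message.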