TR: [Hsqldb-developers] Volunteer

wondersonic Fri, 04 Jul 2003 13:37:46 -0700

Well astonishing!!!
I have reproduced my test on my pc with OptimizeIt :)
here are the results:
loaded 18,000 lines into a table of 60 columns (char(5) not null)
(always the same value: 'AAAAA')
=> takes over 300Mb


I've checked the place where the majority of the Strings were created:
=> over 21,000 lines for only 52Mb !!!

The only change I've made is in Tokenizer#getString(char quoteChar):
                return new String(chBuffer, 0, j).intern();

i.e. the added intern() call...
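A minimal sketch of why that single call can matter, not HSQLDB code: the class InternSketch and its two helper methods are hypothetical and only imitate a tokenizer that builds the same 5-character value over and over from a char buffer. It counts distinct String objects by identity, with and without intern().

```java
import java.util.IdentityHashMap;
import java.util.Map;

class InternSketch {
    // Counts distinct String objects by identity (==), not by equals().
    static int distinctInstances(String[] values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    // Builds `count` strings from a char buffer, optionally interning each one.
    // The buffer content mirrors the test data ('AAAAA'); it is an assumption.
    static String[] build(int count, boolean intern) {
        char[] chBuffer = {'A', 'A', 'A', 'A', 'A'};
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            String s = new String(chBuffer, 0, chBuffer.length);
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(build(1000, false))); // prints 1000
        System.out.println(distinctInstances(build(1000, true)));  // prints 1
    }
}
```

Without intern() every field holds its own String object; with it, all equal values share one canonical instance, which is only a win when the data is as repetitive as in this test.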

I really don't know what to think about this modification: is it a patch
or not?

Personally, I think I'll run more tests on Monday at the office :)

Loic

ps: I've just seen the comment of fredt:

    // fredt - strings are constructed from new char[] objects to avoid slack
    // because these strings might end up as part of internal data structures
    // or table elements.
--> // we may consider using pools to avoid recreating the strings

-----Original Message-----
From: wondersonic [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 21:40
To: fredt
Subject: RE: [Hsqldb-developers] Volunteer


Okay, I had imagined the data were stored in something like:

Object[] columns, where columns[i] is of type Object[] if the column is
nullable, or simply short[] or int[] or <scalar-type>[] or String[] when the
column is not nullable.
That would have used less memory (at least less than using the wrappers).
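A rough sketch of the column-oriented layout imagined above, under stated assumptions: the class ColumnStoreSketch, its three example columns and its accessor names are all hypothetical, and this is not how HSQLDB stores data.

```java
class ColumnStoreSketch {
    // columns[i] is a primitive array for a NOT NULL scalar column,
    // a String[] for a NOT NULL character column, or an Object[] when nullable.
    private final Object[] columns;

    ColumnStoreSketch(int rowCount) {
        columns = new Object[] {
            new int[rowCount],     // column 0: INTEGER NOT NULL (no wrapper objects)
            new String[rowCount],  // column 1: CHAR NOT NULL
            new Object[rowCount]   // column 2: nullable, null marks SQL NULL
        };
    }

    void setInt(int col, int row, int value) {
        ((int[]) columns[col])[row] = value;
    }

    int getInt(int col, int row) {
        return ((int[]) columns[col])[row];
    }

    // Works for String[] and Object[] columns (a String[] is also an Object[]).
    void setObject(int col, int row, Object value) {
        ((Object[]) columns[col])[row] = value;
    }

    Object getObject(int col, int row) {
        return ((Object[]) columns[col])[row];
    }
}
```

The saving would come from primitive columns: an int[] holds the values inline, whereas a row of wrappers needs one Integer object per field.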

I'll take a look at Index.java. And I'll try changing the cache_scale
property; maybe it could help, yes :-)

My first piece of advice about hsqldb concerns the buildXXX.bat batch scripts
(although I know there is an ant script).
I suggest surrounding the classpath parameter with "" like:
javac -O -nowarn -d ../classes -classpath "%classpath%;../classes;../lib/servlet.jar;." ./*.java ...
I've encountered some errors with my own classpath (because of spaces).


-----Original Message-----
From: fredt [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 21:23
To: wondersonic
Subject: Re: [Hsqldb-developers] Volunteer


Thanks Loic,

That explains it.

The data for each row is stored as an Object[] with the length equal to the
number of columns. Null fields have null values in the array, others have a
reference to an Integer, String, Date, or other Object.
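The row layout described above can be pictured with a minimal sketch. It is an illustration only, not the HSQLDB source: the class RowSketch and its helpers are hypothetical.

```java
import java.util.Date;

class RowSketch {
    // One row is one Object[] whose length equals the number of columns.
    // A null element stands for a NULL field.
    static Object[] newRow(int columnCount) {
        return new Object[columnCount];
    }

    // Counts the fields of a row that are NULL.
    static int countNullFields(Object[] row) {
        int nulls = 0;
        for (Object field : row) {
            if (field == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        Object[] row = newRow(4);
        row[0] = Integer.valueOf(42); // INTEGER field
        row[1] = "AAAAA";             // CHAR field
        row[2] = new Date(0L);        // DATE field
        // row[3] is left null, i.e. a NULL field
        System.out.println(row.length + " columns, " + countNullFields(row) + " null");
    }
}
```

Every non-null field is a separate object referenced from the array, so with 60 columns per row the per-object overhead adds up quickly unless equal values are shared.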

The rows are linked together by indexes. Each table has at least one index
(the user or system-defined primary key).

Look at Index.java to see how the engine searches for a given row in a
table.

I think your success will depend on the field data in your table(s) being
repetitive and able to benefit from the value pool.
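As an illustration of what a value pool does for repetitive data, here is a hedged sketch. The class ValuePoolSketch and its methods are hypothetical and unbounded; the actual 1.7.2 pool implementation is not shown in this thread and may differ considerably.

```java
import java.util.HashMap;
import java.util.Map;

class ValuePoolSketch {
    // Maps each distinct value to its single shared (canonical) instance.
    private final Map<Object, Object> pool = new HashMap<>();

    // Returns the pooled instance equal to `value`, adding it on first sight.
    Object canonical(Object value) {
        if (value == null) {
            return null; // NULL fields are not pooled
        }
        Object existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Number of distinct values currently held by the pool.
    int size() {
        return pool.size();
    }
}
```

Storing `pool.canonical(field)` instead of `field` in each row means equal values share one object. Unlike String.intern(), such a pool can also hold Integer or Date values, but it only pays off when the data repeats.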

RE cached tables, when you have a lot of memory, you can increase the
cache_scale to the maximum (or even modify the source to allow a value
beyond the maximum).

Fred

----- Original Message -----
From: "wondersonic" <[EMAIL PROTECTED]>
To: "fredt" <[EMAIL PROTECTED]>
Sent: 04 July 2003 20:08
Subject: RE: [Hsqldb-developers] Volunteer


Sounds great too!
I'd be really pleased to help by testing the 1.7.2 alpha release
in our environment. I'm currently reading the HSQLDB source code,
and perhaps I could offer a few remarks to improve some
(very small) things?

About my "x10 to x50" remark...

The new consolidation system of the bank I'm working for receives CSV files
ranging from a few lines to 1,000,000 and over. Once a file has been loaded,
processing steps are applied to the lines. One of these steps uses other data
to generate new lines from one of the original lines, say 10 to 50 new lines
per original line. After that, we need to aggregate the newly created lines
plus the original ones according to certain columns/keys. The aggregation
needs all the lines to be loaded into memory at the same time => a huge
amount of memory is required.

One of my ideas was to use cached tables, but then the time needed for the
whole process grows... from 5 minutes to 23 minutes for 800,000 lines, and
bear in mind that the processing must take at most 2 hours!


Could you confirm that the data of a table is stored by column, and that
each column is managed with a Vector, or an object that implements List or
extends Vector, ArrayList or something similar?



-----Original Message-----
From: fredt [mailto:[EMAIL PROTECTED]
Sent: Friday, 4 July 2003 20:50
To: wondersonic
Subject: Re: [Hsqldb-developers] Volunteer


Thanks Loic,

This sounds great. We are gradually approaching these requirements. The plan
for 1.7.2 includes configurable object caches which will reduce memory use
in all modes (MEMORY, CACHED, TEXT). Apart from that, the main goal for
1.7.2 that I set at the beginning of the development cycle was reliability.
I think this has been mostly achieved by rewriting the persistence
mechanisms and by the time 1.7.2 is released we should be able to iron out
the details.

Regarding speed, both the prepared statement work and the object pool
implementation will help.

The release plan is to get the ALPHA_N out in the next few days. The Release
Candidate should follow soon and the final release after a few weeks. You
can help a lot now in the areas of testing and quality assurance using your
specific dataset and if there is anything that can be done within this
limited timeframe to improve the engine, we will do that.


BTW, I don't know what you mean by 'tables .... that can produce more data
x10 to x50'; please explain further.

Regards

Fred Toussi

----- Original Message -----
From: "wondersonic" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 04 July 2003 19:18
Subject: [Hsqldb-developers] Volunteer


Hello,
Over the last month, I have begun work on a new project for the No. 1
French bank (BNP PARIBAS) that needs to manage large amounts of data
(tables with 50-70 columns and over 1,000,000 rows, which can produce
10 to 50 times more data; the servers will have 4Gb of memory but must support
parallel processing: at least the pre-defined amount of data plus 500Mb for
other small processing tasks). The solution currently selected to handle the
processing is the HSQLDB database.

I have 4 years of experience in Java and I've got some time.

So here are my needs:
- Handle data with the minimum amount of memory
- Have a rock solid database
- And fast if possible ;-)

What are yours?

Thanks for your answer(s),
Loic Lefevre



_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers

TestHSQLDB.java
Description: Binary data
