Re: [Lazarus] data matrix with thousands of columns

Andrea Mauri Wed, 27 Mar 2013 02:24:34 -0700

Il 27/03/2013 09:27, Michael Schnell ha scritto:

On 03/26/2013 03:44 PM, Andrea Mauri wrote:

one more thing. my data is more similar to a huge spreadsheet than a
relational DB


"store":  Do you mean for working with it in "realtime" or for keeping
it for the next time the program is started ?

Ok I will explain better.

I have a GUI app.

The user loads samples (the rows), my app performs calculations onsamples and for every sample give as output thousands of values (thecolumns). Samples could be from tens to hundreds of thousands.

Columns can be tens to thousands.
Every column is an attribute of the sample defined by a unique name.

After calculations the app/user should be able to search for one/moresamples (or for one/more columns) getting all/some values for thesample/column. Briefly I need to be able to rapidly get some values fromthis huge data matrix.


I have two scenarios, both are possible, depending on calculations type:

1. all the columns (the number/type of attributes to be calculated onevery sample) are defined by the user before performing calculations, soI know the total number of columns before starting the calculations.Actually I use a sqlite db that I define before calculations started. Icreate as many tables as I need where every table has 200 fields. Thesamples are stored in a table, every sample is a row. The attributes arestored in all the other tables (every sample has a row in everyattributes tables).

2. The number of columns can vary from a sample to another. The numberof columns/attributes for every sample is known only after calculationis performed on that sample. The total number of columns/attributes isknown only after all samples are examined/calculated.During calculations I need to modify the number of columns. Using sqliteis quite annoying since I need to alter tables when a new column occursalso closing all the queries, modifying the insert queries and so on.During a calculation process thousands of columns can be created andneed to be added.


The app can be splitted in 4 different tasks:

1. load of samples (then the user can look at samples, load new samplesadding to existing ones..);2. calculations (can take seconds, minutes, hours depending on thenumber of samples in the batch and on the number of attributes to becalculated), the user can leave the app working - no interaction, theuser is just able to stop/pause the process;2. calculated data analysis (interactive), the user can perform analysison data (charts, data views, data manipulation [average, standarddeviation...]...) (this should be as fast as possible, data retrievalfrom the db/datasheet fast);3. save of calculated data to a txt file/spreadsheet... (not necessarilyall calculated data will be saved by the user).

The user should also be able to delete/remove samples/columns or addsamples/columns.

When doing a with a 64 bit program it might be possible and appropriate
just to use an array and have the OS do the dirty work of virtual memory
management, using a huge swap partition.

My app is actually compiled both on 32 and 64 bit. Maybe because I amold ;) (I started programming when RAM was to be used carefully, 1Mb ofRAM was typical) but I am inhabited to not put all the data in memory,maybe I need to change my mind. Is it an option to not store on filesbut to take everything in memory and let app and os to perform the dirtywork?



Andrea

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus

Re: [Lazarus] data matrix with thousands of columns

Reply via email to