Hello,

I am trying to use FastBit to index time series records (timestep, value)
produced by a scientific simulation. The simulation runs for 1000 time
steps.

The CSV file which is given as input to FastBit is organised as follows:

(0, value0_0)
(1, value0_1)
...
(999, value0_999)
(0, value1_0)
(1, value1_1)
...
(999, value1_999)

...

(0, valueX_0)
(1, valueX_1)
...
(999, valueX_999)

In other words, all 1000 records of one time series are stored before
the 1000 records of the next time series, and so on.

The queries have the following form:

WHERE t1 < timestep < t2 AND value > v
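
To make the intended semantics concrete, here is a minimal plain-Python
sketch of the result I expect from such a query (both bounds on timestep
are strict). It does not use FastBit at all; the function name and the
tiny sample data are made up purely for illustration.

```python
def reference_scan(rows, t1, t2, v):
    """Return 0-based positions of rows with t1 < timestep < t2 AND value > v.

    Hypothetical helper: a brute-force scan, useful only as a reference
    against which the answers of an indexed query can be checked.
    """
    return [i for i, (timestep, value) in enumerate(rows)
            if t1 < timestep < t2 and value > v]


# Two made-up series with 4 time steps each (the real data has 1000),
# stored one series after the other, as in the CSV file.
rows = [(0, 0.1), (1, 0.9), (2, 0.7), (3, 0.2),
        (0, 0.8), (1, 0.3), (2, 0.95), (3, 0.6)]

hits = reference_scan(rows, t1=0, t2=3, v=0.5)
```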

Overall, FastBit seems to be a great fit for this use case.
However, as I am just getting started, I would like to know the best way
to use it.

I have three options in mind:

1) Build two indexes, one for the *timestep* and one for the *value*.
2) Build only a *value* index and simply scan the *timestep* column.
3) Obviously, there is a mapping between the *timestep* and the RIDs.
The *timestep* of a record equals (RID modulo 1000).
So the third option consists of building only a *value* index, computing
the required RIDs according to the *timestep* predicate and using
*ibis::query::setRIDs()*.
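
For option 3, the computation I have in mind is roughly the following.
This is only a sketch in plain Python under my own assumptions: rows are
numbered from 0 in file order, every series has exactly 1000 records, and
t1 and t2 are integers. The function name is hypothetical, and I have not
verified how these positions translate into the RID set that
ibis::query::setRIDs() expects.

```python
STEPS_PER_SERIES = 1000  # assumption: every series has exactly 1000 records


def rows_for_timestep_range(t1, t2, num_series, steps=STEPS_PER_SERIES):
    """Return 0-based row positions whose timestep satisfies t1 < timestep < t2.

    Relies on the layout described above: row r belongs to series
    r // steps and has timestep r % steps.
    """
    lo = max(t1 + 1, 0)          # smallest timestep strictly above t1
    hi = min(t2 - 1, steps - 1)  # largest timestep strictly below t2
    return [s * steps + t
            for s in range(num_series)
            for t in range(lo, hi + 1)]


# Example: timesteps 998 and 999 of the first two series.
candidates = rows_for_timestep_range(997, 1000, num_series=2)
```

The candidate list grows with (number of series) x (width of the timestep
range), so for wide ranges it may end up about as large as the table
itself.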

Which option would be closer to the FastBit philosophy?
I am aiming for a good trade-off between execution time and space
requirements for storing indexes.

I would very much appreciate your feedback,
E. Tzirita Zacharatou