Hi Gerd,

I did take a look at PyTables and MongoDB; it is indeed fun!

Well, the life cycle of my data would mainly be archival and data lookup on 
conditions (like "retrieve every row where column B equals 12", much like what 
PyTables and MongoDB can do with queries).
One sample can be anywhere from 4 bytes to 256 bytes (strings); it can be an 
int, a double, a string, or something else.
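
For illustration, here is a minimal sketch of that kind of conditional lookup 
in PyTables. The table layout (a timestamp plus an integer column "B"), the 
node name "samples", and both helper functions are assumptions made up for 
this example, not anything prescribed by PyTables or by this thread.

```python
import os
import tempfile

import tables  # PyTables


class Sample(tables.IsDescription):
    # Hypothetical row layout: a timestamp plus one integer column "B".
    timestamp = tables.Float64Col()
    B = tables.Int32Col()


def write_demo_file(path, n_rows=1000):
    """Write n_rows synthetic samples, one every 10 ms, with B cycling 0..49."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "samples", Sample)
        row = table.row
        for i in range(n_rows):
            row["timestamp"] = i * 0.010
            row["B"] = i % 50
            row.append()
        table.flush()


def rows_where_b_equals(path, value):
    """Return the timestamps of every row whose column B equals `value`."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.samples
        # The condition string is evaluated by PyTables itself; `v` is
        # bound to the requested value through condvars.
        hits = table.where("B == v", condvars={"v": value})
        return [row["timestamp"] for row in hits]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    write_demo_file(demo_path)
    print(len(rows_where_b_equals(demo_path, 12)))
```

The same lookup in MongoDB would be a find() with a filter on B; which one is 
more convenient depends on how the data is accessed afterwards.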

Sharding is certainly a plus that MongoDB has "out-of-the-box", but it's not 
really mandatory.

The acquisition rate is unlikely to go up; the data is mainly samples from 
sensors, and capturing at a higher rate would not really make sense.

If I understand correctly what you are saying, you think that using MongoDB 
for time series makes about as little sense as using HDF5 for storing 
documents?

Where can I find examples for FastBit?

Thanks,
Guillaume.

-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of Gerd Heber
Sent: Sunday, February 24, 2013 6:18 PM
To: 'HDF Users Discussion List'
Subject: Re: [Hdf-forum] mongodb compared to HDF5 ?


Guillaume, how are you? This is an interesting question, but there are several 
omissions and assumptions that make it rather ill-posed.

The omissions have to do with what you didn't tell us (and I'll come back to 
that in a moment).
The assumptions have to do with an unspecified basis on which HDF5 and MongoDB 
are comparable.
(I will not spend time discussing this second point and only state that, apart 
from trivial situations, there is no basis for such a comparison. HDF5 and 
MongoDB are two very different animals, which raises several interesting 
possibilities of using them together. More on that soon...)

In any event, I suggest you spend some quality time with both candidates.
Have a look at PyTables, install MongoDB, and kick the tires. For prototyping, 
both are fun to play with. For a production solution, you need to ask and 
answer many more questions.

My first question for you would be, 'What's the life cycle of your data?'
You told us something about the acquisition, then what? (cleaning, 
transformation, products, distribution, (re-)use, archival, any of those?) What 
about the underlying model and the metadata that go with that?

At the indicated rate, you'll acquire about 3.6 million samples in 10 hours.
What's the size of an individual sample? How similar are individual samples?
By 'similar' I mean structure and value, i.e., how compressible are they?
Are they strings, or numbers disguised as strings?
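
As a rough back-of-the-envelope check, the sample count and the uncompressed 
volume can be worked out as below. This is only a sketch: the function name is 
made up for illustration, and the 4-byte and 256-byte sample sizes are taken 
from the reply at the top of this thread, ignoring any per-row or per-document 
overhead.

```python
def estimate(interval_ms, duration_h, sample_bytes):
    """Return (sample count, raw payload bytes) for a fixed-rate acquisition."""
    n_samples = duration_h * 3600 * 1000 // interval_ms
    return n_samples, n_samples * sample_bytes


if __name__ == "__main__":
    # One sample every 10 ms for 10 hours, smallest and largest sample size.
    for size in (4, 256):
        count, raw = estimate(interval_ms=10, duration_h=10, sample_bytes=size)
        print(f"{count:,} samples of {size} B -> {raw / 1e6:.1f} MB raw")
    # 3,600,000 samples of 4 B -> 14.4 MB raw
    # 3,600,000 samples of 256 B -> 921.6 MB raw
```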

How many JSON/BSON documents were you thinking about?
(MongoDB's current BSON document size limit is 16MB.) 

Do you need MongoDB sharding across instances on EC2?

How will your acquisition rate change in the future? (It will surely go up...)
How do you access the data? What are the interface constraints of your clients?

In terms of raw read/write performance, I don't see a scenario where MongoDB 
has a chance to beat HDF5. This doesn't mean that MongoDB couldn't be 
sufficient for your purposes.

MongoDB lets you create indexes out-of-the-box. Plain HDF5 has no such 
mechanism built-in.
(PyTables does and there are add-ons for HDF5 such as FastBit.)

These are just a few pointers for your homework. Keep us posted on how you're 
getting on! 

My parting comment would be this: If you're after building a long-term archive 
of large time series data, the idea of using MongoDB strikes me as rather silly.
It wasn't made for that, it's a document database, remember?
On the other hand, using MongoDB as the catalog for metadata and to publish 
time series excerpts and aggregates is a perfectly sensible and efficient 
solution.

Best, G.

From: Hdf-forum [mailto:[email protected]] On Behalf Of guillaume
Sent: Saturday, February 23, 2013 2:20 PM
To: [email protected]
Subject: [Hdf-forum] mongodb compared to HDF5 ?

Hi everyone,

I'm trying to find the best fit for time series data (a lot: say 1 sample 
every 10 ms for 10 hours, never updated, only appended and then read back), 
and I'd like your opinion on MongoDB compared to HDF5. Which one is the better 
fit? Which one performs better? Any other pros/cons for one or the other?

Thanks a lot,
Guillaume.


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org


