Guillaume, how are you?

> Well, the life cycle of my data would mainly be archival and data lookup on
> conditions (like "retrieve every row where column B equals 12", much like what
> PyTables and MongoDB can do with queries).
> A sample can be from 4 to 256 bytes (strings); it can be an int, a double, a
> string, or something else.
It's tempting to have the same storage layout for acquisition, archival, and retrieval, and in simple cases this might even work. In general, though, it's not such a great idea.

> If I understand you correctly, you think it makes little sense to use
> MongoDB for time series, just as it would make little sense to use HDF5
> for storing documents?

That's one way of putting it. You can obviously mimic storing documents in HDF5, the same way you can mimic storing time series in MongoDB. And sometimes mimicking is good enough. It really depends on what your expectations for quality are.

> Where can I find examples for FastBit?

Here's a quotation from John Wu's earlier posting:

"Both FastQuery and FastBit are available in source code form
FastQuery http://codeforge.lbl.gov/projects/fastquery
FastBit http://codeforge.lbl.gov/projects/fastbit
Feel free to join FastBit mailing list <https://hpcrdm.lbl.gov/pipermail/fastbit-users> to post your questions regarding FastBit and FastQuery."

Best,
G.

-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of Gerd Heber
Sent: Sunday, February 24, 2013 18:18
To: 'HDF Users Discussion List'
Subject: Re: [Hdf-forum] mongodb compared to HDF5 ?

Guillaume, how are you?

This is an interesting question, but there are several omissions and assumptions that make it rather ill-posed. The omissions have to do with what you didn't tell us (and I'll come back to that in a moment). The assumptions have to do with an unspecified basis on which HDF5 and MongoDB are comparable. (I will not spend time discussing this second point and will only state that, apart from trivial situations, there is no basis for such a comparison. HDF5 and MongoDB are two very different animals, which raises several interesting possibilities for using them together. More on that soon...)

In any event, I suggest you spend some quality time with both candidates. Have a look at PyTables, install MongoDB, and kick the tires.
For prototyping, both are fun to play with. For a production solution, you need to ask and answer many more questions.

My first question for you would be, "What's the life cycle of your data?" You told us something about the acquisition; then what? (Cleaning, transformation, products, distribution, (re-)use, archival, any of those?) What about the underlying model and the metadata that go with it?

At the indicated rate, you'll acquire about 3.6 million samples in 10 hours. What's the size of an individual sample? How similar are individual samples? By "similar" I mean structure and value, i.e., how compressible are they? Are they strings, or numbers disguised as strings? How many JSON/BSON documents were you thinking about? (MongoDB's current BSON document size limit is 16 MB.) Do you need MongoDB sharding across instances on EC2? How will your acquisition rate change in the future? (It will surely go up...) How do you access the data? What are the interface constraints of your clients?

In terms of raw read/write performance, I don't see a scenario where MongoDB has a chance to beat HDF5. This doesn't mean that MongoDB couldn't be good enough for your purposes. MongoDB lets you create indexes out of the box; plain HDF5 has no such built-in mechanism. (PyTables does, and there are add-ons for HDF5 such as FastBit.)

These are just a few pointers for your homework. Keep us posted on how you're getting on!

My parting comment would be this: if you're after building a long-term archive of large time series data, the idea of using MongoDB strikes me as rather silly. It wasn't made for that; it's a document database, remember? On the other hand, using MongoDB as the catalog for metadata and to publish time series excerpts and aggregates is a perfectly sensible and efficient solution.

Best,
G.

From: Hdf-forum [mailto:[email protected]] On Behalf Of guillaume
Sent: Saturday, February 23, 2013 2:20 PM
To: [email protected]
Subject: [Hdf-forum] mongodb compared to HDF5 ?
Hi everyone,

I'm trying to find the best fit for time series data (a lot of it: let's say 1 sample every 10 ms for 10 hours; the samples are never updated, only appended and then read back), and I'd like your opinion on MongoDB compared to HDF5. Which one is the better fit? Which one performs better? Any other pros/cons for one or the other?

Thanks a lot,
Guillaume.

________________________________________
Sent from the hdf-forum mailing list archive at Nabble.com.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
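The sizing questions raised in the thread (how many samples, how large, how many BSON documents) can be answered with a back-of-the-envelope calculation. The following is a minimal Python sketch. It assumes a single stream sampled every 10 ms for 10 hours and samples of 4 to 256 bytes, as Guillaume describes. It ignores compression and per-record overhead, and it also ignores BSON field-name and per-element overhead. The constant and function names are hypothetical.

```python
# Back-of-the-envelope sizing for ONE acquisition stream (assumption: no
# multiple channels), using the figures from the thread. Compression and
# per-record / BSON overhead are deliberately ignored.

SAMPLE_PERIOD_MS = 10          # one sample every 10 ms
RUN_HOURS = 10                 # one acquisition run lasts 10 hours
MIN_SAMPLE_BYTES = 4           # smallest sample size mentioned
MAX_SAMPLE_BYTES = 256         # largest sample size mentioned (strings)
BSON_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's 16 MB (16 MiB) BSON document limit


def samples_per_run(period_ms: int, hours: int) -> int:
    """Number of samples acquired in one run at a fixed sampling period."""
    return hours * 3600 * 1000 // period_ms


def raw_bytes(n_samples: int, sample_bytes: int) -> int:
    """Uncompressed payload size, ignoring any per-record overhead."""
    return n_samples * sample_bytes


n_samples = samples_per_run(SAMPLE_PERIOD_MS, RUN_HOURS)
min_bytes = raw_bytes(n_samples, MIN_SAMPLE_BYTES)
max_bytes = raw_bytes(n_samples, MAX_SAMPLE_BYTES)

# Upper bound on how many of the largest samples fit in one BSON document
# before field names and other BSON overhead are counted.
max_per_doc = BSON_DOC_LIMIT // MAX_SAMPLE_BYTES

if __name__ == "__main__":
    print(f"samples per run: {n_samples:,}")
    print(f"raw payload: {min_bytes / 1e6:.1f} MB to {max_bytes / 1e6:.1f} MB")
    print(f"at most {max_per_doc:,} samples of {MAX_SAMPLE_BYTES} bytes per document")
```

With these inputs, the sketch gives 3.6 million samples and roughly 14.4 MB to 921.6 MB of raw payload per run. It also gives an upper bound of 65,536 of the largest samples per 16 MB document. The real per-document figure is lower once BSON overhead is counted, and compression changes the on-disk numbers on either side.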
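Gerd notes that MongoDB and PyTables can index a column while plain HDF5 cannot. Guillaume's example query, "retrieve every row where column B equals 12", shows why this matters. The following is a conceptual sketch in plain Python, not the PyTables or MongoDB API. It contrasts a full scan with a hypothetical secondary index that maps each value of column B to the row numbers holding it. The row layout and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (timestamp in seconds, column B, label).
Row = Tuple[float, int, str]
B = 1  # position of column B in a Row


def scan_where_equal(rows: Sequence[Row], col: int, value: object) -> List[int]:
    """Full scan: compare every row, so the cost grows with the table size."""
    return [i for i, row in enumerate(rows) if row[col] == value]


def build_index(rows: Sequence[Row], col: int) -> Dict[object, List[int]]:
    """Map each distinct value in one column to the row numbers holding it.

    With append-only data this can be built once, after acquisition ends.
    """
    index: Dict[object, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        index[row[col]].append(i)
    return dict(index)


def indexed_where_equal(index: Dict[object, List[int]], value: object) -> List[int]:
    """Indexed lookup: one dictionary probe instead of a pass over all rows."""
    return index.get(value, [])


# Tiny example: four samples taken 10 ms apart.
rows: List[Row] = [
    (0.00, 12, "a"),
    (0.01, 7, "b"),
    (0.02, 12, "c"),
    (0.03, 3, "d"),
]
b_index = build_index(rows, B)

if __name__ == "__main__":
    print(scan_where_equal(rows, B, 12))     # [0, 2]
    print(indexed_where_equal(b_index, 12))  # [0, 2]
```

Guillaume's samples are never updated, only appended. This means an index like this one can be built once after a run and never maintained incrementally. The trade-off is extra storage and build time in exchange for lookups that no longer touch every row. In PyTables, the corresponding tools are `Table.where()` with a condition string such as `'B == 12'` and `Column.create_index()`. Check the PyTables documentation for the version you use.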
