2011/1/5 jelle feringa <jelleferi...@gmail.com>

> Guys, I'm not much of a DB expert, but a NoSQL db like cassandra is really
> built specifically for web 2.0 realtime applications with trillions of
> users. Query's are ran by code rather than sql statement and the DB is
> reduced to storing / retrieving key/values ( on a large number of machines
> ).
I'm not an expert either, but I'm not sure that, as you write, "cassandra is really built specifically for web 2.0 realtime applications with trillions of users". This point does not appear in the project's highlights. It has been designed to be scalable, durable and fault tolerant, which is quite different. As far as I understand, Twitter did not choose Cassandra because of its millions of users or the size of its database, but rather because of the exponential growth of both. This problem (how to stay efficient in a rapidly growing environment) is, in my opinion, independent of web 2.0 or real-time applications. Furthermore, although Cassandra (or CouchDB, MongoDB, etc.) provides low latency for syncing/writing/reading, it can't be considered "real time".

> I'm pretty confident its _not_ practical for our sort of purpose.

This assertion depends on the "sort of purpose", and I'm not as confident as you are.

> I'm pretty sure it would be more productive to first mess about with
> something compact and efficient such as sqlite ( or whatever relational db
> ;)

That's another aspect. If by "productive" you mean that it would be quicker to develop such a solution, I agree. You posted some draft Python code a few days ago that does work and could be extended to cover the complete API.

On the other hand, although I'm not an expert in DB systems, the SQL-based databases used in product data management have shown serious drawbacks. You don't need to reach trillions of users to make the system fail; a few hundred users located all around the world are sufficient. I recently got feedback from an engineer working in the automotive industry: they have many factories all around the world, with one PDM for the whole company. The main server is located in Europe, and there is one server per continent mirroring the main one. Replication is performed every one or two days, which means that every engineer might be working on outdated data.
That's a really big deal, both for the efficiency of collaborative work and for the maintenance and cost of the IT architecture. And they don't have trillions of users.

You might also have heard about the use of HDF5 to manage STEP data. A single user can create huge sets of CAD data in just one session. Maybe distributed databases like Cassandra will be an alternative solution to HDF5.

So the SQL solution would be a short-term vision, and distributed, scalable databases a long-term one.

Finally, I didn't go down the ORM route because I'm not interested in doing something that has already been done (there is no real challenge) and in using technologies which are almost as old as me ;-)

> Than again, what the hell do I know.
> Please refer to this wiki entry <http://en.wikipedia.org/wiki/NoSQL> so
> you understand what schema-less DB's are good for.
> Why not rely use a decent ORM like sqlalchemy?

In my opinion, the discussion can't get this deeply technical until we have discussed what you call the "purpose". As I see it:

- the 'CAD Collaborative Work' is a *really* important part of the high level CAD API that has to be designed. It is at the same level as creating/visualizing CAD objects like shapes, vertices, splines etc. Both have to be designed at the same time;
- the Geometrical/Topological/etc. part of the API has to be independent from the underlying CAD kernel. For instance, a pythonOCC user working with this high level API must *never* see any TopoDS_Vertex, BRepPrimAPI_Something and all that mess;
- the CAD collaborative work must be independent from any database technology, and the user must *never* see any Oracle/MySQL/NoSQL/whatever database. The user just needs to store/retrieve/find CAD objects, whether they are stored in a Python dict (in memory), in a text or XML file, in a local or remote db etc.

To conclude, I think a well designed HLA must be 'user oriented' instead of 'technology oriented'.
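To make the third point above a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the names CadStore, DictStore, SqliteStore and save_demo_part are hypothetical and belong to no existing pythonOCC API, and CAD objects are stood in for by plain dicts serialized as JSON. It only shows that user code can store/retrieve/find objects without knowing which backend sits underneath.

```python
# Hypothetical sketch: none of these names exist in pythonOCC.
import json
import sqlite3
from abc import ABC, abstractmethod


class CadStore(ABC):
    """Backend-agnostic storage interface (hypothetical)."""

    @abstractmethod
    def store(self, key, obj):
        """Save obj under key, replacing any previous value."""

    @abstractmethod
    def retrieve(self, key):
        """Return the object saved under key, or raise KeyError."""

    @abstractmethod
    def find(self, predicate):
        """Return the sorted keys of all objects matching predicate."""


class DictStore(CadStore):
    """In-memory backend: a plain Python dict."""

    def __init__(self):
        self._data = {}

    def store(self, key, obj):
        self._data[key] = obj

    def retrieve(self, key):
        return self._data[key]

    def find(self, predicate):
        return sorted(k for k, v in self._data.items() if predicate(v))


class SqliteStore(CadStore):
    """Relational backend: one table of JSON-serialized objects."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def store(self, key, obj):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO objects (key, body) VALUES (?, ?)",
                (key, json.dumps(obj)),
            )

    def retrieve(self, key):
        row = self._conn.execute(
            "SELECT body FROM objects WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def find(self, predicate):
        rows = self._conn.execute("SELECT key, body FROM objects").fetchall()
        return sorted(k for k, body in rows if predicate(json.loads(body)))


def save_demo_part(store):
    """User-level code: it only sees the CadStore interface."""
    store.store("box-1", {"kind": "box", "dx": 10.0, "dy": 20.0, "dz": 5.0})
    store.store("sphere-1", {"kind": "sphere", "radius": 3.0})
    return store.find(lambda obj: obj["kind"] == "box")
```

Calling save_demo_part(DictStore()) or save_demo_part(SqliteStore()) runs the same user code against either backend. A distributed backend could in principle implement the same three methods, but that is not shown here.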
Let's focus on user needs first, and implement the different technologies after that. That's what I called, a few days ago, a top-down approach: go from user to implementation rather than from technology to user. From this point of view, the use of any technology remains possible.

> Cheers,

Cheers!

> -jelle

Jelle
_______________________________________________
Pythonocc-users mailing list
Pythonocc-users@gna.org
https://mail.gna.org/listinfo/pythonocc-users