If I have a list [a], and I want to make it persistent, then I need some way to serialize values of type 'a'. If I then modify my type, the serialized structure will be out of sync with the new version of the type, so I will also need some sort of migration feature.
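As a minimal sketch of serialization plus migration with safecopy (using cereal's runPut/runGet to run the encoders), assume two hypothetical versions of a record, Person_v0 and Person. A value is written with the old version, and reading those bytes back at the new type runs the Migrate instance. The type names and the default age of 0 are illustrative only.

```haskell
{-# LANGUAGE TemplateHaskell, TypeFamilies #-}
import qualified Data.ByteString as B
import Data.SafeCopy
import Data.Serialize (runGet, runPut)

-- Hypothetical version 0 of a record, as it was first written to disk.
data Person_v0 = Person_v0 String
  deriving (Eq, Show)

$(deriveSafeCopy 0 'base ''Person_v0)

-- Hypothetical version 1: an age field has been added.
data Person = Person String Int
  deriving (Eq, Show)

-- How to upgrade an old value: the new field gets a default of 0.
-- (Declared before the splice below, which needs this instance.)
instance Migrate Person where
  type MigrateFrom Person = Person_v0
  migrate (Person_v0 name) = Person name 0

$(deriveSafeCopy 1 'extension ''Person)

-- Serialize any SafeCopy value; safePut also writes its version tag.
toBytes :: SafeCopy a => a -> B.ByteString
toBytes = runPut . safePut

-- Deserialize, migrating from older versions when the tag says so.
fromBytes :: SafeCopy a => B.ByteString -> Either String a
fromBytes = runGet safeGet
```

For example, `fromBytes (toBytes (Person_v0 "alice")) :: Either String Person` should yield `Right (Person "alice" 0)`. The version tag travels with the data, so old files stay readable without a separate conversion pass; each new version only needs a Migrate instance from its immediate predecessor.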
safecopy addresses both issues: serializing the data and migrating it when the data structure changes:

http://hackage.haskell.org/package/safecopy

You should definitely consider using it.

When it comes to concurrency, my big question is how you plan to deal with transaction boundaries / atomicity. For example, suppose each function (filter, map, etc.) is atomic. That doesn't mean you have something atomic when you do:

    filter pred =<< map f l

Something could sneak in between the 'map' and the 'filter'. An obvious solution would be to do something like:

    transaction $ filter pred =<< map f l

But would that mean the datastore has to be locked until that entire operation is done?

Also, what does it mean to have a 'persistent list'? In that example, is map destructive? Does it modify the list, or does it produce a new list?

A somewhat related system is, of course, acid-state (formerly happstack-state). The solution there is pretty simple, and somewhat more flexible. To write code you basically just use the State monad. You can store just about any type you want and use just about any function you want. To get and update the state you just use get/put:

    simpleTransaction :: Update MyDatabase [Int]
    simpleTransaction =
      do MyDatabase l <- get
         let l' = filter pred (map f l)
         put (MyDatabase l')
         return l'

That updates the list and returns the modified list as well. To make that into a transaction we use a bit of Template Haskell to mark it as a transaction:

    $(makeAcidic ''MyDatabase ['simpleTransaction])

The appeal of this solution is that you are not limited to just a List or Map or whatever types people have bothered to import into the system. If you decide you want to use Data.Tree you need only do:

    $(deriveSafeCopy 1 'base ''Tree)

And now you can use it persistently and concurrently as well. You do not have to recreate every function in Data.Tree.

Still, I can see the appeal of just being able to import NData.Map, derive a serialization instance for your data, and start writing very normal-looking code.
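To make the transaction-boundary question concrete, here is a minimal in-memory sketch using GHC's STM (the stm package), swapped in purely as an analogy; it is neither acid-state nor the proposed NData API, and the list lives in a TVar rather than on disk. mapStep, filterStep, mapThenFilterUnsafe and mapThenFilter are hypothetical names. Wrapping both steps in a single `atomically` plays the role of the `transaction` combinator.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- One "atomic" operation: replace the shared list with its image under f.
mapStep :: (a -> a) -> TVar [a] -> STM ()
mapStep f var = readTVar var >>= writeTVar var . map f

-- Another "atomic" operation: keep only the elements satisfying p.
filterStep :: (a -> Bool) -> TVar [a] -> STM ()
filterStep p var = readTVar var >>= writeTVar var . filter p

-- Each step is atomic on its own, but the pair is not: another thread
-- can commit a change between the two transactions.
mapThenFilterUnsafe :: (a -> a) -> (a -> Bool) -> TVar [a] -> IO ()
mapThenFilterUnsafe f p var = do
  atomically (mapStep f var)
  atomically (filterStep p var)

-- Both steps inside one transaction, so no other thread can observe or
-- modify the list in between; returns the final list.
mapThenFilter :: (a -> a) -> (a -> Bool) -> TVar [a] -> IO [a]
mapThenFilter f p var = atomically $ do
  mapStep f var
  filterStep p var
  readTVar var
```

For example, starting from a TVar holding `[1, 2, 3, 4]`, `mapThenFilter (+ 1) even var` returns `[2, 4]` and leaves `[2, 4]` stored. STM happens to be optimistic: transactions run without taking locks and are validated at commit time, re-running if another thread changed a TVar they read. A long transaction therefore does not block other threads, but it may be retried, and it cannot perform arbitrary IO (such as an expensive external call) in the middle.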
There is something very nice about just being able to use a function like 'transaction' to mark the boundaries of a transaction, rather than having to give the transaction a name and call some Template Haskell function.

Using acid-state, it would be very easy to implement a persistent version of Data.Map where each function is atomic. However, there is currently no way to group multiple events into a single transaction, though I think I can imagine how to add such a feature. Of course, the idea of having a big lock blocking everything is not very appealing. But as an experimental fork it could be interesting.

But first, I would like to hear more about how you imagined transactions would actually work in the first place. The big issue I see is that transactions can be a real performance problem. If I write code for a Map-like persistent structure:

    transaction $ do
      v  <- lookup "key" pmap
      v' <- doSomethingExpensive v
      insert "key" v' pmap

Won't that really lock things up, since nothing else can happen while that transaction is running?

Still, it sounds interesting, just not easy :)

I would definitely encourage you to consider safecopy at the very least. It is completely independent of acid-state. It is simply a fast, versioned data serialization library.

- jeremy

On Tue, Nov 1, 2011 at 5:31 PM, dokondr <[email protected]> wrote:
> Hi,
>
> Please comment on the idea and advise on steps to implement it.
>
> Real world applications need persistent data that can be accessed and
> modified concurrently by several clients, in a way that preserves the
> "happens-before" relationship.
>
> Idea: Design and implement Persistent Concurrent Data Types in Haskell.
> These data types should mirror existing Data.List, Data.Map and similar
> types but provide persistence and support consistent concurrent access and
> modification (or simply, "concurrency").
> Persistence and concurrency should be configurable through these type
> interfaces.
> Configuration should include:
> 1) Media to persist data, such as a file, a DBMS, or an external key-value
> store (for example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc.)
> 2) Caching policy: when (on what events) and how much data to read/write
> from/to the persistent media. Media reads/writes can be done asynchronously
> in separate threads.
> 3) Concurrency configuration: optimistic or pessimistic data locking.
>
> One may ask why encapsulate persistence and concurrency in the data type
> instead of using the "native" storage API, such as the key-value /
> row-column API that NoSQL databases provide?
> The answer is simple: the APIs your code uses greatly influence the code
> itself. Using a low-level storage API directly in your code results in
> bloated, obscure code, or else you need to encapsulate this low-level API in
> clear and powerful abstractions. So why not do this encapsulation once and
> for all for such powerful types as Data.Map, for example, and forget all the
> Cassandra and SimpleDB low-level access method details?
> When the time comes to move your application to the next new
> "shiny_super_cloud", you will just write the implementation of NData.Map
> backed by Data.Map in terms of the low-level API of this super-cloud.
>
> (Side note: I really need such an NData.Map type. I was asked to move my
> code, which heavily uses Data.Map and simple text-file persistence, to the
> Amazon AWS cloud. Looking at the SimpleDB API, I realized that I would have
> to rewrite 90% of the code. This rewrite would greatly bloat my code and make
> it very unreadable. If I had NData.Map, I would just switch the
> implementation from 'file' to SimpleDB persistence inside my NData.Map type.)
>
> Implementation:
> To start playing with this idea, an NData.Map persisted in a regular file
> will do, with no concurrency yet. Next step: NData.Map persisted in SimpleDB
> or Cassandra or Redis, with concurrent access supported.
>
> So it looks like NData.Map should be a monad...
> Any ideas on implementation and similar work?
>
> Thanks!
> Dmitri
> ---
> http://sites.google.com/site/dokondr/welcome
>
> _______________________________________________
> Haskell-Cafe mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
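For the first implementation step Dmitri describes (a map persisted in a regular file, no real concurrency model yet), a naive sketch might look like the following. PMap, openPMap, pLookup, pInsert and pTransaction are hypothetical names, not an existing library. The whole map is kept in memory in an MVar, serialized with show/read, and rewritten to the file after every change. pTransaction holds the MVar for the duration of an update, which is exactly the "one big lock" (pessimistic) approach: consistent, but every other client waits.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar, readMVar)
import qualified Data.Map as Map
import System.Directory (doesFileExist)

-- Hypothetical persistent map: an in-memory Data.Map guarded by an MVar,
-- mirrored to a file after every change.
data PMap k v = PMap
  { pmapFile :: FilePath
  , pmapVar  :: MVar (Map.Map k v)
  }

-- Load the map from the file if it exists, otherwise start empty.
openPMap :: (Ord k, Read k, Read v) => FilePath -> IO (PMap k v)
openPMap path = do
  exists <- doesFileExist path
  m <- if exists
         then do
           s <- readFile path
           -- Force the whole (lazily read) file so its handle is closed
           -- before we later write to the same path.
           length s `seq` return (read s)
         else return Map.empty
  PMap path <$> newMVar m

-- Reads see the latest committed map.
pLookup :: Ord k => k -> PMap k v -> IO (Maybe v)
pLookup k pm = Map.lookup k <$> readMVar (pmapVar pm)

-- Run a pure update over the whole map while holding the lock, then
-- persist it. If writing fails, modifyMVar restores the old map.
pTransaction :: (Show k, Show v)
             => PMap k v -> (Map.Map k v -> (Map.Map k v, a)) -> IO a
pTransaction pm f = modifyMVar (pmapVar pm) $ \m -> do
  let (m', r) = f m
  writeFile (pmapFile pm) (show m')
  return (m', r)

-- A single insert is just a one-step transaction.
pInsert :: (Ord k, Show k, Show v) => k -> v -> PMap k v -> IO ()
pInsert k v pm = pTransaction pm (\m -> (Map.insert k v m, ()))
```

Rewriting the entire file on each change is only reasonable for small maps, and show/read provides none of the versioning that safecopy offers. Swapping show/read for safecopy, or the file for another storage backend, should leave the pLookup/pInsert/pTransaction interface unchanged, which is roughly the separation the proposal is after. Note also that pTransaction takes a pure function, so an expensive computation inside it still holds the lock for its whole duration.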
