On 3 July 2012 20:38, Johan Tibell <johan.tib...@gmail.com> wrote:
> On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts
> <duncan.cou...@googlemail.com> wrote:
>> Something to keep in mind is memory usage. I know Jeremy is looking at
>> this from the infrastructure side, but I think from the app side there's
>> also some likely culprits. Cabal's GenericPackageDescription type is
>> very large in memory. Having 10's of 1000's of these means lots of
>> memory. One hopefully easy way to save memory here without going to the
>> hassle of redoing Cabal's type definitions is simply to increase
>> sharing. There's a huge amount of repeated information. Start by sharing
>> all the package names and versions. Then there's other meta-data that
>> rarely changes between versions of the same package. This kind of thing
>> should be easy to evaluate, just write a test prog that reads the index
>> file and look at peak memory use. Then try sharing stuff and see how
>> much it drops. This sharing optimisation would still be useful even if
>> later we go and redo GenericPackageDescription to be more compact.
>
> This should not hold up the launch of Hackage 2 (which is very
> important) but I think it's an important issue that we need to
> address: we don't want to store the perhaps most important data the
> Haskell community has in an experimental data store! Creating a
> correct data store (i.e. ACID) that also handles a moderate amount of
> load is a quite difficult undertaking and it shouldn't be taken
> lightly. Lets stick the data in some SQL database and spend our energy
> on other things. :)
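
For reference, the "increase sharing" idea from the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions: PkgId, intern and shareNames are hypothetical names, not part of Cabal or hackage-server, and PkgId stands in for the name/version part of a GenericPackageDescription. Each package name is looked up in a table of names already seen, so that equal names end up pointing at one heap object instead of one copy per package version.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical stand-in for the (name, version) part of a package
-- description; the real Cabal types are much larger.
data PkgId = PkgId { pkgName :: String, pkgVersion :: [Int] }
  deriving (Eq, Show)

-- An intern table maps each distinct value to the first copy seen.
type Intern a = Map.Map a a

-- Return the canonical copy of a value, adding it to the table if new.
intern :: Ord a => Intern a -> a -> (Intern a, a)
intern table x =
  case Map.lookup x table of
    Just shared -> (table, shared)            -- reuse the existing copy
    Nothing     -> (Map.insert x x table, x)  -- first occurrence

-- Replace every package name with its canonical copy, threading the
-- intern table through the list from left to right.
shareNames :: [PkgId] -> [PkgId]
shareNames = snd . mapAccumL step Map.empty
  where
    step table pkg =
      let (table', name') = intern table (pkgName pkg)
      in  (table', pkg { pkgName = name' })
```

The result is equal to the input as far as pure code can tell; any saving would only show up in heap usage, so it would have to be measured as suggested above, by reading the index and comparing peak memory with and without the sharing pass. Because of laziness, the shared names only replace the originals once the new fields have actually been evaluated.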
I still disagree that going with an external SQL db will be easier. The
big advantage of acid-state (and similar) data stores is that they let
us use Haskell types properly and don't imply a separate external data
model and a marshalling stage.

That said, I also do not trust acid-state for long-term storage (simply
because the binary format it uses isn't sensible), which is why the
hackage server already has a system for dumping and restoring to
standard formats (CSV, tarballs, etc.). So if we use this backup system
properly (i.e. in combination with a system for backups to other
machines) then I think there's little chance of data loss.

Additionally, the really important data (the packages) are stored in
the file system.

Duncan

_______________________________________________
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel