G'day,

I am currently evaluating Ceph as a possible file store for a somewhat larger project that could be finished within the next year or so. I have therefore set up a test scenario with two boxes (one openSUSE 11 32-bit and one 64-bit, kernel 2.6.25.16-0.1-pae) and ran some tests, mostly from a user's point of view. I just wanted to pass on some impressions that might help.
First of all, I liked the way you present the project: Git branches on offer, a wiki for docs, and papers that give an impression of the underlying tech. The project strikes me as a living and promising thing. I also managed to get a small FS running on those two boxes within a relatively short time. That compares well with OpenAFS, which I looked at before: I couldn't get that bastard to work at all, despite spending much more time on it. The wiki was a good help here. OpenAFS, by contrast, has lots and lots of docs, but they are all hopelessly outdated and no help at all.

And here is the first thing I wanted to point out. Good documentation is essential, also for a young project, because what young projects need is a user base, and a good way to build one is to make it easy to get started. Your wiki is a good help, but I think it needs some extensions. The step-by-step tutorial style is nice and helped me to get things running, but I haven't got the slightest idea what I actually did. You explain how to set up and start the osds, the monmap, cmds and all that, but I haven't found any information about what these things actually are. The tools themselves don't say either, not even a few words in their --help output. I found out a bit by looking at the source, but that can't be the intended way. I would offer to help on that front if you need it, but since I don't know the answers I would probably not be the right person.

Apart from that, I was impressed by the performance, at least from the little I have been able to test so far. I started with the stable branch and then moved to the unstable one, and I experienced some problems. I don't know if they are related, though.

I have set up one of my two boxes as cmon. That is a daemon that supervises things and doesn't do any actual file storing, right?
Anyway, I created the monmap and all that:

    monmaptool --create --clobber --add <box1>:12345 --print .ceph_monmap
    mkmonfs --clobber mondata/mon0 --mon 0 --monmap .ceph_monmap
    cmon mondata/mon0 -d

The osdmap has to include two data stores now, right?

    osdmaptool --clobber --createsimple .ceph_monmap 2 --print .ceph_osdmap

and I did this on the cmon machine:

    cmonctl osd setmap -i .ceph_osdmap

This tells the cmon how many machines have to be supervised?

After that I started the data nodes, each with a 1 GB file as storage (testdev is the file). On the first machine:

    cosd --mkfs_for_osd 0 testdev

and on the other machine:

    cosd --mkfs_for_osd 1 testdev

Then on both machines:

    cosd testdev -d
    cmds -d

So far I understood the wiki. After that I mounted the filesystem using

    mount -t ceph <box1>:/ /mnt/ceph/ -o user

And it worked. I could now access the file system from both machines.

I then poured in some data and, after using it for a while, tried to shut everything down: first unmounting the FS and then running 'killall cmds cosd cmon'. The FS was no longer mounted, but it seemed to be far from gone. The kernel was still causing load and there was disk activity. The ceph kernel module refused to be unloaded ('in use') and the system load increased bit by bit. Eventually it reached about 6 or 7, at which point I decided to reboot the machines. They refused to shut down: the system wouldn't let go of the disks, so I had to switch the machines off and back on.

After the reboot I tried to remount the ceph FS like this. On box1:

    cmon mondata/mon0 -d
    cmonctl osd setmap -i .ceph_osdmap

(I don't need to create those map files again, do I?) And then on both boxes:

    cosd testdev -d
    cmds -d

In most cases the commands ran, but whenever I tried to actually mount the thing again, the mount command timed out, reporting 'bad superblock', and refused to mount. I tried many different ways, but I never managed to access the data on a ceph FS again after I first used it and unmounted it.
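In case it helps with reproducing this, the sequence above can be collected into a small shell sketch. It only uses the commands quoted in this mail; the run wrapper and the DRY_RUN and MON_HOST variables are my own illustrative additions and not part of ceph, "box1" is just a stand-in for <box1>, and the exact umount call is an assumption about how I unmounted. By default it only prints the commands rather than executing them.

```shell
#!/bin/sh
# Sketch of the steps described in this mail, not an official procedure.
# DRY_RUN, MON_HOST and run() are hypothetical helpers added for illustration.
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, 0 = execute them
MON_HOST="${MON_HOST:-box1}"   # stand-in for <box1>, the monitor machine

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# First-time setup on the monitor machine.
setup_monitor() {
    run monmaptool --create --clobber --add "${MON_HOST}:12345" --print .ceph_monmap
    run mkmonfs --clobber mondata/mon0 --mon 0 --monmap .ceph_monmap
    run cmon mondata/mon0 -d
    run osdmaptool --clobber --createsimple .ceph_monmap 2 --print .ceph_osdmap
    run cmonctl osd setmap -i .ceph_osdmap
}

# First-time setup on a data node; $1 is the osd number (0 or 1 here).
setup_osd() {
    run cosd --mkfs_for_osd "$1" testdev
    run cosd testdev -d
    run cmds -d
}

# Mount on a client, as done in this mail.
mount_fs() {
    run mount -t ceph "${MON_HOST}:/" /mnt/ceph/ -o user
}

# The shutdown I tried (the one that left the kernel busy).
# The umount form is assumed; the killall line is as quoted.
shutdown_as_tried() {
    run umount /mnt/ceph/
    run killall cmds cosd cmon
}

# What I ran after the reboot, without recreating the maps.
restart_monitor() {
    run cmon mondata/mon0 -d
    run cmonctl osd setmap -i .ceph_osdmap
}

restart_osd() {
    run cosd testdev -d
    run cmds -d
}
```

With DRY_RUN left at 1, calling for example setup_osd 0 after sourcing the file just lists the three commands for that node, so the order can be checked before anything touches testdev.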
I tried this both journalled and unjournalled, and with both the stable and the unstable branch.

So did I do something wrong? Can you see any obvious fault in what I tried? Can I provide any further information? I assume the way I shut it down must be wrong, effectively destroying the FS and the superblock in the 'testdev' files, but how can this be prevented? And how can ceph be used in a bigger test environment, where nodes are not always shut down cleanly but occasionally go down because the maintenance crew accidentally knocks one over or in some similarly violent way?

Well, maybe you can shed some light on this one.

Greetings...
Stephan

_______________________________________________
Ceph-devel mailing list
Ceph-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ceph-devel