On Wed, Jul 30, 2008 at 02:27:52PM -0400, Oren Laadan wrote: > > > Louis Rilling wrote: > >> +/** > >> + * cr_vma_fill_pgarr - fill a page-array with addr/page tuples for a vma > >> + * @ctx - checkpoint context > >> + * @pgarr - page-array to fill > >> + * @vma - vma to scan > >> + * @start - start address (updated) > >> + */ > >> +static int cr_vma_fill_pgarr(struct cr_ctx *ctx, struct cr_pgarr *pgarr, > >> + struct vm_area_struct *vma, unsigned long *start) > >> +{ > >> + unsigned long end = vma->vm_end; > >> + unsigned long addr = *start; > >> + struct page **pagep; > >> + unsigned long *addrp; > >> + int cow, nr, ret = 0; > >> + > >> + nr = pgarr->nleft; > >> + pagep = &pgarr->pages[pgarr->nused]; > >> + addrp = &pgarr->addrs[pgarr->nused]; > >> + cow = !!vma->vm_file; > >> + > >> + while (addr < end) { > >> + struct page *page; > >> + > >> + /* simplified version of get_user_pages(): already have vma, > >> + * only need FOLL_TOUCH, and (for now) ignore fault stats */ > >> + > >> + cond_resched(); > >> + while (!(page = follow_page(vma, addr, FOLL_TOUCH))) { > >> + ret = handle_mm_fault(vma->vm_mm, vma, addr, 0); > >> + if (ret & VM_FAULT_ERROR) { > >> + if (ret & VM_FAULT_OOM) > >> + ret = -ENOMEM; > >> + else if (ret & VM_FAULT_SIGBUS) > >> + ret = -EFAULT; > >> + else > >> + BUG(); > >> + break; > >> + } > >> + cond_resched(); > >> + } > > > > I guess that 'ret' should be checked somewhere after this loop. > > yes; this is where a "break(2)" construct in C would come handy :)
Alternatively, putting the inner loop in a separate function often helps to handle errors in a cleaner way. > > > > >> + > >> + if (IS_ERR(page)) { > >> + ret = PTR_ERR(page); > >> + break; > >> + } > >> + > >> + if (page == ZERO_PAGE(0)) > >> + page = NULL; /* zero page: ignore */ > >> + else if (cow && page_mapping(page) != NULL) > >> + page = NULL; /* clean cow: ignore */ > >> + else { > >> + get_page(page); > >> + *(addrp++) = addr; > >> + *(pagep++) = page; > >> + if (--nr == 0) { > >> + addr += PAGE_SIZE; > >> + break; > >> + } > >> + } > >> + > >> + addr += PAGE_SIZE; > >> + } > >> + > >> + if (unlikely(ret < 0)) { > >> + nr = pgarr->nleft - nr; > >> + while (nr--) > >> + page_cache_release(*(--pagep)); > >> + return ret; > >> + } > >> + > >> + *start = addr; > >> + return (pgarr->nleft - nr); > >> +} > >> + > >> +int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t) > >> +{ > >> + struct cr_hdr h; > >> + struct cr_hdr_mm *hh = ctx->tbuf; > >> + struct mm_struct *mm; > >> + struct vm_area_struct *vma; > >> + int ret; > >> + > >> + h.type = CR_HDR_MM; > >> + h.len = sizeof(*hh); > >> + h.id = ctx->pid; > >> + > >> + mm = get_task_mm(t); > >> + > >> + hh->tag = 1; /* non-zero will mean first time encounter */ > >> + > >> + hh->start_code = mm->start_code; > >> + hh->end_code = mm->end_code; > >> + hh->start_data = mm->start_data; > >> + hh->end_data = mm->end_data; > >> + hh->start_brk = mm->start_brk; > >> + hh->brk = mm->brk; > >> + hh->start_stack = mm->start_stack; > >> + hh->arg_start = mm->arg_start; > >> + hh->arg_end = mm->arg_end; > >> + hh->env_start = mm->env_start; > >> + hh->env_end = mm->env_end; > >> + > >> + hh->map_count = mm->map_count; > > > > Some fields above should also be protected with mmap_sem, like ->brk, > > ->map_count, and possibly others (I'm not a memory expert though). > > true; keep in mind, though, that the container will be frozen during > this time, so nothing should change at all. The only exception would > be if, for instance, someone is killing the container while we save > its state. Sure. So you think that taking mm->mmap_sem below is useless? I tend to believe so, since no other task should share this mm_struct at this time, and we could state that ptrace should not interfere during restart. However, I'm never confident when ptrace considerations come in... > > > > >> + > >> + /* FIX: need also mm->flags */ > >> + > >> + ret = cr_write_obj(ctx, &h, hh); > >> + if (ret < 0) > >> + goto out; > >> + > >> + /* write the vma's */ > >> + down_read(&mm->mmap_sem); > >> + for (vma = mm->mmap; vma; vma = vma->vm_next) { > >> + if ((ret = cr_write_vma(ctx, vma)) < 0) > >> + break; > >> + } > >> + up_read(&mm->mmap_sem); > >> + > >> + if (ret < 0) > >> + goto out; > >> + > >> + ret = cr_write_mm_context(ctx, mm); > >> + > >> + out: > >> + mmput(mm); > >> + return ret; > >> +} > > Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes _______________________________________________ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers _______________________________________________ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel