Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

2014-03-27 Thread Olivier Bonvalet
On Wednesday, 26 March 2014 at 15:58 -0500, Alex Elder wrote: Olivier reports that with the simple patch I provided (which changed a to a != and removed an assertion) he is running successfully. To me this is fantastic news, and you can see I posted a patch with the fix. There remains a

Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

2014-03-27 Thread Ilya Dryomov
On Thu, Mar 27, 2014 at 9:48 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote: On Wednesday, 26 March 2014 at 15:58 -0500, Alex Elder wrote: Olivier reports that with the simple patch I provided (which changed a to a != and removed an assertion) he is running successfully. To me this is

Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

2014-03-27 Thread Olivier Bonvalet
On Thursday, 27 March 2014 at 10:45 +0200, Ilya Dryomov wrote: On Thu, Mar 27, 2014 at 9:48 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote: On Wednesday, 26 March 2014 at 15:58 -0500, Alex Elder wrote: Olivier reports that with the simple patch I provided (which changed a to a != and removed

[PATCH 01/33] libceph: refer to osdmap directly in osdmap_show()

2014-03-27 Thread Ilya Dryomov
To make it more readable and save screen space. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/debugfs.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/net/ceph/debugfs.c b/net/ceph/debugfs.c index 258a382e75ed..d225842c7b41

[PATCH 00/33] OSDMAP_ENC, primary_temp, PRIMARY_AFFINITY

2014-03-27 Thread Ilya Dryomov
Hello, This is on top of wip-tunables3, which I posted a week ago and brings the support for the new osdmap encoding (OSDMAP_ENC feature bit), primary_temp and primary affinity (PRIMARY_AFFINITY feature bit) to the kernel client, along with some cleanups. PRIMARY_AFFINITY feature bit is shared

[PATCH 23/33] libceph: enable OSDMAP_ENC feature bit

2014-03-27 Thread Ilya Dryomov
Announce our support for new osdmap encoding. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h |1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h index

[PATCH 08/33] libceph: assert length of osdmap osd arrays

2014-03-27 Thread Ilya Dryomov
Assert length of osd_state, osd_weight and osd_addr arrays. They should all have exactly max_osd elements after the call to osdmap_set_max_osd(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff

[PATCH 16/33] libceph: introduce decode{,_new}_pg_temp() and switch to them

2014-03-27 Thread Ilya Dryomov
Consolidate pg_temp (full map, map<pg_t, vector<u32>>) and new_pg_temp (inc map, same) decoding logic into a common helper and switch to it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 139 ++--- 1 file changed, 67

[PATCH 05/33] libceph: split osdmap allocation and decode steps

2014-03-27 Thread Ilya Dryomov
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |2 +- net/ceph/osd_client.c |2 +- net/ceph/osdmap.c | 44

[PATCH 14/33] libceph: introduce decode{,_new}_pools() and switch to them

2014-03-27 Thread Ilya Dryomov
Consolidate pools (full map, map<u64, pg_pool_t>) and new_pools (inc map, same) decoding logic into a common helper and switch to it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 94 - 1 file changed, 57

[PATCH 11/33] libceph: nuke bogus encoding version check in osdmap_apply_incremental()

2014-03-27 Thread Ilya Dryomov
Only version 6 of osdmap encoding is supported, anything other than version 6 results in an error and halts the decoding process. Checking if version is >= 5 is therefore bogus. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c |9 - 1 file changed, 4

[PATCH 10/33] libceph: fixup error handling in osdmap_apply_incremental()

2014-03-27 Thread Ilya Dryomov
The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Follow osdmap_decode() and fix this by adding a special e_inval
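The e_inval scheme described in this patch can be sketched in user-space C. This is only an illustration under stated assumptions, not the code from the patch: toy_decode(), toy_decode_32_safe(), struct toy_map and TOY_MAX_OSD are hypothetical names, and the macro merely imitates the general shape of a "safe" ceph_decode_* macro (bounds check, then jump to a label on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_OSD 10000u	/* hypothetical sanity limit */

/* Hypothetical stand-in for a "safe" ceph_decode_* macro: copy one u32 in
 * host byte order and advance *p, or jump to `bad` if too few bytes remain. */
#define toy_decode_32_safe(p, end, v, bad)			\
	do {							\
		if ((size_t)((end) - *(p)) < sizeof(uint32_t))	\
			goto bad;				\
		memcpy(&(v), *(p), sizeof(uint32_t));		\
		*(p) += sizeof(uint32_t);			\
	} while (0)

struct toy_map {
	uint32_t epoch;
	uint32_t max_osd;
};

static int toy_decode(const unsigned char **p, const unsigned char *end,
		      struct toy_map *map)
{
	int err;

	assert(*p <= end);

	/* No "err = -EINVAL" reset is needed before each decode call. */
	toy_decode_32_safe(p, end, map->epoch, e_inval);
	toy_decode_32_safe(p, end, map->max_osd, e_inval);

	if (map->max_osd > TOY_MAX_OSD) {
		err = -ERANGE;	/* other failures still set their own code */
		goto bad;
	}

	return 0;

e_inval:
	err = -EINVAL;	/* the only place a decode failure sets err */
bad:
	return err;
}
```

Because every short-buffer failure funnels through e_inval, a forgotten reset can no longer turn a decode error into a return value of 0.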

[PATCH 18/33] libceph: generalize ceph_pg_mapping

2014-03-27 Thread Ilya Dryomov
In preparation for adding support for primary_temp mappings, generalize struct ceph_pg_mapping so it can hold mappings other than pg_temp. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |9 +++-- net/ceph/debugfs.c |4 ++--

[PATCH 09/33] libceph: fix crush_decode() call site in osdmap_decode()

2014-03-27 Thread Ilya Dryomov
The size of the memory area fed to crush_decode() should be limited not only by osdmap end, but also by the crush map length. Also, drop unnecessary dout() (dout() in crush_decode() conveys the same info) and step past crush map only if it is decoded successfully. Signed-off-by: Ilya Dryomov

[PATCH 07/33] libceph: safely decode max_osd value in osdmap_decode()

2014-03-27 Thread Ilya Dryomov
max_osd value is not covered by any ceph_decode_need(). Use a safe version of ceph_decode_* macro to decode it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ceph/osdmap.c

[PATCH 21/33] libceph: primary_affinity infrastructure

2014-03-27 Thread Ilya Dryomov
Add primary_affinity infrastructure. primary_affinity values are stored in a max_osd-sized array, hanging off ceph_osdmap, similar to an osd_weight array. Introduce {get,set}_primary_affinity() helpers, primarily to return CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to

[PATCH 20/33] libceph: primary_temp decode bits

2014-03-27 Thread Ilya Dryomov
Add a common helper to decode both primary_temp (full map, map<pg_t, u32>) and new_primary_temp (inc map, same) and switch to it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 69 + 1 file changed, 69

[PATCH 15/33] libceph: switch osdmap_set_max_osd() to krealloc()

2014-03-27 Thread Ilya Dryomov
Use krealloc() instead of rolling our own. (krealloc() with a NULL first argument acts as a kmalloc()). Properly initialize the new array elements. This is needed to make future additions to osdmap easier. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 32
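A rough user-space analogue of the approach in this patch, using realloc() in place of krealloc(). It is a sketch under assumptions: struct toy_osdmap and toy_set_max_osd() are hypothetical, only two per-OSD arrays are shown, and zero is an assumed initial value for new entries (the values the kernel uses are not visible in the snippet).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_osdmap {
	uint32_t max_osd;
	uint8_t *osd_state;	/* max_osd entries */
	uint32_t *osd_weight;	/* max_osd entries */
};

/* Resize every per-OSD array to `max` entries.  realloc(NULL, n) behaves
 * like malloc(n), so the first call needs no special case. */
static int toy_set_max_osd(struct toy_osdmap *map, uint32_t max)
{
	uint8_t *state;
	uint32_t *weight;
	uint32_t i;

	assert(max > 0);	/* realloc(p, 0) is implementation-defined */

	state = realloc(map->osd_state, max * sizeof(*state));
	if (!state)
		return -ENOMEM;
	map->osd_state = state;

	weight = realloc(map->osd_weight, max * sizeof(*weight));
	if (!weight)
		return -ENOMEM;	/* osd_state stays valid; max_osd is unchanged */
	map->osd_weight = weight;

	/* realloc() leaves the new tail uninitialized: set it explicitly. */
	for (i = map->max_osd; i < max; i++) {
		map->osd_state[i] = 0;
		map->osd_weight[i] = 0;
	}

	map->max_osd = max;
	return 0;
}
```

Adding another per-OSD array then only takes one more realloc() call and one more line in the initialization loop.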

[PATCH 22/33] libceph: primary_affinity decode bits

2014-03-27 Thread Ilya Dryomov
Add two helpers to decode primary_affinity (full map, vector<u32>) and new_primary_affinity (inc map, map<u32, u32>) and switch to them. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 71 + 1 file changed, 71

[PATCH 04/33] libceph: dump osdmap and enhance output on decode errors

2014-03-27 Thread Ilya Dryomov
Dump osdmap in hex on both full and incremental decode errors, to make it easier to match the contents with error offset. dout() map epoch and max_osd value on success. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 21 +++-- 1 file changed, 15

[PATCH 02/33] libceph: do not prefix osd lines with \t in debugfs output

2014-03-27 Thread Ilya Dryomov
To save screen space in anticipation of more fields (e.g. primary affinity). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/debugfs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ceph/debugfs.c b/net/ceph/debugfs.c index d225842c7b41..112d98edb156

[PATCH 32/33] libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()

2014-03-27 Thread Ilya Dryomov
Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() and get rid of the now unused calc_pg_raw(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 79 +++-- 1 file changed, 4 insertions(+), 75

[PATCH 19/33] libceph: primary_temp infrastructure

2014-03-27 Thread Ilya Dryomov
Add primary_temp mappings infrastructure. struct ceph_pg_mapping is overloaded, primary_temp mappings are stored in an rb-tree, rooted at ceph_osdmap, in a manner similar to pg_temp mappings. Dump primary_temp mappings to /sys/kernel/debug/ceph/client/osdmap, one 'primary_temp pgid osd' per

[PATCH 12/33] libceph: fix and clarify ceph_decode_need() sizes

2014-03-27 Thread Ilya Dryomov
Sum up sizeof(...) results instead of (incorrectly) hard-coding the number of bytes, expressed in ints and longs. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/ceph/osdmap.c
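The snippet does not show the expressions that were changed, so the following is only a general illustration of the idea: express the required length as a sum with one sizeof term per wire field. The record layout (u32, u64, u8), TOY_HDR_LEN and toy_has_bytes() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire record: u32 epoch, u64 timestamp, u8 flags.
 * One sizeof term per field keeps the total tied to the wire format
 * rather than to the platform's int and long widths. */
#define TOY_HDR_LEN \
	(sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint8_t))

/* Returns 1 when at least n bytes remain between p and end, 0 otherwise. */
static int toy_has_bytes(const unsigned char *p, const unsigned char *end,
			 size_t n)
{
	assert(p <= end);
	return (size_t)(end - p) >= n;
}
```

A reader can check each term against the decode calls that follow, which is harder with a bare byte count.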

[PATCH 30/33] libceph: add support for primary_temp mappings

2014-03-27 Thread Ilya Dryomov
Change apply_temps() to override primary in the same way pg_temp overrides osd set. primary_temp overrides pg_temp primary too. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git
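The override order described here might be sketched as follows. This is a simplified assumption-laden model, not the kernel function: toy_apply_temps() is hypothetical, mappings are passed in directly rather than looked up in an rb-tree, and any filtering of down osds from a pg_temp mapping is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* osds[] holds the up set (*len entries) on entry and the acting set on
 * return; the caller must make it large enough for pg_temp_len entries.
 * pg_temp may be NULL; primary_temp < 0 means "no primary_temp mapping". */
static void toy_apply_temps(int *osds, int *len, int *primary,
			    const int *pg_temp, int pg_temp_len,
			    int primary_temp)
{
	int i;

	assert(osds && len && primary);

	/* pg_temp replaces the whole set, and with it the primary. */
	if (pg_temp && pg_temp_len > 0) {
		for (i = 0; i < pg_temp_len; i++)
			osds[i] = pg_temp[i];
		*len = pg_temp_len;
		*primary = osds[0];
	}

	/* primary_temp is applied last, so it wins over the pg_temp primary. */
	if (primary_temp >= 0)
		*primary = primary_temp;
}
```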

[PATCH 24/33] libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions

2014-03-27 Thread Ilya Dryomov
Sync up with ceph.git definitions. Bring in ceph_osd_is_down(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h

[PATCH 28/33] libceph: switch ceph_calc_pg_acting() to new helpers

2014-03-27 Thread Ilya Dryomov
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(), raw_to_up_osds() and apply_temps(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |2 +- net/ceph/osdmap.c | 51 --- 2 files changed, 39

[PATCH 03/33] libceph: dump pg_temp mappings to debugfs

2014-03-27 Thread Ilya Dryomov
Dump pg_temp mappings to /sys/kernel/debug/ceph/client/osdmap, one 'pg_temp pgid [osd, ..., osd]' per line, e.g: pg_temp 2.6 [2,3,4] Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/debugfs.c | 11 +++ 1 file changed, 11 insertions(+) diff --git

[PATCH 06/33] libceph: fixup error handling in osdmap_decode()

2014-03-27 Thread Ilya Dryomov
The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Fix this by adding a special e_inval label to be used by all

[PATCH 33/33] libceph: enable PRIMARY_AFFINITY feature bit

2014-03-27 Thread Ilya Dryomov
Announce our support for osdmaps with non-default primary affinity values. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/ceph_features.h

[PATCH 31/33] libceph: add support for osd primary affinity

2014-03-27 Thread Ilya Dryomov
Respond to non-default primary_affinity values accordingly. (Primary affinity allows the admin to shift 'primary responsibility' away from specific osds, effectively shifting around the read side of the workload and whatever overhead is incurred by peering and writes by virtue of being the

[PATCH 17/33] libceph: introduce get_osdmap_client_data_v()

2014-03-27 Thread Ilya Dryomov
Full and incremental osdmaps are structured identically and have identical headers. Add a helper to decode both old (16-bit version, v6) and new (8-bit struct_v+struct_compat+struct_len, v7) osdmap encoding headers and switch to it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com ---

[PATCH 13/33] libceph: rename __decode_pool{,_names}() to decode_pool{,_names}()

2014-03-27 Thread Ilya Dryomov
To be in line with all the other osdmap decode helpers. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c index 6dd083906a1e..cd8f34abe7b7 100644

[PATCH 29/33] libceph: return primary from ceph_calc_pg_acting()

2014-03-27 Thread Ilya Dryomov
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com ---

[PATCH 27/33] libceph: introduce apply_temps() helper

2014-03-27 Thread Ilya Dryomov
apply_temps() helper for applying various temporary mappings (at this point only pg_temp mappings) to the up set, therefore transforming it into an acting set. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osdmap.c | 52 1

[PATCH 26/33] libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers

2014-03-27 Thread Ilya Dryomov
pg_to_raw_osds() helper for computing a raw (crush) set, which can contain nonexistent and down osds. raw_to_up_osds() helper for pruning nonexistent and down osds from the raw set, therefore transforming it into an up set, and determining up primary. Signed-off-by: Ilya Dryomov
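The pruning step described for raw_to_up_osds() could look roughly like this in user-space C. The sketch assumes the simple case where surviving osds are compacted to the front of the set and the first survivor becomes the up primary; toy_raw_to_up() and the TOY_OSD_* flags are hypothetical, and pools that must keep osd positions fixed are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OSD_EXISTS 0x1	/* hypothetical state bits */
#define TOY_OSD_UP     0x2

/* Copy the osds from raw[] that both exist and are up into up[],
 * preserving order.  Returns the size of the up set; *primary is the
 * first surviving osd, or -1 if none survive. */
static int toy_raw_to_up(const uint8_t *state, int max_osd,
			 const int *raw, int len, int *up, int *primary)
{
	const uint8_t want = TOY_OSD_EXISTS | TOY_OSD_UP;
	int i, n = 0;

	assert(state && raw && up && primary);

	for (i = 0; i < len; i++) {
		int osd = raw[i];

		if (osd < 0 || osd >= max_osd)
			continue;	/* outside the map: treat as nonexistent */
		if ((state[osd] & want) != want)
			continue;	/* nonexistent or down */
		up[n++] = osd;
	}

	*primary = n > 0 ? up[0] : -1;
	return n;
}
```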

[PATCH 25/33] libceph: ceph_can_shift_osds(pool) and pool type defines

2014-03-27 Thread Ilya Dryomov
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type defines. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h | 12 include/linux/ceph/rados.h |5 +++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git

Re: [PATCH 01/33] libceph: refer to osdmap directly in osdmap_show()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: To make it more readable and save screen space. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/debugfs.c | 26 ++ 1 file changed, 14 insertions(+),

Re: [PATCH 02/33] libceph: do not prefix osd lines with \t in debugfs output

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: To save screen space in anticipation of more fields (e.g. primary affinity). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Looks good. If there are lots of these little trivial transformations they could probably be consolidated into a

Re: [PATCH 04/33] libceph: dump osdmap and enhance output on decode errors

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Dump osdmap in hex on both full and incremental decode errors, to make it easier to match the contents with error offset. dout() map epoch and max_osd value on success. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya

Re: [PATCH 05/33] libceph: split osdmap allocation and decode steps

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |2 +-

Re: [PATCH 03/33] libceph: dump pg_temp mappings to debugfs

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Dump pg_temp mappings to /sys/kernel/debug/ceph/client/osdmap, one 'pg_temp pgid [osd, ..., osd]' per line, e.g: pg_temp 2.6 [2,3,4] Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com I didn't look at the broader context, but the new code

Re: [PATCH 06/33] libceph: fixup error handling in osdmap_decode()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Fix this by adding

Re: [PATCH 07/33] libceph: safely decode max_osd value in osdmap_decode()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: max_osd value is not covered by any ceph_decode_need(). Use a safe version of ceph_decode_* macro to decode it. I know it's slightly more efficient, but I never liked those ceph_decode_need() statements that added together a bunch of things you're

Re: [PATCH 08/33] libceph: assert length of osdmap osd arrays

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Assert length of osd_state, osd_weight and osd_addr arrays. They should all have exactly max_osd elements after the call to osdmap_set_max_osd(). Since this function is allowed to fail, could these conditions lead to returning an error code rather

Re: [PATCH 09/33] libceph: fix crush_decode() call site in osdmap_decode()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: The size of the memory area fed to crush_decode() should be limited not only by osdmap end, but also by the crush map length. Also, drop You're also letting crush_decode() verify it has the buffer space it needs internally, rather than checking it

Re: [PATCH 10/33] libceph: fixup error handling in osdmap_apply_incremental()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Follow

Re: [PATCH 11/33] libceph: nuke bogus encoding version check in osdmap_apply_incremental()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Only version 6 of osdmap encoding is supported, anything other than version 6 results in an error and halts the decoding process. Checking if version is >= 5 is therefore bogus. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by:

Re: [PATCH 12/33] libceph: fix and clarify ceph_decode_need() sizes

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: Sum up sizeof(...) results instead of (incorrectly) hard-coding the number of bytes, expressed in ints and longs. Yay!!! Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com ---

Re: [PATCH 13/33] libceph: rename __decode_pool{,_names}() to decode_pool{,_names}()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:17 PM, Ilya Dryomov wrote: To be in line with all the other osdmap decode helpers. I wouldn't object to folding this into another patch, it doesn't change anything functionally. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov

Re: [PATCH 14/33] libceph: introduce decode{,_new}_pools() and switch to them

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Consolidate pools (full map, map<u64, pg_pool_t>) and new_pools (inc map, same) decoding logic into a common helper and switch to it. Nice refactoring. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com

Re: [PATCH 15/33] libceph: switch osdmap_set_max_osd() to krealloc()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Use krealloc() instead of rolling our own. (krealloc() with a NULL first argument acts as a kmalloc()). Properly initialize the new array elements. This is needed to make future additions to osdmap easier. Looks good. Reviewed-by: Alex Elder

Re: [PATCH 16/33] libceph: introduce decode{,_new}_pg_temp() and switch to them

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Consolidate pg_temp (full map, map<pg_t, vector<u32>>) and new_pg_temp (inc map, same) decoding logic into a common helper and switch to it. Again, it's nice to see this kind of refactoring being done. Looks good. Reviewed-by: Alex Elder

Re: [PATCH 17/33] libceph: introduce get_osdmap_client_data_v()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Full and incremental osdmaps are structured identically and have identical headers. Add a helper to decode both old (16-bit version, v6) and new (8-bit struct_v+struct_compat+struct_len, v7) osdmap encoding headers and switch to it. It wasn't

Re: [PATCH 19/33] libceph: primary_temp infrastructure

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Add primary_temp mappings infrastructure. struct ceph_pg_mapping is overloaded, primary_temp mappings are stored in an rb-tree, rooted at ceph_osdmap, in a manner similar to pg_temp mappings. Dump primary_temp mappings to

Re: [PATCH 20/33] libceph: primary_temp decode bits

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Add a common helper to decode both primary_temp (full map, map<pg_t, u32>) and new_primary_temp (inc map, same) and switch to it. The code looks reasonable. I'll have to assume it's doing the decoding properly. Reviewed-by: Alex Elder el...@linaro.org

Re: [PATCH 22/33] libceph: primary_affinity decode bits

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Add two helpers to decode primary_affinity (full map, vector<u32>) and new_primary_affinity (inc map, map<u32, u32>) and switch to them. One comment below, but otherwise looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov

Re: [PATCH 23/33] libceph: enable OSDMAP_ENC feature bit

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Announce our support for new osdmap encoding. Looks OK to me. Isn't there a version of this OSD map encoding? Maybe there'll be a newer one someday? Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com

Re: [PATCH 24/33] libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Sync up with ceph.git definitions. Bring in ceph_osd_is_down(). Looks good. (Though I didn't verify it matches Ceph's definitions...) Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com ---

Re: [PATCH 25/33] libceph: ceph_can_shift_osds(pool) and pool type defines

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Bring in pg_pool_t::can_shift_osds() counterpart along with pool type defines. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h | 12

Re: [PATCH 26/33] libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: pg_to_raw_osds() helper for computing a raw (crush) set, which can contain nonexistent and down osds. raw_to_up_osds() helper for pruning nonexistent and down osds from the raw set, therefore transforming it into an up set, and determining up

Re: [PATCH 27/33] libceph: introduce apply_temps() helper

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: apply_temps() helper for applying various temporary mappings (at this point only pg_temp mappings) to the up set, therefore transforming it into an acting set. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov

Re: [PATCH 28/33] libceph: switch ceph_calc_pg_acting() to new helpers

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(), raw_to_up_osds() and apply_temps(). So that's why you have a temp map in each osdmap. The result is pretty clean and you eliminate the local rawosds array. Looks good. Reviewed-by: Alex

Re: [PATCH 29/33] libceph: return primary from ceph_calc_pg_acting()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. And the

Re: [PATCH 30/33] libceph: add support for primary_temp mappings

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Change apply_temps() to override primary in the same way pg_temp overrides osd set. primary_temp overrides pg_temp primary too. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com ---

Re: [PATCH 31/33] libceph: add support for osd primary affinity

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Respond to non-default primary_affinity values accordingly. (Primary affinity allows the admin to shift 'primary responsibility' away from specific osds, effectively shifting around the read side of the workload and whatever overhead is incurred by

Re: [PATCH 33/33] libceph: enable PRIMARY_AFFINITY feature bit

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Announce our support for osdmaps with non-default primary affinity values. Looks good. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h |3 ++- 1 file

Re: [PATCH 32/33] libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()

2014-03-27 Thread Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote: Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() and get rid of the now unused calc_pg_raw(). I'll be honest, my review of this one isn't very solid but it looks OK to me. Reviewed-by: Alex Elder el...@linaro.org Signed-off-by:

Re: [PATCH 1/3 v2] rbd: skip the copyup when an entire object writing

2014-03-27 Thread Josh Durgin
On 03/12/2014 08:21 PM, Guangliang Zhao wrote: The parent's content needs to be copied up on a layered write, but a write of an entire object would overwrite it, so skip the copyup. Signed-off-by: Guangliang Zhao lucienc...@gmail.com --- This looks good to me. This situation is unlikely with normal I/O, but

Re: [PATCH 2/3 v2] rbd: extend the operation type

2014-03-27 Thread Josh Durgin
On 03/12/2014 08:21 PM, Guangliang Zhao wrote: It could only handle the read and write operations now, extend it for the coming discard support. Signed-off-by: Guangliang Zhao lucienc...@gmail.com --- Looks good. Reviewed-by: Josh Durgin josh.dur...@inktank.com drivers/block/rbd.c | 96

Re: [PATCH 3/3 v2] rbd: add discard support for rbd

2014-03-27 Thread Josh Durgin
On 03/12/2014 08:21 PM, Guangliang Zhao wrote: This patch adds discard support to the rbd driver. There are three types of operation in the driver: 1. Objects are removed if they are completely contained within the discard range. 2. Objects are truncated if they are partly contained

Re: [PATCH 0/5] wip-tunables3

2014-03-27 Thread Josh Durgin
On 03/19/2014 09:09 AM, Ilya Dryomov wrote: Hello, This series updates the kernel implementation of CRUSH with a couple of fixes and a new chooseleaf_vary_r tunable (all ported from ceph.git). TUNABLES3 feature bit is shared with PRIMARY_AFFINITY, which will also be posted in a couple of days