On Wednesday, March 26, 2014 at 15:58 -0500, Alex Elder wrote:
Olivier reports that with the simple patch I provided
(which changed a to a != and removed an assertion)
he is running successfully.
To me this is fantastic news, and you can see I posted
a patch with the fix.
There remains a
On Thu, Mar 27, 2014 at 9:48 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
On Wednesday, March 26, 2014 at 15:58 -0500, Alex Elder wrote:
Olivier reports that with the simple patch I provided
(which changed a to a != and removed an assertion)
he is running successfully.
To me this is
On Thursday, March 27, 2014 at 10:45 +0200, Ilya Dryomov wrote:
On Thu, Mar 27, 2014 at 9:48 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
On Wednesday, March 26, 2014 at 15:58 -0500, Alex Elder wrote:
Olivier reports that with the simple patch I provided
(which changed a to a != and removed
To make it more readable and save screen space.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/debugfs.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/ceph/debugfs.c b/net/ceph/debugfs.c
index 258a382e75ed..d225842c7b41
Hello,
This is on top of wip-tunables3, which I posted a week ago and brings
the support for the new osdmap encoding (OSDMAP_ENC feature bit),
primary_temp and primary affinity (PRIMARY_AFFINITY feature bit) to the
kernel client, along with some cleanups. PRIMARY_AFFINITY feature bit
is shared
Announce our support for the new osdmap encoding.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/ceph_features.h |1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/ceph/ceph_features.h
b/include/linux/ceph/ceph_features.h
index
Assert length of osd_state, osd_weight and osd_addr arrays. They
should all have exactly max_osd elements after the call to
osdmap_set_max_osd().
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c |8
1 file changed, 4 insertions(+), 4 deletions(-)
diff
Consolidate pg_temp (full map, map<pg_t, vector<u32>>) and new_pg_temp
(inc map, same) decoding logic into a common helper and switch to it.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 139 ++---
1 file changed, 67
Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h |2 +-
net/ceph/osd_client.c |2 +-
net/ceph/osdmap.c | 44
Consolidate pools (full map, map<u64, pg_pool_t>) and new_pools (inc
map, same) decoding logic into a common helper and switch to it.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 94 -
1 file changed, 57
Only version 6 of osdmap encoding is supported, anything other than
version 6 results in an error and halts the decoding process. Checking
if version is >= 5 is therefore bogus.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c |9 -
1 file changed, 4
The existing error handling scheme requires resetting err to -EINVAL
prior to calling any ceph_decode_* macro. This is ugly and fragile,
and there already are a few places where we would return 0 on error,
due to a missing reset. Follow osdmap_decode() and fix this by adding
a special e_inval
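The pattern described in this snippet can be sketched in user-space C roughly as follows. This is a minimal illustration under stated assumptions, not the kernel code: `decode_u32_safe`, `struct demo_map`, and `demo_decode` are hypothetical names, and the macro only mimics the idea of a bounds-checked ceph_decode_* helper that jumps to a label when the buffer is too short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "safe" decode macro: check that 4 bytes
 * remain, decode a little-endian u32, advance the cursor. On a short
 * buffer, jump to the caller-supplied label instead. */
#define decode_u32_safe(p, end, v, bad)                          \
    do {                                                         \
        if ((size_t)((end) - *(p)) < sizeof(uint32_t))           \
            goto bad;                                            \
        (v) = (uint32_t)(*(p))[0] |                              \
              (uint32_t)(*(p))[1] << 8 |                         \
              (uint32_t)(*(p))[2] << 16 |                        \
              (uint32_t)(*(p))[3] << 24;                         \
        *(p) += sizeof(uint32_t);                                \
    } while (0)

/* Hypothetical two-field map, for illustration only. */
struct demo_map {
    uint32_t epoch;
    uint32_t max_osd;
};

/* Returns 0 on success, -EINVAL if the input is truncated. */
static int demo_decode(const uint8_t *buf, size_t len, struct demo_map *map)
{
    const uint8_t *cur = buf;
    const uint8_t *end = buf + len;
    const uint8_t **p = &cur;

    assert(buf != NULL && map != NULL);

    decode_u32_safe(p, end, map->epoch, e_inval);
    decode_u32_safe(p, end, map->max_osd, e_inval);
    return 0;

e_inval:
    /* The only place the error value is set: no per-call
     * "err = -EINVAL" reset that could be forgotten. */
    return -EINVAL;
}
```

With a dedicated label, a decode failure can no longer fall through to a path that returns a stale `err` of 0, which is the failure mode the patch description mentions.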
In preparation for adding support for primary_temp mappings, generalize
struct ceph_pg_mapping so it can hold mappings other than pg_temp.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h |9 +++--
net/ceph/debugfs.c |4 ++--
The size of the memory area fed to crush_decode() should be limited
not only by osdmap end, but also by the crush map length. Also, drop
unnecessary dout() (dout() in crush_decode() conveys the same info) and
step past crush map only if it is decoded successfully.
Signed-off-by: Ilya Dryomov
max_osd value is not covered by any ceph_decode_need(). Use a safe
version of ceph_decode_* macro to decode it.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c |6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ceph/osdmap.c
Add primary_affinity infrastructure. primary_affinity values are
stored in a max_osd-sized array, hanging off ceph_osdmap, similar to
the osd_weight array.
Introduce {get,set}_primary_affinity() helpers, primarily to return
CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to
Add a common helper to decode both primary_temp (full map, map<pg_t,
u32>) and new_primary_temp (inc map, same) and switch to it.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 69 +
1 file changed, 69
Use krealloc() instead of rolling our own. (krealloc() with a NULL
first argument acts as a kmalloc()). Properly initialize the new array
elements. This is needed to make future additions to osdmap easier.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 32
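A user-space analogue of the idea in this snippet, using `realloc()` in place of `krealloc()`, might look like the sketch below. `demo_set_max` and its parameters are hypothetical names chosen for illustration; the actual kernel function and the fields it resizes are not shown in this listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow (or allocate for the first time) an array of uint32_t so that it
 * holds new_max entries, setting every newly added entry to `fill`.
 * Returns 0 on success, -1 on allocation failure; on failure the
 * original array and *cur_max are left untouched. */
static int demo_set_max(uint32_t **arr, size_t *cur_max,
                        size_t new_max, uint32_t fill)
{
    uint32_t *tmp;
    size_t i;

    assert(arr != NULL && cur_max != NULL && new_max > 0);

    /* realloc(NULL, n) behaves like malloc(n), so the first call needs
     * no special case. */
    tmp = realloc(*arr, new_max * sizeof(*tmp));
    if (tmp == NULL)
        return -1;

    /* Only the tail beyond the old size is uninitialized. */
    for (i = *cur_max; i < new_max; i++)
        tmp[i] = fill;

    *arr = tmp;
    *cur_max = new_max;
    return 0;
}
```

Initializing only the new tail keeps existing values intact, which matters when further per-osd arrays with non-zero defaults are added later.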
Add two helpers to decode primary_affinity (full map, vector<u32>) and
new_primary_affinity (inc map, map<u32, u32>) and switch to them.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 71 +
1 file changed, 71
Dump osdmap in hex on both full and incremental decode errors, to make
it easier to match the contents with error offset. dout() map epoch
and max_osd value on success.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 21 +++--
1 file changed, 15
To save screen space in anticipation of more fields (e.g. primary
affinity).
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/debugfs.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ceph/debugfs.c b/net/ceph/debugfs.c
index d225842c7b41..112d98edb156
Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()
and get rid of the now unused calc_pg_raw().
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 79 +++--
1 file changed, 4 insertions(+), 75
Add primary_temp mappings infrastructure. struct ceph_pg_mapping is
overloaded, primary_temp mappings are stored in an rb-tree, rooted at
ceph_osdmap, in a manner similar to pg_temp mappings.
Dump primary_temp mappings to /sys/kernel/debug/ceph/client/osdmap,
one 'primary_temp pgid osd' per
Sum up sizeof(...) results instead of (incorrectly) hard-coding the
number of bytes, expressed in ints and longs.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/ceph/osdmap.c
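To illustrate why summing `sizeof(...)` results is less error-prone than a hand-counted byte total, here is a small sketch. The layout (a 16-byte fsid, a 32-bit epoch, and two timestamps) and all `demo_*` names are assumptions made for the example, not the fields the patch actually touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size wire fields. */
struct demo_fsid { uint8_t b[16]; };
struct demo_timespec { uint32_t sec; uint32_t nsec; };

/* Number of bytes the fixed header occupies, derived from the types
 * rather than written out as "N ints and M longs". */
static size_t demo_header_len(void)
{
    return sizeof(struct demo_fsid) +          /* fsid */
           sizeof(uint32_t) +                  /* epoch */
           2 * sizeof(struct demo_timespec);   /* created, modified */
}

/* Non-zero if at least a full header remains between p and end. */
static int demo_have_header(const uint8_t *p, const uint8_t *end)
{
    assert(p != NULL && end != NULL && p <= end);
    return (size_t)(end - p) >= demo_header_len();
}
```

If a field changes width, the total follows automatically, whereas a hard-coded count silently goes stale.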
Change apply_temp() to override primary in the same way pg_temp
overrides osd set. primary_temp overrides pg_temp primary too.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c |7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git
Sync up with ceph.git definitions. Bring in ceph_osd_is_down().
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(),
raw_to_up_osds() and apply_temps().
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h |2 +-
net/ceph/osdmap.c | 51 ---
2 files changed, 39
Dump pg_temp mappings to /sys/kernel/debug/ceph/client/osdmap,
one 'pg_temp pgid [osd, ..., osd]' per line, e.g.:
pg_temp 2.6 [2,3,4]
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/debugfs.c | 11 +++
1 file changed, 11 insertions(+)
diff --git
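The line format quoted in this snippet can be produced with a few `snprintf()` calls. The sketch below is user-space C with hypothetical names (`demo_format_pg_temp` and its parameters); printing the pgid as `pool.seed` with the seed in hexadecimal is an assumption, and the kernel debugfs code itself is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format one "pg_temp <pool>.<seed> [osd,...,osd]" line into buf.
 * Returns the number of characters written (excluding the NUL), or -1
 * if buf is too small. */
static int demo_format_pg_temp(char *buf, size_t size, uint64_t pool,
                               uint32_t seed, const int *osds, int len)
{
    size_t off;
    int n, i;

    assert(buf != NULL && size > 0 && len >= 0);
    assert(len == 0 || osds != NULL);

    /* Assumption: seed is shown in hex, pool in decimal. */
    n = snprintf(buf, size, "pg_temp %llu.%x [",
                 (unsigned long long)pool, (unsigned int)seed);
    if (n < 0 || (size_t)n >= size)
        return -1;
    off = (size_t)n;

    for (i = 0; i < len; i++) {
        n = snprintf(buf + off, size - off, "%s%d",
                     i ? "," : "", osds[i]);
        if (n < 0 || (size_t)n >= size - off)
            return -1;
        off += (size_t)n;
    }

    n = snprintf(buf + off, size - off, "]");
    if (n < 0 || (size_t)n >= size - off)
        return -1;
    off += (size_t)n;

    return (int)off;
}
```

Called with pool 2, seed 6, and osds {2, 3, 4}, this yields the same text as the example line in the patch description.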
The existing error handling scheme requires resetting err to -EINVAL
prior to calling any ceph_decode_* macro. This is ugly and fragile,
and there already are a few places where we would return 0 on error,
due to a missing reset. Fix this by adding a special e_inval label to
be used by all
Announce our support for osdmaps with non-default primary affinity
values.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/ceph_features.h |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/ceph/ceph_features.h
Respond to non-default primary_affinity values accordingly. (Primary
affinity allows the admin to shift 'primary responsibility' away from
specific osds, effectively shifting around the read side of the
workload and whatever overhead is incurred by peering and writes by
virtue of being the
Full and incremental osdmaps are structured identically and have
identical headers. Add a helper to decode both old (16-bit version,
v6) and new (8-bit struct_v+struct_compat+struct_len, v7) osdmap
encoding headers and switch to it.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
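One way a single helper could accept both header layouts is sketched below. It rests on a loudly stated assumption that is not confirmed by this listing: that the first byte tells the layouts apart, because a little-endian 16-bit version of 6 starts with the byte 6, while a new-style header starts with a struct_v of 7 or more. The `demo_*` names and the field widths beyond those given in the patch description are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_hdr {
    int is_new;             /* 1 for the 8-bit struct_v layout */
    uint8_t struct_v;       /* new layout only */
    uint8_t struct_compat;  /* new layout only */
    uint32_t struct_len;    /* new layout only (assumed 32-bit LE) */
    uint16_t version;       /* old layout only (16-bit LE) */
};

/* Decode either header layout from [p, end).
 * Returns 0 on success, -EINVAL on a truncated buffer. */
static int demo_decode_header(const uint8_t *p, const uint8_t *end,
                              struct demo_hdr *h)
{
    size_t avail;

    assert(p != NULL && end != NULL && p <= end && h != NULL);
    avail = (size_t)(end - p);

    if (avail < 1)
        return -EINVAL;

    if (p[0] >= 7) {
        /* Assumed new layout: struct_v, struct_compat, struct_len. */
        if (avail < 6)
            return -EINVAL;
        h->is_new = 1;
        h->struct_v = p[0];
        h->struct_compat = p[1];
        h->struct_len = (uint32_t)p[2] | (uint32_t)p[3] << 8 |
                        (uint32_t)p[4] << 16 | (uint32_t)p[5] << 24;
        h->version = 0;
    } else {
        /* Old layout: a single 16-bit little-endian version. */
        if (avail < 2)
            return -EINVAL;
        h->is_new = 0;
        h->struct_v = 0;
        h->struct_compat = 0;
        h->struct_len = 0;
        h->version = (uint16_t)(p[0] | (p[1] << 8));
    }
    return 0;
}
```

Because full and incremental maps share the same header, both decode paths can call one helper like this instead of duplicating the checks.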
To be in line with all the other osdmap decode helpers.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 6dd083906a1e..cd8f34abe7b7 100644
In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly. Primary is now specified separately
from the order of osds in the set.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
apply_temp() helper for applying various temporary mappings (at this
point only pg_temp mappings) to the up set, therefore transforming it
into an acting set.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/osdmap.c | 52
1
pg_to_raw_osds() helper for computing a raw (crush) set, which can
contain non-existent and down osds.
raw_to_up_osds() helper for pruning non-existent and down osds from the
raw set, therefore transforming it into an up set, and determining up
primary.
Signed-off-by: Ilya Dryomov
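The raw-to-up step described in this snippet can be sketched as a simple in-place filter. Everything named `demo_*` is hypothetical, and the sketch covers only the case where surviving osds are shifted left to close gaps; pools that cannot shift osds (see the can_shift_osds() entry in this listing) would need different handling that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NO_PRIMARY (-1)

/* Hypothetical per-osd state, indexed by osd id. */
struct demo_osd_state {
    bool exists;
    bool up;
};

/* Prune non-existent and down osds from osds[0..len) in place, shifting
 * the survivors left. Returns the new length and stores the up primary
 * (the first survivor, or DEMO_NO_PRIMARY if none) in *primary. */
static int demo_raw_to_up(const struct demo_osd_state *state, int max_osd,
                          int *osds, int len, int *primary)
{
    int i, kept = 0;

    assert(state != NULL && osds != NULL && primary != NULL);
    assert(max_osd >= 0 && len >= 0);

    for (i = 0; i < len; i++) {
        int o = osds[i];

        /* Ids outside the state table are treated as non-existent. */
        if (o < 0 || o >= max_osd || !state[o].exists || !state[o].up)
            continue;
        osds[kept++] = o;
    }

    *primary = kept > 0 ? osds[0] : DEMO_NO_PRIMARY;
    return kept;
}
```

For a raw set {1, 3, 2, 0} where osd 1 is down and osd 2 does not exist, the up set becomes {3, 0} with osd 3 as the up primary.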
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type
defines.
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h | 12
include/linux/ceph/rados.h |5 +++--
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
To make it more readable and save screen space.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
net/ceph/debugfs.c | 26 ++
1 file changed, 14 insertions(+),
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
To save screen space in anticipation of more fields (e.g. primary
affinity).
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
Looks good.
If there are lots of these little trivial transformations
they could probably be consolidated into a
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Dump osdmap in hex on both full and incremental decode errors, to make
it easier to match the contents with error offset. dout() map epoch
and max_osd value on success.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h |2 +-
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Dump pg_temp mappings to /sys/kernel/debug/ceph/client/osdmap,
one 'pg_temp pgid [osd, ..., osd]' per line, e.g.:
pg_temp 2.6 [2,3,4]
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
I didn't look at the broader context, but the new code
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
The existing error handling scheme requires resetting err to -EINVAL
prior to calling any ceph_decode_* macro. This is ugly and fragile,
and there already are a few places where we would return 0 on error,
due to a missing reset. Fix this by adding
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
max_osd value is not covered by any ceph_decode_need(). Use a safe
version of ceph_decode_* macro to decode it.
I know it's slightly more efficient, but I never liked those
ceph_decode_need() statements that added together a bunch
of things you're
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Assert length of osd_state, osd_weight and osd_addr arrays. They
should all have exactly max_osd elements after the call to
osdmap_set_max_osd().
Since this function is allowed to fail, could these
conditions lead to returning an error code rather
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
The size of the memory area fed to crush_decode() should be limited
not only by osdmap end, but also by the crush map length. Also, drop
You're also letting crush_decode() verify it has the buffer space
it needs internally, rather than checking it
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
The existing error handling scheme requires resetting err to -EINVAL
prior to calling any ceph_decode_* macro. This is ugly and fragile,
and there already are a few places where we would return 0 on error,
due to a missing reset. Follow
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Only version 6 of osdmap encoding is supported, anything other than
version 6 results in an error and halts the decoding process. Checking
if version is >= 5 is therefore bogus.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by:
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
Sum up sizeof(...) results instead of (incorrectly) hard-coding the
number of bytes, expressed in ints and longs.
Yay!!!
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
On 03/27/2014 01:17 PM, Ilya Dryomov wrote:
To be in line with all the other osdmap decode helpers.
I wouldn't object to folding this into another patch, it
doesn't change anything functionally.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Consolidate pools (full map, map<u64, pg_pool_t>) and new_pools (inc
map, same) decoding logic into a common helper and switch to it.
Nice refactoring.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Use krealloc() instead of rolling our own. (krealloc() with a NULL
first argument acts as a kmalloc()). Properly initialize the new array
elements. This is needed to make future additions to osdmap easier.
Looks good.
Reviewed-by: Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Consolidate pg_temp (full map, map<pg_t, vector<u32>>) and new_pg_temp
(inc map, same) decoding logic into a common helper and switch to it.
Again, it's nice to see this kind of refactoring being done.
Looks good.
Reviewed-by: Alex Elder
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Full and incremental osdmaps are structured identically and have
identical headers. Add a helper to decode both old (16-bit version,
v6) and new (8-bit struct_v+struct_compat+struct_len, v7) osdmap
encoding headers and switch to it.
It wasn't
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Add primary_temp mappings infrastructure. struct ceph_pg_mapping is
overloaded, primary_temp mappings are stored in an rb-tree, rooted at
ceph_osdmap, in a manner similar to pg_temp mappings.
Dump primary_temp mappings to
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Add a common helper to decode both primary_temp (full map, map<pg_t,
u32>) and new_primary_temp (inc map, same) and switch to it.
The code looks reasonable. I'll have to assume
it's doing the decoding properly.
Reviewed-by: Alex Elder el...@linaro.org
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Add two helpers to decode primary_affinity (full map, vector<u32>) and
new_primary_affinity (inc map, map<u32, u32>) and switch to them.
One comment below, but otherwise looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Announce our support for the new osdmap encoding.
Looks OK to me. Isn't there a version of this OSD
map encoding? Maybe there'll be a newer one someday?
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Sync up with ceph.git definitions. Bring in ceph_osd_is_down().
Looks good. (Though I didn't verify it matches Ceph's definitions...)
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type
defines.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/osdmap.h | 12
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
pg_to_raw_osds() helper for computing a raw (crush) set, which can
contain non-existent and down osds.
raw_to_up_osds() helper for pruning non-existent and down osds from the
raw set, therefore transforming it into an up set, and determining up
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
apply_temp() helper for applying various temporary mappings (at this
point only pg_temp mappings) to the up set, therefore transforming it
into an acting set.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(),
raw_to_up_osds() and apply_temps().
So that's why you have a temp map in each osdmap.
The result is pretty clean and you eliminate the
local rawosds array.
Looks good.
Reviewed-by: Alex
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly. Primary is now specified separately
from the order of osds in the set.
And the
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Change apply_temp() to override primary in the same way pg_temp
overrides osd set. primary_temp overrides pg_temp primary too.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Respond to non-default primary_affinity values accordingly. (Primary
affinity allows the admin to shift 'primary responsibility' away from
specific osds, effectively shifting around the read side of the
workload and whatever overhead is incurred by
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Announce our support for osdmaps with non-default primary affinity
values.
Looks good.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
include/linux/ceph/ceph_features.h |3 ++-
1 file
On 03/27/2014 01:18 PM, Ilya Dryomov wrote:
Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()
and get rid of the now unused calc_pg_raw().
I'll be honest, my review of this one isn't very
solid but it looks OK to me.
Reviewed-by: Alex Elder el...@linaro.org
Signed-off-by:
On 03/12/2014 08:21 PM, Guangliang Zhao wrote:
A layered write needs to copy up the parent's content,
but an entire-object write would overwrite it anyway, so skip the copyup.
Signed-off-by: Guangliang Zhao lucienc...@gmail.com
---
This looks good to me. This situation is unlikely with normal I/O, but
On 03/12/2014 08:21 PM, Guangliang Zhao wrote:
It can currently handle only read and write operations;
extend it for the coming discard support.
Signed-off-by: Guangliang Zhao lucienc...@gmail.com
---
Looks good.
Reviewed-by: Josh Durgin josh.dur...@inktank.com
drivers/block/rbd.c | 96
On 03/12/2014 08:21 PM, Guangliang Zhao wrote:
This patch adds discard support to the rbd driver.
There are three types of operation in the driver:
1. Objects are removed if they are completely contained
within the discard range.
2. Objects are truncated if they are partly contained
On 03/19/2014 09:09 AM, Ilya Dryomov wrote:
Hello,
This series updates the kernel implementation of CRUSH with a couple of
fixes and a new chooseleaf_vary_r tunable (all ported from ceph.git).
TUNABLES3 feature bit is shared with PRIMARY_AFFINITY, which will also
be posted in a couple of days