On 21 September 2016 at 17:28, Haomai Wang <hao...@xsky.com> wrote:
> BTW, why you need to iterate so much objects..... I think it should be
> done by other ways to achieve the goal.

Mostly it's just a brute-force way to identify objects that shouldn't
exist, or objects that have been orphaned (e.g. last modification time
was over 60 days ago).  This housekeeping probably wouldn't be needed
if it were possible to rely on the storage platform, and on the index
holding references to all stored objects, always being correct.

In reality, strange things happen: data is never written, or goes
missing during or after a migration or a disk failure, etc.

When the lifecycle of an object ends, it gets removed from the index
and deleted from disk.  Again, reality: data is never deleted, or gets
recreated during a migration, etc... :-)

This goes back to iterating all objects and validating that there's
nothing unexpected still on disk.
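
As a rough sketch of that check (Python, not the actual tooling; the
names reconcile, listed and indexed are made up for illustration, and
I'm assuming the listing gives (name, mtime) pairs and the index can
be loaded as a set of names):

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=60)  # the 60-day orphan threshold mentioned above


def reconcile(listed, indexed, now=None):
    """Compare what is on disk with what the index says should be there.

    listed:  iterable of (name, mtime) pairs from the object listing
    indexed: set of object names the index holds a reference to
    """
    now = now or datetime.now(timezone.utc)
    unexpected, stale, seen = [], [], set()
    for name, mtime in listed:
        seen.add(name)
        if name not in indexed:
            unexpected.append(name)   # on disk, but the index doesn't know it
        elif now - mtime > MAX_AGE:
            stale.append(name)        # indexed, but untouched for over 60 days
    missing = sorted(indexed - seen)  # indexed, but never written or since lost
    return unexpected, stale, missing
```

The three lists correspond to the three failure cases above: data that
was never deleted or got recreated, data that looks orphaned, and data
that was never written or went missing.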

Now that I have (mostly) one region migrated over to Ceph, maybe there
will be less reliance on this sort of housekeeping.  But the constant
stat'ing of objects to check that they exist must still happen during
periodic refreshes.

But from what I gather from my local tests, and feedback on here, it
seems like there should be ample room for improvement on object
iteration.  If I request an object via rados_nobjects_list_next(),
the chances of me asking for the next object via the same call should
be pretty high, right?  And it would do no harm to prefetch that data
before it's requested by the rados client.
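
To illustrate what I mean by prefetching, here is a minimal
client-side sketch in Python.  It is not how librados behaves, just
the general read-ahead idea: a background thread keeps pulling from
the listing while the caller is busy with the previous object.  The
name prefetching and the depth parameter are made up, and source
stands in for whatever wraps the rados_nobjects_list_next() loop.

```python
import queue
import threading


def prefetching(source, depth=64):
    """Yield items from `source`, reading up to `depth` items ahead."""
    buf = queue.Queue(maxsize=depth)  # bounded, so read-ahead can't run away

    def fill():
        # Runs in the background: keeps fetching until the buffer is full.
        try:
            for item in source:
                buf.put((False, item))
            buf.put((True, None))   # normal end of listing
        except Exception as exc:
            buf.put((True, exc))    # hand the error over to the consumer

    # Daemon thread: if the consumer stops early, the thread stays blocked
    # on put() but does not keep the process alive.
    threading.Thread(target=fill, daemon=True).start()

    while True:
        done, value = buf.get()
        if done:
            if value is not None:
                raise value
            return
        yield value
```

Usage would be something like "for obj in prefetching(listing): ...",
so the next entries are already waiting by the time they are asked
for.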

Iain Buclaw
ceph-users mailing list