> I didn't follow your numeric examples above (I missed how you mapped offsets
> to object numbers), but I follow you on striping: data that Hadoop would
> expect to find in one Ceph object, in one place, actually lives in different
> locations.
I didn't describe the mapping algorithm explicitly, but it is implemented in
the kernel client's ceph_calc_file_object_mapping() function. Running it with
the parameters from my example reproduces the mapping I presented.
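For anyone who wants to check the numbers without reading the kernel source,
here is a rough Python transliteration of the striping arithmetic as I
understand it from ceph_calc_file_object_mapping(). The names
(file_to_object, ObjectExtent) are my own, it returns only the first object
extent a range touches, and the kernel version differs in types and details.
The parameters in the test (1 MiB stripe unit, stripe count 4, 4 MiB objects)
are illustrative, not the ones from my earlier example.

```python
from typing import NamedTuple


class ObjectExtent(NamedTuple):
    object_no: int   # index of the object the range starts in
    object_off: int  # offset of the first byte within that object
    length: int      # bytes that stay contiguous in this object


def file_to_object(off, length, stripe_unit, stripe_count, object_size):
    """Map a file byte range onto the first object extent it touches."""
    su_per_object = object_size // stripe_unit   # stripe units per object
    block_no = off // stripe_unit                # stripe unit index within the file
    stripe_no = block_no // stripe_count         # which stripe (row)
    stripe_pos = block_no % stripe_count         # column within the stripe
    object_set_no = stripe_no // su_per_object   # which set of stripe_count objects
    object_no = object_set_no * stripe_count + stripe_pos
    off_in_su = off % stripe_unit
    object_off = (stripe_no % su_per_object) * stripe_unit + off_in_su
    # An extent never crosses a stripe-unit boundary.
    ext_len = min(length, stripe_unit - off_in_su)
    return ObjectExtent(object_no, object_off, ext_len)
```

With these parameters, file offset 4 MiB wraps back to object 0 at object
offset 1 MiB, and object 4 is the first object of the second object set.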

>> 
>> The more natural (and general) solution is to consider the stripe unit to be 
>> the _unit_ of Hadoop blocks, not entire objects. When stripe unit and block 
>> size are the same the result is analogous to HDFS's treatment of blocks.
> I agree with you, and push forward one more step:  Ceph and Hadoop should 
> just think of a block/object as the same size.
Per Sage's response, the Hadoop block size can be set equal to the Ceph stripe
unit.
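A minimal sketch of what that buys us, assuming the usual Ceph striping
arithmetic: if each Hadoop block is exactly one stripe unit, every block lands
wholly inside a single object, so its location is just that object's
placement. hadoop_blocks is a hypothetical helper, not an existing Hadoop or
Ceph function.

```python
def hadoop_blocks(file_size, stripe_unit, stripe_count, object_size):
    """List (file_off, length, object_no, object_off) for each Hadoop block,
    assuming the Hadoop block size equals the Ceph stripe unit."""
    su_per_object = object_size // stripe_unit
    blocks = []
    for file_off in range(0, file_size, stripe_unit):
        block_no = file_off // stripe_unit
        stripe_no, stripe_pos = divmod(block_no, stripe_count)
        object_no = (stripe_no // su_per_object) * stripe_count + stripe_pos
        object_off = (stripe_no % su_per_object) * stripe_unit
        # Only the last block of the file can be short.
        length = min(stripe_unit, file_size - file_off)
        blocks.append((file_off, length, object_no, object_off))
    return blocks
```

Each tuple names exactly one object, which is the HDFS-like behaviour: one
block, one storage location (plus its replicas).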


> One of the TODOs is exposing Ceph's object size to Hadoop, and that "read"
> interface for block size will probably need to expand to a "write" interface 
> to reduce confusion with folks configuring Hadoop to use a block size of N 
> bytes.
How is the configured block size relevant to Hadoop itself? It seems specific
to HDFS to me. The Ceph analogue would be configuring the file layout
parameters.
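A sketch of what I mean, under the assumption that Hadoop blocks equal stripe
units: the block size Hadoop reports is derived from the file's layout rather
than configured separately. FileLayout and hadoop_block_size are hypothetical
names. The checks are ones I'm confident Ceph layouts require (non-zero
fields, object size a multiple of the stripe unit); the real client may
enforce more, such as alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLayout:
    # Hypothetical stand-in for a Ceph file layout; field names are mine.
    stripe_unit: int
    stripe_count: int
    object_size: int


def hadoop_block_size(layout: FileLayout) -> int:
    """Return the block size Hadoop should report for a file with this layout."""
    if layout.stripe_unit <= 0 or layout.stripe_count <= 0 or layout.object_size <= 0:
        raise ValueError("layout fields must be positive")
    if layout.object_size % layout.stripe_unit:
        raise ValueError("object_size must be a multiple of stripe_unit")
    # The layout alone fixes the block size; there is no second knob to keep in sync.
    return layout.stripe_unit
```

Changing the "block size" then means changing the layout, so the two can never
disagree.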

> After writing this code, I do like seeing the words "scrap" and "JNI" so 
> close in the same sentence.  That's more up to the Hadoop community, though; 
> I don't know how well-accepted JNA is in their code base.
One solution might be a lazily populated sysfs interface for retrieving object
information for a given file, which would sidestep Java's difficulty issuing
ioctls. But that's another conversation.

> The ioctl struct ceph_ioctl_dataloc already returns the primary copy's object 
> offset for an input file offset, though I think it would be a little more 
> useful if it included replica offsets.
I can submit a patch for this. Sage, I remember you mentioning that reading
from replicas might pose problems (scalability?). Any thoughts on this?
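To make the request concrete: the in-object offset is the same on every
replica, since replicas hold identical copies of the object, so what Hadoop's
scheduler really needs is the set of hosts holding each block. A sketch,
assuming a hypothetical extended result; DataLocWithReplicas and block_hosts
are my names, and this is not the actual layout of ceph_ioctl_dataloc.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataLocWithReplicas:
    # Hypothetical: the real ioctl reports only the primary OSD;
    # replica_hosts is the proposed addition.
    object_no: int
    object_offset: int  # identical on every replica
    primary_host: str
    replica_hosts: List[str] = field(default_factory=list)


def block_hosts(loc: DataLocWithReplicas) -> List[str]:
    """Hosts to advertise to Hadoop for this block, primary first, duplicates dropped."""
    # dict.fromkeys preserves insertion order while removing duplicates.
    return list(dict.fromkeys([loc.primary_host, *loc.replica_hosts]))
```

Keeping the primary first would let a scheduler prefer it by default, which may
matter if replica reads turn out to be a problem.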