Hi Loic,

Yes, the db is designed to optimize workloads on flash backends, and it uses only standard interfaces and system calls to achieve that.
Varada

-----Original Message-----
From: Loic Dachary [mailto:[email protected]]
Sent: Tuesday, February 24, 2015 9:57 PM
To: Somnath Roy; Varada Kari; Ceph Development
Subject: Re: Adding a proprietary key value store to CEPH

Hi,

On 24/02/2015 17:13, Somnath Roy wrote:
> Hi Loic,
> This is an effort to make the ceph interface pluggable to any proprietary
> k/v db available. The integrator has to implement a shim layer (dynamically
> loadable) by implementing these interfaces. That shim layer can do the work
> specific to their k/v db.
> Now, regarding our k/v db, yes, it is written keeping in mind that the
> backend will be flash, not HDD. This is the major difference from
> leveldb/rocksdb etc. Our db reduces the flash WA (write amplification)
> dramatically, and the performance should also be similar to or better than
> rocksdb.
> Also, I think there should be more of these proprietary dbs that people
> want to integrate with Ceph, as I don't think leveldb/rocksdb will be able
> to serve all kinds of workloads.

Thanks for sharing these details :-) Would this db be specific to a product
line, for instance by making ioctl calls that only a specific driver for
specific hardware would understand? Or is this a db that is designed to
optimize workloads for flash drives using only standard and documented APIs
or system calls?

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Loic Dachary
> Sent: Tuesday, February 24, 2015 6:30 AM
> To: Varada Kari; Ceph Development
> Subject: Re: Adding a proprietary key value store to CEPH
>
> Hi,
>
> I'm curious about the reasons why the key/value store you mention is not
> published as Free Software. Is it because it implements a proprietary
> interface to specific hardware? Because it has additional functionality
> compared to rocksdb etc.? Because it performs better under some workloads?
>
> Cheers
>
> On 24/02/2015 14:20, Varada Kari wrote:
>> Hi Sage,
>>
>> We are trying to integrate a new proprietary key value store to CEPH. To
>> integrate this KV-store, which is a closed source shared library, we
>> propose a new class to CEPH called PropDBStore which does a dlopen and
>> imports the required symbols. This framework will help in integrating
>> vendor specific extensions to CEPH.
>>
>> The gist of the implementation is as follows.
>>
>> 1. Implement a wrapper around the proprietary KVStore. Let us call it
>>    KVExtension. This is a shared library which implements all interfaces
>>    required by the CEPH KeyValueStore.
>> 2. A new class called PropDBStore is derived from KeyValueDB; it honors
>>    the semantics of KeyValueStore and KeyValueDB. This class acts as a
>>    mediator between CEPH and KVExtension. It transforms bufferlist etc.
>>    to const char pointers or strings for the extension to understand.
>> 3. PropDBStore loads (dlopen) the KVExtension during OSD initialization.
>>    The path to the KVExtension can be specified in ceph.conf.
>> 4. The interfaces that need to be implemented in KVExtension, and which
>>    are imported by PropDBStore, are declared in a new header called
>>    PropDBWrapper.h. This header contains the signatures for the necessary
>>    interfaces like init(), close(), submit_transaction(), get() and
>>    get_iterator(). Similarly, for the iterator functionality,
>>    PropDBIterator.h specifies the signatures of seek_to_first(),
>>    seek_to_last(), lower_bound() and upper_bound() etc. PropDBStore
>>    includes these headers to import the symbols, using dlsym().
>> 5. Choosing the proprietary DB as the backend to the OSD is controlled by
>>    ceph config options (/etc/ceph/ceph.conf), as for rocksdb or leveldb.
>> 6. The rest of the existing functionality is not disturbed by this change.
>>    Changing the osd backend option will change the backend implementation.
>>    But this change is not dynamic.
>>    The type of the backend should be chosen at osd creation time, and the
>>    osd will continue to use that backend until that osd is reformatted.
>> 7. The new KVStore we are trying to integrate works on a raw partition, so
>>    we divided the osd drive into two partitions. One partition is given to
>>    the osd metadata (super block, fsid etc.), and the other is given to
>>    the new db to manage. The OSD partition is now not the entire disk, but
>>    only the 2-4GB needed for the metadata.
>>
>> Please share your thoughts around this.
>> Thanks,
>> Varada
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to [email protected] More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>
> --
> Loïc Dachary, Artisan Logiciel Libre

--
Loïc Dachary, Artisan Logiciel Libre
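
For illustration, the dlopen()/dlsym() loading scheme described in points 3 and 4 of the original proposal could look roughly like the sketch below. It is written in Python with ctypes (CDLL wraps dlopen, and attribute lookup on the returned handle wraps dlsym) rather than C++, purely to keep it short. The names load_kv_extension and ExtensionLoadError are hypothetical. Only the five entry point names come from the proposal; the actual signatures in PropDBWrapper.h are not shown in this thread, so none are assumed here.

```python
import ctypes

# Entry points the proposal says KVExtension must export (PropDBWrapper.h).
REQUIRED_SYMBOLS = ("init", "close", "submit_transaction", "get", "get_iterator")


class ExtensionLoadError(RuntimeError):
    """Hypothetical error type: library missing or incomplete."""


def load_kv_extension(path, required=REQUIRED_SYMBOLS):
    """Open the shared library at `path` and resolve every required symbol.

    Returns a dict mapping symbol name -> foreign function object.
    Raises ExtensionLoadError if the library cannot be opened or if any
    required symbol is absent, so a broken extension is rejected at
    load time rather than at first use.
    """
    try:
        lib = ctypes.CDLL(path)  # equivalent of dlopen(path, ...)
    except OSError as exc:
        raise ExtensionLoadError(f"cannot load {path}: {exc}") from exc

    symbols, missing = {}, []
    for name in required:
        try:
            symbols[name] = getattr(lib, name)  # equivalent of dlsym(handle, name)
        except AttributeError:
            missing.append(name)

    if missing:
        raise ExtensionLoadError(f"{path} lacks symbols: {', '.join(missing)}")
    return symbols
```

In this sketch the path would be whatever ceph.conf supplies, and the call would happen once during OSD initialization. Resolving all symbols up front is a design choice of the sketch, not something the proposal states.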
