I was thinking of something like the following. A locality group in
an RFile would consist of arbitrary locality group metadata plus
key/value pairs.
import java.util.List;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;

interface Partitioner {
  void init(LocalityGroupConfig lgc);

  // Determines which locality groups a compaction should create in the
  // output RFile. Receives metadata about the locality groups in the files
  // being compacted.
  List<LocalityGroupInfo> getLocalityGroupsToCreate(List<LocalityGroupMetadata> lgml);

  // The following three methods are used to write data into a new RFile
  // locality group.
  void startLocalityGroup(LocalityGroupInfo lgi);

  // All data is passed through this method. It serves two purposes: deciding
  // whether the data goes into this locality group at all, and, for the data
  // that is accepted, building up the metadata for the locality group being
  // created.
  boolean acceptKeyValue(Key k, Value v);

  // Once all data is written, ask for the metadata and write it to the RFile.
  LocalityGroupMetadata finishLocalityGroup();

  // Selects which locality groups in an RFile should be read by a scan or
  // compaction. Receives info about the existing locality groups in the RFile.
  List<LocalityGroupInfo> getLocalityGroupsToRead(List<LocalityGroupMetadata> lgml, ScanOptions so);
}
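To make the shape of this interface concrete, here is a rough sketch of a time-bucketed partitioner written against a simplified version of it. Everything below is hypothetical: `KeyValue`, `LocalityGroupInfo`, `LocalityGroupMetadata` and `TimeBucketPartitioner` are minimal stand-ins rather than Accumulo classes, `init` is omitted, and `ScanOptions` is replaced by a single "oldest timestamp wanted" parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal stand-ins for the types named in the proposal.
final class KeyValue {
  final String row; final long timestamp;
  KeyValue(String row, long timestamp) { this.row = row; this.timestamp = timestamp; }
}

final class LocalityGroupInfo {
  final String name;
  final long start, end; // timestamps in [start, end) belong to this group
  LocalityGroupInfo(String name, long start, long end) { this.name = name; this.start = start; this.end = end; }
}

final class LocalityGroupMetadata {
  final String name; final long minTs, maxTs, count;
  LocalityGroupMetadata(String name, long minTs, long maxTs, long count) {
    this.name = name; this.minTs = minTs; this.maxTs = maxTs; this.count = count;
  }
}

// Simplified Partitioner: no init(), and ScanOptions becomes oldestWanted.
interface Partitioner {
  List<LocalityGroupInfo> getLocalityGroupsToCreate(List<LocalityGroupMetadata> inputs);
  void startLocalityGroup(LocalityGroupInfo lgi);
  boolean acceptKeyValue(KeyValue kv);
  LocalityGroupMetadata finishLocalityGroup();
  List<LocalityGroupMetadata> getLocalityGroupsToRead(List<LocalityGroupMetadata> existing, long oldestWanted);
}

// Partitions data into fixed-width timestamp buckets (e.g. one per day).
final class TimeBucketPartitioner implements Partitioner {
  private final long width;
  private LocalityGroupInfo current;
  private long minTs, maxTs, count;

  TimeBucketPartitioner(long width) { this.width = width; }

  public List<LocalityGroupInfo> getLocalityGroupsToCreate(List<LocalityGroupMetadata> inputs) {
    long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
    for (LocalityGroupMetadata m : inputs) {
      if (m.count == 0) continue;
      lo = Math.min(lo, m.minTs);
      hi = Math.max(hi, m.maxTs);
    }
    List<LocalityGroupInfo> groups = new ArrayList<>();
    if (lo > hi) return groups; // no data in any input
    // One group per bucket overlapping the input time range.
    for (long s = Math.floorDiv(lo, width) * width; s <= hi; s += width) {
      groups.add(new LocalityGroupInfo("t" + s, s, s + width));
    }
    return groups;
  }

  public void startLocalityGroup(LocalityGroupInfo lgi) {
    current = lgi;
    minTs = Long.MAX_VALUE; maxTs = Long.MIN_VALUE; count = 0;
  }

  public boolean acceptKeyValue(KeyValue kv) {
    // Reject data outside the current bucket; otherwise update its metadata.
    if (kv.timestamp < current.start || kv.timestamp >= current.end) return false;
    minTs = Math.min(minTs, kv.timestamp);
    maxTs = Math.max(maxTs, kv.timestamp);
    count++;
    return true;
  }

  public LocalityGroupMetadata finishLocalityGroup() {
    return new LocalityGroupMetadata(current.name, minTs, maxTs, count);
  }

  public List<LocalityGroupMetadata> getLocalityGroupsToRead(List<LocalityGroupMetadata> existing, long oldestWanted) {
    // Skip empty groups and groups whose newest entry is older than wanted.
    List<LocalityGroupMetadata> out = new ArrayList<>();
    for (LocalityGroupMetadata m : existing) {
      if (m.count > 0 && m.maxTs >= oldestWanted) out.add(m);
    }
    return out;
  }
}
```

In this sketch a compaction would call `getLocalityGroupsToCreate` once, then for each returned group call `startLocalityGroup`, push every entry through `acceptKeyValue` (one pass per group, which is simple rather than efficient), and store the result of `finishLocalityGroup` in the output file. Recording min/max timestamps per group is what lets `getLocalityGroupsToRead` skip groups without opening them.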
Keith
On Wed, Mar 7, 2012 at 7:39 PM, Eric Newton <[email protected]> wrote:
> Something like this:
>
> partition, meta = partitioner.choose(key, value, meta)
>
> The partition can be a string, which is used to look up the partition's
> configuration. Queries can use the meta information to avoid reading files
> from the partition. The metadata would be saved
> when the file is closed.
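A minimal sketch of the `choose` call, assuming the metadata is a per-partition min/max timestamp and the policy splits entries into hypothetical "recent" and "old" partitions by age. `TimeRange`, `Choice` and `AgePartitioner` are illustrative names, not existing APIs; the caller threads the returned metadata into the next call and would persist it when the file is closed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-partition metadata: the min/max timestamp seen so far.
final class TimeRange {
  final long min, max;
  TimeRange(long min, long max) { this.min = min; this.max = max; }
  // Returns a range widened to include ts.
  TimeRange include(long ts) { return new TimeRange(Math.min(min, ts), Math.max(max, ts)); }
}

// Hypothetical result of choose(): the partition name plus updated metadata.
final class Choice {
  final String partition;
  final Map<String, TimeRange> meta;
  Choice(String partition, Map<String, TimeRange> meta) {
    this.partition = partition;
    this.meta = meta;
  }
}

// Hypothetical policy: entries younger than recentMillis go to "recent", others to "old".
final class AgePartitioner {
  private final long now;
  private final long recentMillis;

  AgePartitioner(long now, long recentMillis) {
    this.now = now;
    this.recentMillis = recentMillis;
  }

  // The row is unused by this policy but would be available to others.
  Choice choose(String row, long timestamp, Map<String, TimeRange> meta) {
    String partition = (now - timestamp) < recentMillis ? "recent" : "old";
    // Copy rather than mutate, so the caller's map is left unchanged.
    Map<String, TimeRange> updated = new HashMap<>(meta);
    updated.merge(partition, new TimeRange(timestamp, timestamp), (old, fresh) -> old.include(timestamp));
    return new Choice(partition, updated);
  }
}
```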
>
> During a query, files could be filtered based on some arbitrary query data:
>
> files = partitioner.selectFiles(files, query)
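Likewise, a sketch of `selectFiles`, assuming the saved metadata is a timestamp range per file and the query data is the time range the scan cares about. `FileMeta`, `TimeQuery` and `FileSelector` are hypothetical; a file is kept only if its range overlaps the query's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metadata saved when a file is closed: its timestamp range.
final class FileMeta {
  final String name;
  final long minTs, maxTs;
  FileMeta(String name, long minTs, long maxTs) {
    this.name = name; this.minTs = minTs; this.maxTs = maxTs;
  }
}

// Hypothetical query data: the inclusive timestamp range a scan wants.
final class TimeQuery {
  final long from, to;
  TimeQuery(long from, long to) { this.from = from; this.to = to; }
}

final class FileSelector {
  // Keep only files whose [minTs, maxTs] range overlaps [from, to].
  static List<FileMeta> selectFiles(List<FileMeta> files, TimeQuery q) {
    List<FileMeta> selected = new ArrayList<>();
    for (FileMeta f : files) {
      if (f.maxTs >= q.from && f.minTs <= q.to) selected.add(f);
    }
    return selected;
  }
}
```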
>
> I like it! It might also be nice to indicate some sort of "estimated"
> percent of keys processed, and the type of compaction occurring (flush,
> partial, everything):
>
> partition, meta = partitioner.choose(key, value, meta, percent,
> compactionType)
>
> Is there any other tablet-level information we might want to provide to a
> partitioner? Perhaps the source partition of the key/value?
>
> -Eric
>
> On Wed, Mar 7, 2012 at 6:54 PM, Keith Turner <[email protected]> wrote:
>
>> Replying to myself :)
>>
>> The more I think about this, the more it seems that locality groups could
>> be handled by plugins that can partition the data and select locality
>> groups in any way they like. Want locality groups based on row suffix?
>> Go ahead and write the plugin.
>>
>> The plugin would be used for compaction-time partitioning and scan-time
>> locality group selection. Users could pass options to the locality
>> group plugin at scan time, just as options are passed to iterators.
>> Maybe this is an extension or further generalization of the existing
>> iterator framework; I have not thought that far yet.
>>
>> Keith
>>
>> On Wed, Mar 7, 2012 at 6:22 PM, Keith Turner <[email protected]> wrote:
>> > We regularly get questions from users about querying new data and
>> > aging off old data. I was thinking about how we could better support
>> > this need in 1.5. One thing that occurred to me is having locality
>> > groups based on timestamp instead of column family, for
>> > example a locality group for each month. Alternatively, we could have
>> > groups for < day old, < week old, < month old, and < year old. We would
>> > need a way for users to define these.
>> >
>> > This would make scanning a table for recent data much faster. Dropping
>> > old data could also be made much faster by dropping entire
>> > locality groups at compaction time.
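A sketch of the age buckets described above, assuming month and year are approximated as 30 and 365 days and that each group records the newest timestamp it contains. `AgeGroup` and `AgeOff` are hypothetical names. Bucket membership is relative to the current time, so data written into a "< day old" group ages out of that range; deciding whether a whole group can be dropped therefore uses the group's recorded maximum timestamp rather than its label.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical age-based locality groups: < day, < week, < month, < year, older.
// Month and year are approximated as 30 and 365 days.
enum AgeGroup {
  DAY(TimeUnit.DAYS.toMillis(1)),
  WEEK(TimeUnit.DAYS.toMillis(7)),
  MONTH(TimeUnit.DAYS.toMillis(30)),
  YEAR(TimeUnit.DAYS.toMillis(365)),
  OLDER(Long.MAX_VALUE);

  final long maxAgeMillis; // exclusive upper bound on an entry's age

  AgeGroup(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

  // Picks the first (youngest) group whose age bound the entry falls under.
  static AgeGroup forTimestamp(long now, long ts) {
    long age = now - ts;
    for (AgeGroup g : values()) {
      if (age < g.maxAgeMillis) return g;
    }
    return OLDER;
  }
}

final class AgeOff {
  // A whole group can be dropped at compaction time, without reading its
  // entries, if even its newest entry is older than the retention period.
  static boolean canDropGroup(long now, long groupMaxTs, long retentionMillis) {
    return now - groupMaxTs > retentionMillis;
  }
}
```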
>> >
>> > One thing that irks me about this: should column family and time
>> > based locality groups be mutually exclusive (i.e., an RFile has one or
>> > the other, not both)? If they are not, then the order in which they are
>> > partitioned is important for query performance and would
>> > probably need to be user configurable.
>> >
>> > Thoughts?
>> >
>> > Keith
>>