[
https://issues.apache.org/jira/browse/HBASE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710127#action_12710127
]
Jonathan Gray commented on HBASE-1304:
--------------------------------------
Dropped some thoughts on IRC, figured I'd post here:
[10:42am] jgray2: dj_ryan: i don't think v7 patch contains changes to
compactions yet... not following your questions exactly but compactions need to
be merged with scan code
[10:43am] jgray2: gets can be redone as scans
[10:43am] jgray2: and that's probably the direction we'll need to go
[10:43am] jgray2: if millions of columns in a single row
[10:44am] jgray2: you basically need to scan them, even within the row
[10:44am] jgray2: QueryMatcher makes the decision about what to do with a KV
given the parameters of the query
[10:45am] jgray2: the two complex bits of it are a DeleteTracker and the
ColumnTracker
[10:45am] jgray2: two implementations of each
[10:46am] jgray2: ScanDT and GetDT are different because, right now, a Get is
not a low-level KV merge like a Scan is
[10:46am] jgray2: so when you're scanning (or compacting) you actually look at
a Store's keys in strict sorted order
[10:46am] jgray2: merging all storefiles + memcache
[10:46am] jgray2: so when tracking deletes
[10:46am] jgray2: you need to track very little
[10:47am] jgray2: in a Get, you grab all keys from each storefile, starting at
memcache, then going through them newest to oldest
[10:47am] jgray2: so deletes you read in one storefile will apply to any
storefiles that are older
[10:47am] jgray2: so GetDT is quite a bit more complex
[10:47am] jgray2: we need to benchmark and see if scans are gooder
[10:47am] jgray2: because they are much more "correct"
[10:47am] jgray2: if you do manual timestamp setting, gets can give you
indeterminate results
[10:48am] jgray2: but scans are always strictly sorted
[10:48am] jgray2: ColumnTracker is implemented as either ExplicitCT or
WildcardCT
[10:48am] jgray2: explicit is when qualifiers are given, wildcard if all in a
family
[10:48am] jgray2: so it tracks that, and then max versions for each
[10:49am] jgray2: honestly i've not looked at compactions since i wrote
scanners but have had it in mind
[10:50am] jgray2: it will use QueryMatcher and CT/DT directly
[10:50am] jgray2: wildcardCT where maxVersions = family setting
[10:50am] jgray2: ScanDT
[10:50am] jgray2: QueryMatcher already does TTL enforcement and such
[10:51am] jgray2: the only difference is in a minor compaction you still need
to output deletes
[10:51am] jgray2: that are not fully enforced or overridden
[10:51am] jgray2: so then we'll probably have a CompactDT
[10:52am] jgray2: might need a slight modification here and there, i don't
think QM is written to ever permit deletes out to the result
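To make the scan-side argument above concrete, here is a minimal sketch of a strictly sorted merge over memcache plus storefiles with a tiny delete tracker. It is not the HBASE-1304 patch code: `Cell`, `merge`, `scan`, and `demoSources` are hypothetical names invented for illustration, a single row and family are assumed, and a delete is simplified to mask every put of the same qualifier at or below its timestamp. The `keepDeletes` flag is a crude stand-in for the minor-compaction case, where deletes still have to be written out; it keeps all of them rather than only those not yet fully enforced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ScanMergeSketch {
    // Hypothetical, simplified stand-in for a KeyValue (one row, one family assumed).
    static final class Cell {
        final String qualifier;
        final long ts;
        final boolean delete;

        Cell(String qualifier, long ts, boolean delete) {
            this.qualifier = qualifier;
            this.ts = ts;
            this.delete = delete;
        }

        @Override
        public String toString() {
            return (delete ? "D:" : "P:") + qualifier + "@" + ts;
        }
    }

    // Assumed sort order: qualifier ascending, newest timestamp first, deletes before puts on ties.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);
        if (c != 0) return c;
        return Boolean.compare(b.delete, a.delete);
    };

    // K-way merge of already-sorted sources (memcache first, then storefiles).
    static List<Cell> merge(List<List<Cell>> sources) {
        // Heap entries are {sourceIndex, positionInSource}.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
            (x, y) -> ORDER.compare(sources.get(x[0]).get(x[1]), sources.get(y[0]).get(y[1])));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<Cell> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Cell> src = sources.get(top[0]);
            out.add(src.get(top[1]));
            if (top[1] + 1 < src.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return out;
    }

    // Scan-style matching: because input is strictly sorted, the tracker only
    // needs the newest delete timestamp and a version count for the current column.
    static List<Cell> scan(List<List<Cell>> sources, int maxVersions, boolean keepDeletes) {
        List<Cell> out = new ArrayList<>();
        String current = null;
        long deleteTs = Long.MIN_VALUE;
        int versions = 0;
        for (Cell c : merge(sources)) {
            if (!c.qualifier.equals(current)) { // new column: reset tracker state
                current = c.qualifier;
                deleteTs = Long.MIN_VALUE;
                versions = 0;
            }
            if (c.delete) {
                deleteTs = Math.max(deleteTs, c.ts);
                if (keepDeletes) out.add(c); // simplified minor-compaction behaviour
            } else if (c.ts > deleteTs && versions < maxVersions) {
                out.add(c);
                versions++;
            }
        }
        return out;
    }

    // Made-up sample data: memcache, a newer storefile, an older storefile.
    static List<List<Cell>> demoSources() {
        List<Cell> memcache = Arrays.asList(new Cell("a", 7, false));
        List<Cell> newer = Arrays.asList(new Cell("a", 5, true), new Cell("b", 2, false));
        List<Cell> older = Arrays.asList(
            new Cell("a", 5, false), new Cell("a", 3, false), new Cell("b", 1, false));
        return Arrays.asList(memcache, newer, older);
    }

    public static void main(String[] args) {
        System.out.println(scan(demoSources(), 1, false)); // read path
        System.out.println(scan(demoSources(), 3, true));  // compaction-like pass
    }
}
```

With the sample data, the read-path call yields `[P:a@7, P:b@2]` and the compaction-like call yields `[P:a@7, D:a@5, P:b@2, P:b@1]`. The delete in the newer file masks the older puts purely by sort position, regardless of which source they came from, which is why the tracker state stays so small.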
> New client server implementation of how gets and puts are handled.
> -------------------------------------------------------------------
>
> Key: HBASE-1304
> URL: https://issues.apache.org/jira/browse/HBASE-1304
> Project: Hadoop HBase
> Issue Type: Improvement
> Affects Versions: 0.20.0
> Reporter: Erik Holstad
> Assignee: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: hbase-1304-v1.patch, HBASE-1304-v2.patch,
> HBASE-1304-v3.patch, HBASE-1304-v4.patch, HBASE-1304-v5.patch,
> HBASE-1304-v6.patch, HBASE-1304-v7.patch
>
>
> Creating an issue where the implementation of the new client and server will
> go. Leaving HBASE-1249 as a discussion forum and will put code and patches
> here.