[ https://issues.apache.org/jira/browse/HBASE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871542#action_12871542 ]
ryan rawson commented on HBASE-1923:
------------------------------------
An extensive discussion outlined why the sequence id handling is critical and is a
major blocker:
23:43 < dj_ryan> before this can go in
23:44 < dj_ryan> the sequence # stuff is... tricky
23:44 < dj_ryan> so here's the thing
23:44 < dj_ryan> if you use any static sequence id you will end up with problems
23:44 < dj_ryan> if you use System.currentTimeMillis()
23:44 < tlipcon> yea it needs to be unique
23:44 < dj_ryan> it isnt that easy
23:44 < dj_ryan> it has to be smaller than the current largest sequence ID
23:44 < tlipcon> and less than the lowest unflushed
23:44 < tlipcon> right
23:44 < dj_ryan> because otherwise you might bump the sequence id up a bunch
23:45 < dj_ryan> and if we ever hit a log recovery we'd skip a ton of edits
23:45 < tlipcon> yea, perhaps unique negative numbers
23:45 < dj_ryan> well it will go away after a compaction
23:45 < dj_ryan> so if we use '0'
23:45 < dj_ryan> that might workish
23:45 < dj_ryan> you will have to compact between imports
23:45 < tlipcon> or potentially change the storefile holder to be a
List<Storefile> rather than a map
23:45 < tlipcon> we're sort of abusing a map there
23:45 < dj_ryan> there was a really good reason for doing what we did
23:46 < tlipcon> yea i think it used to make more sense
23:46 < tlipcon> when we "early out"ed gets
23:46 < dj_ryan> so until we know exactly why that is so i'd be hesitant about
removing it
23:46 < dj_ryan> so we had that before that
23:46 < tlipcon> now with gets as scans I don't think it's so important
23:46 < dj_ryan> didnt we stack?
23:46 < dj_ryan> but we still need it
23:46 < dj_ryan> because META has custom logic to do rowAtOrBefore()
23:46 < tlipcon> we need to be able to find oldest/newest, but we're talking
about a min/max over like 10s of elements, hardly slow
23:46 < dj_ryan> which still does a similar behaviour like the old get, only
with its own unique code path
23:48 < dj_ryan> we dont want to go back to the ye olde days of 'you cant drop
a table then recreate it without flushing and major compacting
between the drop and the create'
23:48 < dj_ryan> that was no good
23:48 < dj_ryan> yeah we arent creating sequence ids for these HFiles are we
23:48 < dj_ryan> they wont load
23:48 < dj_ryan> oh i see
23:49 < dj_ryan> it uses currentTimeMillis
23:49 < dj_ryan> you cant do that for this code
23:50 < dj_ryan> because we'd end up potentially creating a HFile that has a
seq id > the live seq ID
23:50 < dj_ryan> which would be trouble
23:50 < tlipcon> yep
23:51 < tlipcon> I think unique negative numbers are probably the best bet
23:51 < tlipcon> though it's a hack
23:51 < dj_ryan> well 0 is probably one of the best choices
23:51 < tlipcon> or better, changing the container to not be a map
23:51 < dj_ryan> actually
23:51 < dj_ryan> we cant do that
23:51 < tlipcon> 0 doesn't let you do multiple incrementals w/o major compact,
like you said
23:51 < dj_ryan> yeah
23:51 < dj_ryan> but we cant get rid of the ordering
23:51 < dj_ryan> i think the compaction code also depends on using that as well
23:52 < St^Ack> we saying that there is no ordering any more?
23:52 < St^Ack> we can't be saying that, right?
23:52 < tlipcon> St^Ack: I dunno
23:52 < St^Ack> no order to sequence files
23:52 < tlipcon> it would be an enticing thing to say
23:52 < tlipcon> but we never quite figured out if we can say it :)
23:52 < tlipcon> i think the delete semantics question actually factors in here
23:52 < St^Ack> you need sequenceid replaying logs
23:53 < St^Ack> and for letting go of hlogs
23:53 < St^Ack> each edit gets a seqid
23:53 < St^Ack> they are ever increasing inside HRS
23:54 < St^Ack> hfile gets written w/ last seqid
23:54 < St^Ack> in old days for get
23:54 < St^Ack> we'd order them by seqid
23:54 < tlipcon> yea, I don't think we're debating whether we need seqid
23:54 < tlipcon> just whether seqid has to be entirely unique
23:54 < St^Ack> it does have to be unique
23:55 < St^Ack> at least currently it does
23:55 < St^Ack> when we open them, we key them by seqid
23:55 < tlipcon> yea currently cuz of the Map<Long, StoreFile> or whatever we do
23:55 < tlipcon> but we could change that to a List<Pair<Long, StoreFile>>
23:56 < St^Ack> compacting code currently addresses them in order
23:56 < St^Ack> which version to let go
23:56 < St^Ack> we let go of the oldest
23:56 < St^Ack> first
23:57 < jgray> St^Ack: compacting code has notion of storefile ordering?
23:57 < St^Ack> todd, you are inserting into quiescent table?
23:57 < St^Ack> jgray: yes
23:58 < tlipcon> St^Ack: nope, inserting a new storefile into a live table
23:58 < St^Ack> tlipcon: if you asked RS for a seqid
23:58 < tlipcon> that's the fun of this patch
23:58 < St^Ack> that might help
23:58 < tlipcon> it all happens inside the RS
23:58 < St^Ack> hmmm....
23:58 < St^Ack> that won't help
23:58 < tlipcon> we don't want a new seqid though
23:59 < tlipcon> we explicitly want an old seqid
23:59 < St^Ack> because then you could have memstore flush that had edits
either side of your new file
23:59 < jgray> St^Ack: where does it look at storefile order?
23:59 < St^Ack> jgray: files are ordered in the Store
23:59 < St^Ack> by seqid
23:59 < St^Ack> newest to oldest
00:00 < jgray> yeah, i thought u meant when it does the merge?
00:00 < jgray> u mean in picking which files to compact?
00:00 < St^Ack> yeah.... keeps order
00:00 < jgray> yeah ok
00:00 < jgray> sorry thought u meant something else
00:00 < St^Ack> but you saying that because it uses mergesort it don't matter
00:00 < St^Ack> tlipcon: thats hard.
00:00 < jgray> once it picks the files to merge, from then on it doesn't look
at seqid
00:01 < jgray> but we do look at them in seqid order when picking files to
compact
00:01 < dj_ryan> during the process of compaction yes we dont use the sequence
ordering
00:01 < St^Ack> so, if 10 versions and we're to keep 3, we could be keeping any
3?
00:01 < tlipcon> yea, but I think we all agree current compaction heuristic is
kind of silly
00:01 < dj_ryan> well
00:02 < dj_ryan> so the thing is picking which store files uses the sequence
00:02 < dj_ryan> because when we do minor compaction its important that we
compact adjacent files
00:02 < dj_ryan> and it will be important for future patches
00:02 < jgray> St^Ack: 10 columns w/ same timestamp?
00:02 < jgray> the same 10 cols
00:02 < St^Ack> thats diff topic (smile)
00:02 < dj_ryan> heh
00:03 < dj_ryan> i mean how often do we want to actually do loadFile
00:03 < tlipcon> dj_ryan: the particular customer that has requested this
essentially wants to *only* do loadfile
00:03 < dj_ryan> well
00:03 < dj_ryan> you could pick the current sequence id then increment it
00:03 < jgray> St^Ack: if the columns have unique versions/stamps, then we do
it right... we merge them in KV order... the issue there is more
about when they have the same stamp, we would want to know
storefile stamps to use that to determine order
00:03 < dj_ryan> thus creating a 1 edit hole in the hlog
00:04 < dj_ryan> in fact you could even log it
00:04 < tlipcon> dj_ryan: isn't it the opposite? we want to pick the latest
flushed store file, and go one less?
00:04 < dj_ryan> but then the metadata in the HFile wouldnt match
00:04 < dj_ryan> unless we re-wrote the HFile during load (Doh)
00:04 < St^Ack> jgray: makes sense
00:04 < tlipcon> yea that's another pain in the ass
00:04 < dj_ryan> hmm
00:04 < dj_ryan> yeah i guess you're right
00:05 < dj_ryan> again the problem would be the sequence id in the HFile
00:05 < dj_ryan> right now its being baked in at the map reduce time
00:05 < dj_ryan> no matter what we pick at regionserver time
00:05 < tlipcon> right, I think our best bet is to either not put one in there,
or put in a special BULKLOAD_HFILE signifier of some sort
00:05 < tlipcon> either in place of or in addition to the seqid
00:05 < dj_ryan> you'd potentially have an issue during the next region re-open
00:09 < dj_ryan> 0 might be the best choice at MR time
00:09 < tlipcon> I think adding a new metadata entry saying it was a bulk load
is our best bet
00:09 < tlipcon> then treating them specially where we need to
00:09 < tlipcon> i hate special casing, but other things seem hackish
00:09 < St^Ack> can't be 0 because next time he runs and if its still around...
clash
00:09 < dj_ryan> St^Ack: right
00:10 < dj_ryan> can we call seqid 0 special
00:10 < tlipcon> I'd rather put no seqid at all
00:10 < dj_ryan> we will never have a seqid 0 in the wild
00:10 < tlipcon> calling seqid 0 special is easy to miss
00:10 < dj_ryan> hmm
00:10 < tlipcon> whereas if we make it null, we'll get NPEs if we forget to
account for this case
00:10 < tlipcon> which I think is preferable
00:10 < tlipcon> otherwise people will forget about it and code stuff that
breaks subtly instead of loudly
00:10 < dj_ryan> hmm
00:11 < dj_ryan> without a sequence id you cant compact those files to each
other
00:11 < dj_ryan> or maybe they can
00:11 < dj_ryan> so if you grab all the hfiles w/sequence ids
00:11 < St^Ack> w/o seqid can't load it
00:11 < dj_ryan> then fit in the ones without it around it
00:13 < dj_ryan> so when you compact N files together you pick the largest
sequence id
00:13 < dj_ryan> then use that as the new sequence id of the file
00:13 < dj_ryan> (of the output file)
00:14 < dj_ryan> and that comes from the file's metadata i think
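
To make the constraints from the chat easier to follow, here is a minimal Java sketch. It is a toy model and not HBase code: the class SeqIdSketch and all of its method names are hypothetical, and the recovery rule (edits at or below the highest store file seqid are treated as already flushed) is an assumption drawn from the "we'd skip a ton of edits" remark above. It illustrates three points from the discussion: store files keyed by seqid collide when two bulk loads share a static id such as 0, a bulk-loaded file with a seqid above the live one would cause log recovery to skip edits, and a compacted file takes the largest seqid of its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Illustrative model only; none of these names are HBase APIs. */
final class SeqIdSketch {

    /** Store files keyed by seqid: a second file with the same id replaces the first. */
    static int filesVisibleAfterLoads(long... seqIds) {
        TreeMap<Long, String> storeFiles = new TreeMap<>();
        int n = 0;
        for (long seqId : seqIds) {
            storeFiles.put(seqId, "hfile-" + (n++)); // put() overwrites on a duplicate key
        }
        return storeFiles.size();
    }

    /**
     * Assumed recovery rule: only log edits with a seqid above the highest
     * store file seqid are replayed; the rest are treated as already flushed.
     */
    static List<Long> editsToReplay(List<Long> logEditSeqIds, List<Long> storeFileSeqIds) {
        long maxFlushed = storeFileSeqIds.isEmpty() ? -1L : Collections.max(storeFileSeqIds);
        List<Long> replay = new ArrayList<>();
        for (long edit : logEditSeqIds) {
            if (edit > maxFlushed) {
                replay.add(edit);
            }
        }
        return replay;
    }

    /** Compaction output takes the largest seqid among its (non-empty) inputs. */
    static long compactedSeqId(List<Long> inputSeqIds) {
        return Collections.max(inputSeqIds);
    }

    public static void main(String[] args) {
        // Two bulk loads both stamped 0, plus one flushed file at 40.
        System.out.println(filesVisibleAfterLoads(0L, 40L, 0L));
        // A bulk-loaded file stamped 1000 (e.g. from a clock) hides unflushed edits 41..43.
        System.out.println(editsToReplay(Arrays.asList(41L, 42L, 43L), Arrays.asList(40L, 1000L)));
        // The same load stamped 0 leaves those edits replayable.
        System.out.println(editsToReplay(Arrays.asList(41L, 42L, 43L), Arrays.asList(40L, 0L)));
        System.out.println(compactedSeqId(Arrays.asList(0L, 12L, 40L)));
    }
}
```

In this model a seqid of 0 is safe for recovery but loses a file on the second load unless a compaction happens in between, which matches the "you will have to compact between imports" caveat; replacing the map with a list of (seqid, file) pairs, as tlipcon suggests, would remove the collision but not the ordering questions raised about compaction selection.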
> Bulk incremental load into an existing table
> --------------------------------------------
>
> Key: HBASE-1923
> URL: https://issues.apache.org/jira/browse/HBASE-1923
> Project: HBase
> Issue Type: New Feature
> Components: client, mapred, regionserver, scripts
> Affects Versions: 0.21.0
> Reporter: anty.rao
> Assignee: Todd Lipcon
> Attachments: hbase-1923-prelim.txt
>
>
> HBASE-48 is about bulk load of a new table; maybe it's more practicable to
> bulk load against an existing table.