Thanks, Jingcheng.

Yes, it just references the source MOB data until MOB compaction.

Based on that, I think this really is a critical bug: the MOB files were
allowed to be deleted before MOB compaction had copied them, which left
broken references and caused data loss. Or am I misunderstanding you?
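
To make the expected behaviour concrete, here is a toy Python model of the invariant described in the quoted reply below: a MOB file should only be cleaned up once no table references it. This is not HBase code and it ignores snapshots. The class and method names (MobStore, clone_table, drop_table, run_cleaner) and the file name "mobfile-1" are made up for illustration.

```python
from collections import defaultdict


class MobStore:
    """Toy model of MOB files shared between tables (hypothetical, not HBase code)."""

    def __init__(self):
        self.files = set()            # MOB files physically present
        self.refs = defaultdict(set)  # file name -> tables that reference it

    def write(self, table, name):
        self.files.add(name)
        self.refs[name].add(table)

    def clone_table(self, source, target):
        # A clone copies no data; it only adds references to the source's files.
        for tables in self.refs.values():
            if source in tables:
                tables.add(target)

    def drop_table(self, table):
        # Dropping a table removes its references, never the files themselves.
        for tables in self.refs.values():
            tables.discard(table)

    def run_cleaner(self):
        # Expected behaviour: delete only files that nothing references any more.
        unreferenced = {f for f in self.files if not self.refs[f]}
        self.files -= unreferenced
        return unreferenced


store = MobStore()
store.write("tim_test", "mobfile-1")
store.clone_table("tim_test", "prod_b_map")
store.drop_table("tim_test")
removed_while_referenced = store.run_cleaner()  # prod_b_map still holds a reference
store.drop_table("prod_b_map")
removed_after_last_ref = store.run_cleaner()
```

In this model the first cleaner run removes nothing, because the clone still references the file. What I observed looks like the opposite: the file went away while prod_b_map still pointed at it.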



On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng <jingcheng...@intel.com>
wrote:

> Hi Tim,
>
> > was this running a background task to copy the MOB data when the
> > snapshot was cloned and I just deleted the source before the copy was
> > complete?
> The MOB data is copied when MOB compaction happens. But the MOB files
> should not be deleted, even if they have not been copied yet and the
> source table has been deleted. The archive cleaner should keep them until
> all the references are gone. Let me check the code again.
>
> > when running "snapshot and clone" it just references the source MOB data
> > until a (?) change?
> Yes, it just references the source MOB data until MOB compaction.
>
> > snapshot and clone just doesn't support MOB?
> Yes, it does support MOB.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson...@gmail.com]
> Sent: Thursday, October 13, 2016 1:56 AM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks - well it is now on the CDH community forum too.
>
> Jonathan Hsieh pretty much described what I see in his comment on
> HBASE-12332
> https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
>
>
>
> On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <h...@cloudera.com> wrote:
>
> > Hi Tim,
> >
> > I just read more of the details; it may not be related to the issue we
> > fixed (which was MOB compaction related).
> > I am doing a similar test to see if I can reproduce it.
> >
> > Thanks,
> > Huaxiang
> > > On Oct 12, 2016, at 10:29 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> > >
> > > Thanks Ted, Huaxiang
> > >
> > > I'll move this to a Cloudera forum and comment back here if it
> > > appears unrelated.
> > >
> > > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <h...@cloudera.com> wrote:
> > >
> > > >> By the way, I forgot the forum link: http://community.cloudera.com/
> > >>
> > >> Thanks,
> > >> Huaxiang
> > >>
> > > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <h...@cloudera.com> wrote:
> > >>>
> > >>> Hi Tim,
> > >>>
> > > >>>   I believe this is an issue specific to the Cloudera release that
> > > >>> we fixed recently. For details, could you discuss it in the CDH
> > > >>> forum? Copy me (h...@cloudera.com) in the forum so I can explain
> > > >>> more there.
> > >>>
> > >>>   Thanks,
> > >>>   Huaxiang
> > >>>
> > > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>>
> > >>>> Have you looked at HBASE-16578 ?
> > >>>>
> > >>>> Cheers
> > >>>>
> > > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi devs,
> > >>>>> [Had a quick chat with Lars G. about this and before opening a
> > >>>>> Jira I thought I'd raise it here first]
> > >>>>>
> > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > >>>>>
> > > >>>>> Before I dig into this further, I'd like to just ask if anyone
> > > >>>>> has seen this before?
> > >>>>>
> > > >>>>> The initial state was a table (tim_test) built with MOB support
> > > >>>>> and holding a few tens of millions of rows and tens of billions
> > > >>>>> of cells.
> > >>>>>
> > > >>>>> I wanted to rename the table to get this into production and did
> > > >>>>> so as follows:
> > >>>>>
> > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > >>>>>
> > > >>>>> At this stage the application all looked good, and so I
> > > >>>>> continued with:
> > >>>>>
> > >>>>> delete_snapshot 'tim_test-snapshot'
> > >>>>> disable 'tim_test'
> > > >>>>> drop 'tim_test'
> > >>>>>
> > > >>>>> Then things went... awry and data just started dropping out in
> > > >>>>> the app. Before long, all MOB data was seemingly gone.
> > >>>>>
> > > >>>>> The references in the new table's MOB folder appear to point to
> > > >>>>> the source table (e.g.
> > > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > >>>>>
> > > >>>>> The RS logs are full of ERRORs like:
> > > >>>>>
> > > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> > > >>>>> The mob file
> > > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> > > >>>>> could not be found in the locations
> > > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> > > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > >>>>>
> > >>>>> What I don't know is:
> > >>>>> 1) was this running a background task to copy the MOB data when
> > >>>>> the snapshot was cloned and I just deleted the source before the
> > >>>>> copy was complete?
> > >>>>> - or
> > > >>>>> 2) when running "snapshot and clone" it just references the
> > > >>>>> source MOB data until a (?) change?
> > >>>>> 3) snapshot and clone just doesn't support MOB?
> > >>>>>
> > > >>>>> Can anyone shed some light on this easily before I dig into it,
> > > >>>>> please?
> > >>>>>
> > > >>>>> While this situation exists (at least in 1.0.0), might it be good
> > > >>>>> to get a note about data loss for MOB tables into the snapshot
> > > >>>>> clone docs?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Tim
> >
> >
>
