liuwenjing17 commented on PR #5599:
URL: https://github.com/apache/hbase/pull/5599#issuecomment-1878527381
> > > I still do not fully understand the problem here...
> > > If we do not set millis to zero, it will only affect the life time of
a MOB file for less than 1 second, how could it make the MOB file expire 2
hours earlier?
> >
> >
> > Because in org.apache.hadoop.hbase.mob.MobUtils, the creation time of
mob files is obtained by parsing their names from fileName using the statement
(Date fileDate = parseDate(MobFileName.getDateFromName(fileName));). For
instance, data created on 20240105, their timestamps will be parsed as
1704384000000 (2024-01-05 00:00:00). In this way, when the master expired mob
thread starts, it may affect the life time of a MOB file for less than 1 day.
>
> Then the problem is we should use a timestamp instead of '20240105' in the
mob file name? I still do not understand why setting MILLISECOND to 0 can solve
the problem...
Here is an example:
1. Assume the Time-To-Live (TTL) for the mob data is set to 1 day.
2. We write mob data at 18:33 on 01/04/2024, and the data is flushed to a
mob file named xxxx20240104xxxx.
3. The mob expiration thread runs less than 1 day later, at 10:45 on
01/05/2024.
4. When checking ts, the expiration cutoff is calculated as
(currentTS - 1 day), truncated to the start of the day with Calendar:
1704297600720 (2024-01-04 00:00:00.720). Because only the fields down to
SECOND are reset, the last 3 digits are leftover milliseconds from the
current time.
The mob file's ts, parsed from its name, is: 1704297600000 (2024-01-04
00:00:00.000)
5. if (fileDate.getTime() < expireDate.getTime()) {/* expired */}
The condition is true (1704297600000 < 1704297600720), so the mob file is
treated as expired and is cleaned. **These 3 leftover digits cause the mob
files to be cleaned earlier than expected.**
6. But if we also set MILLISECOND to 0, expireDate.getTime() will be
1704297600000 and the condition will be false. In this case, the mob file
is retained as intended.
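
The steps above can be reproduced with a small standalone sketch. It is not the HBase code itself: the class `MobExpiryDemo` and its helpers `fileDateMillis`, `expireCutoff` and `isExpired` are hypothetical names used only for illustration, the GMT+8 time zone is an assumption chosen because it makes the timestamps in the example line up, and the 720 ms remainder is simply the value used in step 4.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical demo class, not part of HBase.
final class MobExpiryDemo {
  static final long DAY_MS = 24L * 60 * 60 * 1000;

  // Parse the yyyyMMdd date embedded in a mob file name to midnight in tz.
  static long fileDateMillis(String yyyyMMdd, TimeZone tz) {
    return LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE)
        .atStartOfDay(tz.toZoneId()).toInstant().toEpochMilli();
  }

  // Cutoff = (now - ttl) truncated to the start of the day.
  // With clearMillis == false, the millisecond remainder of "now" survives.
  static long expireCutoff(long nowMs, long ttlMs, TimeZone tz, boolean clearMillis) {
    Calendar cal = Calendar.getInstance(tz);
    cal.setTimeInMillis(nowMs - ttlMs);
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    if (clearMillis) {
      cal.set(Calendar.MILLISECOND, 0);
    }
    return cal.getTimeInMillis();
  }

  // Same comparison as in step 5.
  static boolean isExpired(long fileDateMs, long cutoffMs) {
    return fileDateMs < cutoffMs;
  }

  public static void main(String[] args) {
    TimeZone tz = TimeZone.getTimeZone("GMT+8"); // assumed zone
    long now = 1704422700720L;                   // 2024-01-05 10:45:00.720 GMT+8
    long fileDate = fileDateMillis("20240104", tz);

    long buggy = expireCutoff(now, DAY_MS, tz, false);
    long fixed = expireCutoff(now, DAY_MS, tz, true);

    System.out.println(buggy + " expired=" + isExpired(fileDate, buggy)); // 1704297600720 expired=true
    System.out.println(fixed + " expired=" + isExpired(fileDate, fixed)); // 1704297600000 expired=false
  }
}
```

With the millisecond field left untouched, the cutoff lands a fraction of a second after midnight, so a file dated exactly at midnight compares as older and is removed; clearing the field makes the two values equal and the strict `<` check fails.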
And here is the link to jira:
https://issues.apache.org/jira/browse/HBASE-28287