Hello Josip,

I am posting this at the top to avoid nesting the email too much.

OK, you got my attention: 1. You are willing to code. 2. You are
working for a partner. 3. You pointed me right to the problem.

If the SQL queries are not the problem, then there is no need to
store the LinkFI in each node, which would increase the node size
but eliminate the call to db_get_file_attributes_record().  So I will
assume that the real problem is the "linear" search through the tree
looking for the hard link, which is uniquely identified by LinkFI and
JobId (by the way, LinkFI is the "pointer" I was talking about -- a
somewhat loose use of the word pointer).

About the only way I could see to speed this up would be to keep a hash
table of all the links that hashes on LinkFI and JobId and stores the node
address as the result.  It could be quite fast, but it also increases
memory usage, which will already be high because of the total number of
files in the tree.
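A lookup table along those lines could be sketched as follows, using
std::unordered_map as a stand-in for Bacula's own htable code (all names
here are illustrative, not Bacula's actual API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct TREE_NODE;  /* Bacula's tree node type; only pointers are used here */

/* Pack JobId and LinkFI into a single 64-bit hash key. */
static inline uint64_t link_key(uint32_t JobId, int32_t LinkFI)
{
   return ((uint64_t)JobId << 32) | (uint32_t)LinkFI;
}

/* Hypothetical table kept in the tree root: maps (JobId, LinkFI) to
 * the node of the file that was actually backed up. */
typedef std::unordered_map<uint64_t, TREE_NODE *> HardLinkTable;

/* While building the tree: remember the "master" node of each
 * hard-link set under its JobId and FileIndex. */
static void remember_hard_link(HardLinkTable &ht, uint32_t JobId,
                               int32_t FileIndex, TREE_NODE *node)
{
   ht[link_key(JobId, FileIndex)] = node;
}

/* During mark/unmark: an O(1) lookup replacing the linear tree walk. */
static TREE_NODE *lookup_hard_link(HardLinkTable &ht, uint32_t JobId,
                                   int32_t LinkFI)
{
   HardLinkTable::const_iterator it = ht.find(link_key(JobId, LinkFI));
   return it == ht.end() ? NULL : it->second;
}
```

The memory cost is roughly one table entry per hard-linked file, on top
of the tree itself.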

Bacula already has hash code, so it wouldn't be hard to add the hash
table as an item in the root of the tree (this needs confirmation by
looking at the code).  This assumes that a node can know that it is a
hard link.  I think it can, but I am not sure: if the link count in its
stat packet is greater than 1 and its LinkFI is zero, then it is a file
that other files point to via its FileIndex -- i.e. those hard-linked
files all have their LinkFI set to the FileIndex of the file actually
backed up.

If that is not the case, then some new code will be needed when the FD
processes a hard-linked file.  If I remember right, during backup, when
the FD sees a hard-linked file it saves that file and puts it into a
hash list; if any later file links to a file that is already in the
hash list, the file is not saved a second time, but written as a hard
link with LinkFI "pointing" to the file that was actually backed up.
So all such files must somehow be flagged so that they can be put into
the hash table during restore.
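The nlink/LinkFI convention described above could be sketched as a small
classification helper (an assumption to verify against the FD code, not
Bacula's actual logic):

```cpp
#include <cassert>
#include <cstdint>
#include <sys/stat.h>

/* Sketch of the convention described above: how a node might tell
 * whether it is part of a hard-link set, and in which role. */
enum link_kind {
   LK_NORMAL,      /* ordinary file: nlink == 1 */
   LK_HL_MASTER,   /* the copy actually backed up; others reference it */
   LK_HL_REF       /* hard link: LinkFI holds the master's FileIndex */
};

static enum link_kind classify(const struct stat *statp, int32_t LinkFI)
{
   if (statp->st_nlink <= 1) {
      return LK_NORMAL;
   }
   /* nlink > 1: LinkFI == 0 means this node is the saved copy;
    * a nonzero LinkFI references the saved copy's FileIndex. */
   return LinkFI == 0 ? LK_HL_MASTER : LK_HL_REF;
}
```

Only the LK_HL_MASTER nodes would need to be entered into the hash table
during restore.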

Another way of doing it that would not require memory would be to create
a temporary SQL table indexed on LinkFI and JobId, with the value being
the node's memory address in the tree.  That would push the work off to
SQL rather than Bacula's hash routines.
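Sketched as SQL, the temporary table might look like this (table and
column names are illustrative, not Bacula's schema; the node address
would be stored as a plain integer):

```sql
-- Illustrative only: not Bacula's actual schema.
CREATE TEMPORARY TABLE hardlink_map (
   JobId    integer NOT NULL,
   LinkFI   integer NOT NULL,
   NodeAddr bigint  NOT NULL,  -- in-memory tree node address as an integer
   PRIMARY KEY (JobId, LinkFI)
);

-- Filled once while the tree is built, then each mark/unmark
-- becomes a single indexed point query:
SELECT NodeAddr FROM hardlink_map WHERE JobId = 42 AND LinkFI = 7;
```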

Without too much thought, I probably would use a hash table (or something
similar), and if that ran into memory problems, provide the SQL temp
table as an alternative.

I don't think this is quite what you had in mind with your 5 points 
shown below,
but I think it would work. Does that make any sense to you?

Best regards,
Kern

On 07/14/2012 04:10 PM, Josip Almasi wrote:
> Kern Sibbald wrote:
>> Hello,
>>
>> On 07/13/2012 08:05 AM, Josip Almasi wrote:
>>> Hi,
>>>
>>> colleagues ran into performance issues during mark/unmark.
>>> We narrowed it down to hardlinks: mark and unmark both call
>>> set_extract, which traverses the entire directory tree for each
>>> hardlink.
>>>
>>> Let's have a look at hardlinks [1] on a casual RHEL 6 system:
>>>
>>> root@joe:~/tmp> find / ! -type d -links +1 -ls|sort -n>hardlinks.txt
>>> root@joe:~/tmp> wc -l hardlinks.txt
>>> 23298 hardlinks.txt
>>>
>>> Most of these are pyc & pyo, and yum things.
>>>
>>> On our test mail server, 392577 hardlinks in /var/spool/imap.
>>> It's how cyrus lmtpd works, and it's good.
>>> However, mark works for well over an hour.
>>> But on a production mail server - 1459466 hardlinks in imap folders.
>>>
>>> Turning off hardlinks in bacula is not a solution, and neither is
>>> turning off hardlinks in cyrus.
>>> Bottom line, we can't sell bacula for mail servers.
>>>
>>> So, what can we do about it?
>>>
>>> A rblist could be used for lookup instead of dir tree traversal:
>> The tree code was written a long time ago, and from what
>> I remember, it already uses a red black tree structure, so I
>> am not sure without looking at the code exactly what you want
>> to do differently.  The best I can tell, you want a separate hard-link
>> list.
>
> Sure, I'd keep a hardlink list, but I'd use rblist for O(log n) lookup
> complexity.
>
>>> 1) TREE_ROOT would contain hardlinks rblist member (tree.h)
>>> 2) A comparator function needs to be implemented (ua_tree.c?)
>>> It would compare items based on JobId and FileIndex.
>>> 3) insert_tree_handler needs to maintain hardlink rblist (ua_tree.c)
>>> Only first ones, marked with ok.
>>> This would build up hardlinks rblist along with directory tree.
>>> 4) set_extract would use that rblist for hardlink lookup (ua_tree.c)
>>> 5) What about bookkeeping?
>>> I see rblist internally allocates and frees, and free_tree would
>>> release it all, right?
>>>
>>> Did I miss something?
>>
>> I am sorry, I cannot really say if that is the right thing to do.  I am
>> not even sure if you have nailed down the problem.  It seems to me
>> that Bacula keeps a "pointer" to the link, and the problem I have
>> previously seen is that for each file that is a hard link, Bacula must
>> do a SQL query to find the file -- when you start doing millions of
>> SQL queries, things can get slow unless you are a DBA and do the
>> appropriate DB tuning, which is not evident.
>
> In fact, I've done some measurement.
> First, enabled postgres query timing:
>
> log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements
>                                # and their durations, > 0 logs only
>                                # statements running at least this number
>                                # of milliseconds
>
> Queries show up in /var/lib/pgsql/data/pg_log along with their 
> duration in milliseconds.
> Duration was 0.1 - 0.5 ms - as fast as it gets.
> For each file, bacula uses two queries, plus one more for each dir.
>
> Total number of queries during mark() on our test box with 392577 
> hardlinks was
>
> [root@sandbox pg_log]# grep SELECT postgresql-Wed.log |wc -l
> 639067
>
> Total number of unique queries was
>
> [root@sandbox pg_log]# grep SELECT postgresql-Wed.log |awk 
> --field-separator='SELECT ' '!a[$2]++'|wc -l
> 377958
>
> meaning we issued 69% more queries than necessary.
>
> But the bottleneck was not the database, it was dird, spending some
> 90% CPU for some 80 minutes.
>
> So it led me to set_extract recursive tree traversal:
>
> static int set_extract(UAContext *ua, TREE_NODE *node, TREE_CTX *tree, 
> bool extract)
> ...
>          /*
>           * If we point to a hard linked file, traverse the tree to
>           * find that file, and mark it to be restored as well. It
>           * must have the Link we just obtained and the same JobId.
>           */
>          if (LinkFI) {
>             for (n=first_tree_node(tree->root); n; n=next_tree_node(n)) {
> ...
>
> An in-memory lookup for the first occurrence of a specific hardlinked file.
>
> Seems like queries result from db_get_file_attributes_record calls 
> from within the same function.
> I'm not sure how I'd get duplicate queries there, but now that's of
> secondary importance - the lines above mean 300K directory tree traversals.
>
> As for pointer to the link, I've seen it only in filed, and used only 
> on solaris.
> Not sure what it means though.
> But sure I can't simply reuse it.
>
>> So, are you going to implement this and submit it as a contribution?
>
> That's the plan, assuming I'm on the right way.
>
>> If so, please let me know and
>>   I will look at the code a bit more when I get back from vacation
>> in August and tell you what I think.
>
> Please do.
> I'm hoping to use some vacation too:) So august is just fine for me.
>
>> Hard links have always been a problem
>> for Bacula
>
> Not only for bacula; e.g. I think Tivoli simply ignores them and treats
> them as different files.
> Bad Tivoli, bad - it can't restore /var/spool/imap :>
>
>> -- and by the way, if you create too many of them you may not
>> be able to boot your system since during the boot process (non-Bacula)
>> from what I understand all the hard links found must be able to fit 
>> within
>> the available boot memory.
>
> I don't think this applies for non-boot partitions.
> Regardless, we do have zillions of hardlinks on working mail servers.
>
>> If not, are you asking us to implement something for free so that you 
>> can sell
>> and make money off of Bacula?  It might be possible for the 
>> Enterprise version,
>> but right now, I am pretty overloaded doing bug fixes and other 
>> implementations
>> for the community version.
>
> Oh I work for Bacula partner, Nimium. (in public I use my open source 
> mail address)
> I think we already made some money off bacula, and once we solve this, 
> we're about to make some more;)
> My boss can provide more details on that, I deal with the code.
>
> Regards...
>



_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
