Thanks Kern, that makes it rather clear; I'll go with hashing. Vacation first, though - I think I'll do this in August.
Regards...

Kern Sibbald wrote:
> Hello Josip,
>
> OK for dropping the temp SQL table. It is something I would
> do only if memory becomes an issue, as it will someday, but
> right now it is not the main limiting factor.
>
> Either the rb tree or the hash list will work. In my opinion, each
> will consume approximately the same space. The hash list has
> table space that is used, but links are single links. The rb tree has
> no table, but uses double links.
>
> I think you are missing one critical point: you must match both the
> FileIndex and the JobId. The FileIndex is four bytes; the JobId is also
> four bytes. So the FileIndex and the JobId could be put into one
> of our 64 bit hash tables, but they could also be composed as a
> string. For the rb tree, you will need to compose the FileIndex and
> JobId into a string. In both cases, the payload is a void *, which is
> sufficient to hold a node pointer.
>
> The coding of the two is essentially identical, with the composed
> string being just slightly more complicated (a couple of sprintfs for
> the composition).
>
> It is true that a node does not currently know that it is a hard linked
> file, but I explained last time how to tell it. There are still 2 bits left
> to add two new booleans, so making it aware will not add any additional
> memory -- the only hard part (joking) is figuring out how to distinguish
> the current hard_link flag from the new one you must create.
>
> Best regards,
> Kern
>
> On 07/15/2012 06:00 AM, Josip Almasi wrote:
>> Hello Kern,
>>
>> (avoiding nesting mail but keeping it all so I can forward to colleagues)
>> Yep, it all makes sense.
>>
>> A temp table indexed on JobId and FileIndex was the first thing I thought of.
>> Then again, it seems like more work than an in-memory table:
>> - DBs have different temp table syntax
>> - creating and releasing the temp table
>> - more source files to patch
>> - it can't beat an in-memory solution.
>> So if I can make it in-memory by patching only two files, I'd go for it.
>>
>> A hash table would surely be faster than an rbtree.
>> But how about memory consumption?
>> I think an rblist would use only 4 bytes for each hardlink pointer, so 4MB for
>> a million hardlinks. (right?)
>> As for complexity, log(n) is way better than the current dir traversal.
>> And 1M * log2(1M) is only about 20M comparisons - a few milliseconds, who cares.
>>
>> As far as I've seen in the code, a node does know it's a hardlink - it's
>> encoded in the stat, as you described.
>> Just keep in mind, I'm not talking about the FD; I'm talking about stored, and
>> mark/unmark lasting 80 minutes.
>> See, it hurts even more than a slow backup/restore would: we can't even demo
>> it in a presentation ;)
>> AFAIK backup/restore works fine; at least my colleagues have no objections.
>> (They may have some, once we fix mark/unmark; all in due time.)
>>
>> So, I think my 5 points just need a search/replace of rblist with hashtable.
>> Well, maybe. It's your call, based on resource usage & performance estimates.
>> I'd prefer an rblist, and if mark runs for 30 seconds or so, leave it be.
>> If mark runs for 5 minutes, I'd switch to a hashtable, since production servers
>> have 5 times more hardlinks than our test box.
>>
>> Regards...
>>
>> Kern Sibbald wrote:
>>> Hello Josip,
>>>
>>> I am posting this at the top to avoid nesting the email too much.
>>>
>>> OK, you got my attention: 1. You are willing to code. 2. You are
>>> working for a partner. 3. You pointed me right to the problem.
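To make the key composition Kern describes above concrete, here is a minimal sketch. The types are stand-ins: std::unordered_map substitutes for Bacula's own hash list, TREE_NODE is left opaque, and the helper names are invented for illustration.

   #include <stdint.h>
   #include <stdio.h>
   #include <unordered_map>

   struct TREE_NODE;                   /* opaque stand-in for Bacula's node */

   /* 64-bit key: JobId in the high 32 bits, FileIndex in the low 32 bits */
   static inline uint64_t link_key(uint32_t JobId, int32_t FileIndex)
   {
      return ((uint64_t)JobId << 32) | (uint32_t)FileIndex;
   }

   /* String key for the rb tree variant (the "couple of sprintfs") */
   static void link_key_str(char *buf, size_t len, uint32_t JobId, int32_t FileIndex)
   {
      snprintf(buf, len, "%u:%d", JobId, FileIndex);
   }

   /* Stand-in hash table: composed key -> node pointer payload */
   static std::unordered_map<uint64_t, TREE_NODE *> hardlink_map;

Packing JobId into the high bits is an arbitrary choice; any composition that maps each distinct (JobId, FileIndex) pair to a distinct key works equally well.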
>>>
>>> If the SQL queries are not the problem, then there is no need to
>>> store the LinkFI in each node, which would increase the node size
>>> but eliminate the call to db_get_file_attributes_record(). So I assume
>>> that the real problem is the "linear" search through the tree looking
>>> for the hard link, which is uniquely identified by LinkFI and JobId (by the
>>> way, LinkFI is the "pointer" I was talking about -- a somewhat loose use
>>> of the word pointer).
>>>
>>> About the only way I can see to speed this up would be to keep a hash
>>> table of all the links that hashes on LinkFI and JobId and stores the node
>>> address as the result. It could be quite fast, but it also increases memory
>>> usage, which will already be high because of the total number of files in
>>> the tree.
>>>
>>> Bacula already has hash code, so it wouldn't be hard to add the hash
>>> table as an item in the root of the tree (needs confirmation by looking
>>> at the code). This assumes that the node can know that it is a hard link.
>>> I think it can know, but am not sure: if its link count in the stat packet
>>> is greater than 1 and the LinkFI is zero, then it is a file that has other
>>> files that point to its FileIndex -- i.e. those hard linked files all have
>>> their LinkFI set to the FileIndex of the file actually backed up.
>>> If that is not the case, then some new code will be needed
>>> when the FD is processing a hard linked file. Basically, if I remember
>>> right, during the backup process when the FD sees a file that is
>>> hard linked, it saves that file and then puts it into a hash list, and if
>>> any other file is linked to a file that is already
>>> in the hash list, then the file is not saved a second time, but written as
>>> a hard link with LinkFI "pointing" to the file that was actually backed up.
>>> So all such files must somehow be flagged so that they can
>>> be put into the hash table during restore.
>>>
>>> Another way of doing it that would not require memory would be to create
>>> a temporary SQL table, indexed on LinkFI and JobId, with the value being
>>> the node's memory address in the tree. That would push the work off to
>>> SQL rather than Bacula's hash routines.
>>>
>>> Without too much thought, I would probably use a hash table (or something
>>> similar) and, if that ran into memory problems, provide the SQL temp table
>>> as an alternative.
>>>
>>> I don't think this is quite what you had in mind with your 5 points shown
>>> below, but I think it would work. Does that make any sense to you?
>>>
>>> Best regards,
>>> Kern
>>>
>>> On 07/14/2012 04:10 PM, Josip Almasi wrote:
>>>> Kern Sibbald wrote:
>>>>> Hello,
>>>>>
>>>>> On 07/13/2012 08:05 AM, Josip Almasi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Colleagues ran into performance issues during mark/unmark.
>>>>>> We narrowed it down to hardlinks: mark and unmark both call set_extract,
>>>>>> which traverses the entire directory tree for each hardlink.
>>>>>>
>>>>>> Let's have a look at hardlinks [1] on a casual RHEL 6 system:
>>>>>>
>>>>>> root@joe:~/tmp> find / ! -type d -links +1 -ls|sort -n>hardlinks.txt
>>>>>> root@joe:~/tmp> wc -l hardlinks.txt
>>>>>> 23298 hardlinks.txt
>>>>>>
>>>>>> Most of these are pyc & pyo files, and yum things.
>>>>>>
>>>>>> On our test mail server: 392577 hardlinks in /var/spool/imap.
>>>>>> It's how cyrus lmtpd works, and it's good.
>>>>>> However, mark runs for well over an hour.
>>>>>> And on a production mail server: 1459466 hardlinks in the imap folders.
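Returning to the hash table Kern proposes above, a sketch of how it could be filled while the restore tree is built and then consulted instead of the linear walk. Everything here is hypothetical -- the function names, the call sites, and the map (repeated from the previous sketch so this compiles on its own) are illustrations, not Bacula's actual tree API.

   #include <stdint.h>
   #include <unordered_map>

   struct TREE_NODE;                   /* opaque stand-in for Bacula's node */

   static std::unordered_map<uint64_t, TREE_NODE *> hardlink_map;

   static inline uint64_t link_key(uint32_t JobId, int32_t FileIndex)
   {
      return ((uint64_t)JobId << 32) | (uint32_t)FileIndex;
   }

   /* Hypothetical hook in insert_tree_handler: register each file that was
    * actually backed up (link count > 1, LinkFI == 0) under its own
    * (JobId, FileIndex). */
   static void register_hardlink_target(uint32_t JobId, int32_t FileIndex,
                                        TREE_NODE *node)
   {
      hardlink_map[link_key(JobId, FileIndex)] = node;
   }

   /* Hypothetical replacement for the tree walk in set_extract: resolve a
    * node whose LinkFI != 0 to its target with one O(1) lookup. */
   static TREE_NODE *find_hardlink_target(uint32_t JobId, int32_t LinkFI)
   {
      auto it = hardlink_map.find(link_key(JobId, LinkFI));
      return it == hardlink_map.end() ? NULL : it->second;
   }

With ~1.5M hardlinks this turns roughly 1.5M full-tree traversals into 1.5M constant-time lookups, at the cost of one map entry per saved hardlink target.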
>>>>>>
>>>>>> Turning off hardlinks in bacula is not a solution, and turning off
>>>>>> hardlinks in cyrus isn't either.
>>>>>> Bottom line: we can't sell bacula for mail servers.
>>>>>>
>>>>>> So, what can we do about it?
>>>>>>
>>>>>> An rblist could be used for lookup instead of dir tree traversal:
>>>>> The tree code was written a long time ago, and from what
>>>>> I remember, it already uses a red black tree structure, so I
>>>>> am not sure, without looking at the code, exactly what you want
>>>>> to do differently. As best I can tell, you want a separate hard-link
>>>>> list.
>>>>
>>>> Sure, I'd keep a hardlink list, but I'd use an rblist for log(n) complexity.
>>>>
>>>>>> 1) TREE_ROOT would contain a hardlinks rblist member (tree.h)
>>>>>> 2) A comparator function needs to be implemented (ua_tree.c?)
>>>>>> It would compare items based on JobId and FileIndex. (sketched below)
>>>>>> 3) insert_tree_handler needs to maintain the hardlink rblist (ua_tree.c)
>>>>>> Only the first ones, marked with ok.
>>>>>> This would build up the hardlinks rblist along with the directory tree.
>>>>>> 4) set_extract would use that rblist for hardlink lookup (ua_tree.c)
>>>>>> 5) What about bookkeeping?
>>>>>> I see rblist internally allocates and frees, and free_tree would release
>>>>>> it all, right?
>>>>>>
>>>>>> Did I miss something?
>>>>>
>>>>> I am sorry, I cannot really say if that is the right thing to do. I am
>>>>> not even sure you have nailed down the problem. It seems to me
>>>>> that Bacula keeps a "pointer" to the link, and the problem I have
>>>>> previously seen is that for each file that is a hard link, Bacula must
>>>>> do a SQL query to find the file -- and when you start doing millions of
>>>>> SQL queries, things can get slow unless you are a DBA and do the
>>>>> appropriate DB tuning, which is not evident.
>>>>
>>>> In fact, I've done some measurements.
>>>> First, I enabled postgres query timing:
>>>>
>>>> log_min_duration_statement = 0 # -1 is disabled, 0 logs all statements
>>>>                                # and their durations, > 0 logs only
>>>>                                # statements running at least this number
>>>>                                # of milliseconds
>>>>
>>>> Queries show up in /var/lib/pgsql/data/pg_log along with their duration in
>>>> milliseconds.
>>>> Duration was 0.1 - 0.5 ms - as fast as it gets.
>>>> For each file, bacula issues two queries, plus one more for each dir.
>>>>
>>>> The total number of queries during mark() on our test box with 392577
>>>> hardlinks was:
>>>>
>>>> [root@sandbox pg_log]# grep SELECT postgresql-Wed.log |wc -l
>>>> 639067
>>>>
>>>> The total number of unique queries was:
>>>>
>>>> [root@sandbox pg_log]# grep SELECT postgresql-Wed.log |awk
>>>> --field-separator='SELECT ' '!a[$2]++'|wc -l
>>>> 377958
>>>>
>>>> meaning we issued about 69% more queries than necessary.
>>>>
>>>> But the bottleneck was not the database; it was dird, spending some 90% CPU
>>>> for some 80 minutes.
>>>>
>>>> That led me to the recursive tree traversal in set_extract:
>>>>
>>>> static int set_extract(UAContext *ua, TREE_NODE *node, TREE_CTX *tree,
>>>>                        bool extract)
>>>> ...
>>>>    /*
>>>>     * If we point to a hard linked file, traverse the tree to
>>>>     * find that file, and mark it to be restored as well. It
>>>>     * must have the Link we just obtained and the same JobId.
>>>>     */
>>>>    if (LinkFI) {
>>>>       for (n=first_tree_node(tree->root); n; n=next_tree_node(n)) {
>>>> ...
>>>>
>>>> An in-memory linear search for the first occurrence of a specific
>>>> hardlinked file.
>>>> It seems the queries result from db_get_file_attributes_record calls from
>>>> within the same function.
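For the rblist variant from point 2 of Josip's list above, a minimal comparator sketch. The HL_ENTRY struct, its field names, and the two-void-pointer callback shape are assumptions made for illustration, not Bacula's actual rblist interface.

   #include <stdint.h>

   /* Hypothetical hardlink rblist entry: ordered first by JobId, then by
    * FileIndex, so a (JobId, LinkFI) pair is found in O(log n). */
   struct HL_ENTRY {
      uint32_t JobId;
      int32_t FileIndex;
      void *node;                      /* TREE_NODE of the file actually saved */
   };

   static int hl_compare(void *item1, void *item2)
   {
      HL_ENTRY *e1 = (HL_ENTRY *)item1;
      HL_ENTRY *e2 = (HL_ENTRY *)item2;
      if (e1->JobId != e2->JobId) {
         return e1->JobId < e2->JobId ? -1 : 1;
      }
      if (e1->FileIndex == e2->FileIndex) {
         return 0;
      }
      return e1->FileIndex < e2->FileIndex ? -1 : 1;
   }

Comparing the two integers directly avoids the string composition the hash key needs, which is one small point in the rblist's favor.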
>>>> I'm not sure how I'd get duplicate queries there, but that's now of
>>>> secondary importance - the lines above mean ~300K directory tree traversals.
>>>>
>>>> As for the pointer to the link, I've seen it only in filed, and it's used
>>>> only on Solaris.
>>>> Not sure what it means, though.
>>>> But I surely can't simply reuse it.
>>>>
>>>>> So, are you going to implement this and submit it as a contribution?
>>>>
>>>> That's the plan, assuming I'm on the right track.
>>>>
>>>>> If so, please let me know and
>>>>> I will look at the code a bit more when I get back from vacation
>>>>> in August and tell you what I think.
>>>>
>>>> Please do.
>>>> I'm hoping to use some vacation too :) So August is just fine for me.
>>>>
>>>>> Hard links have always been a problem
>>>>> for Bacula
>>>>
>>>> Not only for bacula; e.g. I think Tivoli simply ignores them and treats
>>>> them as different files.
>>>> Bad Tivoli, bad - it can't restore /var/spool/imap :>
>>>>
>>>>> -- and by the way, if you create too many of them you may not
>>>>> be able to boot your system, since during the boot process (non-Bacula),
>>>>> from what I understand, all the hard links found must be able to fit within
>>>>> the available boot memory.
>>>>
>>>> I don't think this applies to non-boot partitions.
>>>> Regardless, we do have zillions of hardlinks on working mail servers.
>>>>
>>>>> If not, are you asking us to implement something for free so that you can
>>>>> sell it and make money off of Bacula? It might be possible for the
>>>>> Enterprise version, but right now I am pretty overloaded doing bug fixes
>>>>> and other implementations for the community version.
>>>>
>>>> Oh, I work for a Bacula partner, Nimium. (In public I use my open-source
>>>> mail address.)
>>>> I think we've already made some money off bacula, and once we solve this,
>>>> we're about to make some more ;)
>>>> My boss can provide more details on that; I deal with the code.
>>>>
>>>> Regards...