On August 18, 2014 10:53:12 PM EDT, Nick Krause <[email protected]> wrote:
>On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer
><[email protected]> wrote:
>> On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <[email protected]>
>wrote:
>>> What are the advantages of the hashed linked list version over the
>>> standard one and does it
>>> increase the memory usage and overhead of the linked list more if I
>>> use a hashed version?
>>
>> Seriously?  Do you know what a hash is?
>>
>> A hash is a well-defined many to one algorithm.
>>
>> If I have a universe of a million items that hash down to 100 unique
>> hashes, then I can group those million items by hash and have 100
>> groups of roughly 10,000 items each.
>>
>> The better the hashing algorithm versus my original universe of 1
>> million items, the more even the distribution.
>>
>> Now that I have 100 segregated groups I can build an array of 100
>> linked lists all maintained separately.
>>
>> Thus:
>>
>> hash_index = my_hash(item)
>>
>> add_item(linked_list[hash_item], item) is how I add my item to the
>> hashed linked list.
>>
>> is_in_list(linked_list[hash_item], item) is how I check to see if my
>> item is already in the list.
>>
>> So in my example I have to have 100 linked lists, but each list is on
>> average 100x smaller than a simple linked list would be.
>>
>> Is adding an item to the hashed linked list faster?
>>
>> Absolutely not, I have to hash the item first then do a normal linked
>> list insertion.  That will always be slower.
>>
>> Is finding the item faster?
>>
>> That is the whole point of the exercise.  The theory is you ONLY use
>a
>> hashed linked list if the overhead of hashing the item is less than
>> the amount of time saved by traversing shorter lists when you search.
>>
>> It is the job of the programmer to make the determination if a hashed
>> list is a better choice or not on a case by case basis.  It depends
>on
>> the length of the list without breaking it into pieces and how well
>> the hash algorithm can do at generating roughly similar segregated
>> groups.
>>
>> For the size question, write yourself a userspace app and test it.
>> Obviously that is more work than asking here, but it is ASSUMED you
>> are doing research on your OWN before you post questions here.
>>
>> fyi: this question has little to do with the linux kernel.  It is
>part
>> of what people mean when they say you need to go learn c before you
>> start on the kernel.  Using linked lists and hashed linked lists is
>> stuff you can fully explore in userspace.
>>
>> Greg
>No I known what the advantages are for user space was wondering if
>there were any issues that differ in
>kernel space.
>Nick

1) your original question needs either a highly generic answer like I gave, or 
a highly specific one that depends on the exact nature of the data, the number 
of the items tracked, the ratio of searches vs. adds, and how smooth the hash 
grouping is.  Since you didn't provide the exact use case, only the generic 
answer is possible.  In fact your question implies that the answer is 
relatively straight forward.  A much better question would have been "for a 
specific use case, how is the choice of a normal linked list vs a hashed linked 
list performed?"

Note the answer to that has nothing to do with user space vs kernel space.

2) The kernel is not a magic place.  Sure there are issues like locking and 
interrupts that make the kernel more complex than user space, but for data 
algorithms it is just that the quality of the code is pretty universally 
excellent.  It is excellent because it has been open for 20+ years and some 
great developers have worked on it during that time.  Poorly written code in 
any of the core areas was eradicated long ago.

You can take that excellent code into your user space app and test it to your 
heart's content.  Not only can you do that, for something like a linked list 
evaluation, you should do that.  You have implied "tested" code is code that 
compiles.  If a developer wanted to replace the hashed linked link 
implementation it would be expected that they had done significant testing of 
the new code in user space with highly varied loads to show what they work well 
on and when the new code performs less well.  Then do an analysis of the 
existing kernel data structures which use hashed linked lists and prove that 
the new method is an improvement for the actual kernel use cases.  It would be 
months of work, but that is what it takes if you actually want to improve the 
kernel in a meaningful way.

Greg


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Reply via email to