Hi, I'm currently trying to understand the inner workings of git better by writing a git clone in Python. I find it rather hard to understand how to efficiently use trees.
What I understand is this: Trees are in essence blobs with a specific content. The content is a relation between other blobs names, types and modes. With these lines they can refer to other blobs and trees and thus make a tine filesystem. Now I constructed a system to read and write blobs and have an index file that potentially references to a top tree object, which represents the currently cached repository state. I can add and remove files from the index manually, by creating the blob of the file and working my way up adding or updating trees until I hit the one referenced in INDEX. My algorithm to automate it is really ugly, error prone and inefficient, though. I also have a hard time to find my way around in C files, so maybe some people here in the list could explain the algorithm in Git to me. suppose we have the following Index: INDEX -> tree -> tree "a/" -> blob "b.txt" -> tree "c/" -> blob "d.txt" Now you want to stage a file with the following reference "a/e/g.txt". One approach would be to walk top-down, splitting the path into its elements and looking for the corresponding trees, either retrieving an existing tree or creating a new one. Then finally create the blob "g.txt" and be done with it. This seems rather inefficient, though, because each created or updated tree means that all trees way back up need to be updated as well, once for every step in the loop. The other way is to go bottom-up, first creating the blob, then creating trees up to the project root folder. But then I don't see a way to find which tree elements already exist and need to be updated. So the only algorithm I can come up with is this: 1. walk down the tree with help of the path string to the tree that is closest to the file I want to store. On the way remember all the trees on the path from INDEX to the resulting file. (In the example above I'd like to get the hash of the "a/" tree) 2. create the blob (in the example with the context of g.txt) 3. create the trees bottom-up until one step before the tree found in 1. (in the example create a "e/" tree, containing the "g.txt"'s blob) 4. Add the resulting tree from 3. to the one found in 1. and create the updated hash 5. Now with help of the list from 1. walk the existing trees bottom-up and update each one with the new hashes until INDEX is hit. 6. Update INDEX. Alltogether the idea of trees looked really simple and recursive which makes me quite unhappy with the algorithm I came up with. What is the algorithm to stage single files in Git itself? How Also: I thought to myself, why not just make a function that returns the relative path to the repository root folder and consider that string the file name, drop the idea of trees at all and add the information that is traditionally stored in tree objects directly in a commit object. Wouldn't that be much simpler and still accomplish the same? I think the idea of keeping information in separate small files instead of single big files was dropped at one point anyway, when pack-files were introduced. Cheers Erik -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html