Re: What is an efficient way to get all blobs / trees that have notes attached?
On Mon, Apr 4, 2016 at 9:46 AM, Sebastian Schuberth wrote:
> On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:
>>> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
>>> object's hash is contained in our table to get its notes.
>>>
>>> In particular 3) could be expensive for repos with a lot of files as we're
>>> looking at all of them just to see whether they have notes attached.
>>
>> In (3), why would you need to search through _all_ blobs/trees? Would
>> it not be cheaper to simply query the object type of each annotated
>> object from (2)? I.e. something like:
>>
>> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
>> do
>>     echo "--- $notes_ref ---"
>>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>>     do
>>         type=$(git cat-file -t "$annotated_obj")
>>         if test "$type" != "commit"
>>         then
>>             echo "$annotated_obj: $type"
>>         fi
>>     done
>> done
>
> Thanks for the idea. The problem is that I do want to list the notes
> by path of the object they belong to. As a blob could potentially
> belong to more than one path (copies of files in the repo), I do not
> see another way of getting that information other than iterating over
> all blobs and checking what path(s) they belong to.

True; fundamentally what you want is a blob/tree ID -> path(s) mapping,
which is an independent problem, unrelated to the initial notes lookup.
I don't know of a solution faster than the brute-force search you
already sketched.

If this lookup is important to your use case, you could consider
building/caching the required mapping when the notes are added in the
first place, but I don't know if that is possible in your scenario...

...Johan

> --
> Sebastian Schuberth

--
Johan Herland,
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
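As an illustration of the caching idea mentioned in this message, here is a minimal sketch. It assumes notes are always added through a wrapper. The function name add_note_for_path and the side file notes-paths inside the git directory are hypothetical and not part of git. The recorded paths go stale when files are later moved or copied, so this only helps where the paths known at annotation time are the ones of interest.

```shell
#!/bin/sh
# Hypothetical wrapper (not a git command): annotate the object found at
# <file_path> in <commit> and remember the object ID -> path pair.
add_note_for_path() {
    commit=$1
    file_path=$2
    message=$3

    # Resolve the blob or tree ID that <file_path> has in <commit>.
    obj=$(git rev-parse --verify "$commit:$file_path") || return 1

    # "git notes append" creates the note if the object has none yet.
    git notes append -m "$message" "$obj" || return 1

    # Assumed cache location: a plain tab-separated file in the git directory.
    cache=$(git rev-parse --git-dir)/notes-paths
    printf '%s\t%s\n' "$obj" "$file_path" >> "$cache"
}
```

Called as, for example, add_note_for_path HEAD path/to/file "some note", it leaves one "<object ID><TAB><path>" line per annotation in the side file, which can later be looked up without walking any trees.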
Re: What is an efficient way to get all blobs / trees that have notes attached?
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:
>> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
>> object's hash is contained in our table to get its notes.
>>
>> In particular 3) could be expensive for repos with a lot of files as we're
>> looking at all of them just to see whether they have notes attached.
>
> In (3), why would you need to search through _all_ blobs/trees? Would
> it not be cheaper to simply query the object type of each annotated
> object from (2)? I.e. something like:
>
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done

Thanks for the idea. The problem is that I do want to list the notes
by path of the object they belong to. As a blob could potentially
belong to more than one path (copies of files in the repo), I do not
see another way of getting that information other than iterating over
all blobs and checking what path(s) they belong to.

--
Sebastian Schuberth
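The brute-force iteration described in this message could look roughly like the sketch below. It is only an illustration under some assumptions: it inspects a single tree-ish (HEAD by default) rather than all of history, it never reports the root tree itself because git ls-tree does not list it, and the function name notes_by_path is made up for this example. Paths containing unusual characters are printed in the quoted form git ls-tree emits by default.

```shell
#!/bin/sh
# Hypothetical helper: print "<object ID> <type> <path>" for every blob or
# tree below <tree-ish> that has a note in <notes-ref>.
notes_by_path() {
    notes_ref=$1
    treeish=${2:-HEAD}

    {
        # "git notes list" prints "<note ID> <annotated object ID>" per line.
        git notes --ref="$notes_ref" list
        # Separator so awk can tell the two inputs apart.
        echo "--"
        # -r recurses into subtrees, -t also lists the tree entries themselves.
        # Each line is "<mode> <type> <object ID><TAB><path>".
        git ls-tree -r -t "$treeish"
    } | awk '
        $0 == "--" { in_tree = 1; next }
        !in_tree   { annotated[$2] = 1; next }
        {
            tab = index($0, "\t")
            split(substr($0, 1, tab - 1), meta, " ")
            if (meta[3] in annotated)
                print meta[3] " " meta[2] " " substr($0, tab + 1)
        }'
}
```

For instance, notes_by_path refs/notes/commits HEAD prints one line per path, so a blob that exists under several paths shows up once for each of them.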
Re: What is an efficient way to get all blobs / trees that have notes attached?
On Fri, Apr 1, 2016 at 12:51 PM, Sebastian Schuberth wrote:
> Hi,
>
> I'm curious whether there's a more efficient way to get a list of blobs /
> trees (and their names) that have notes attached than doing this:
>
> 1) Get all notes refs I'm interested in (git-for-each-ref).
>
> 2) For each notes ref, get the list of notes (git-notes list) and store them
> in a hash table that maps object hashes to notes.
>
> 3) Recursively list all blobs / trees (git-ls-tree) and look whether an
> object's hash is contained in our table to get its notes.
>
> In particular 3) could be expensive for repos with a lot of files as we're
> looking at all of them just to see whether they have notes attached.

In (3), why would you need to search through _all_ blobs/trees? Would
it not be cheaper to simply query the object type of each annotated
object from (2)? I.e. something like:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
    do
        type=$(git cat-file -t "$annotated_obj")
        if test "$type" != "commit"
        then
            echo "$annotated_obj: $type"
        fi
    done
done

Can probably be made even faster by using the --batch option to cat-file...

...Johan

--
Johan Herland,
www.herland.net
Re: What is an efficient way to get all blobs / trees that have notes attached?
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done
>
> Can probably be made even faster by using the --batch option to cat-file...

For example:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    git notes --ref=$notes_ref list |
        cut -c 42- |
        git cat-file --batch-check="%(objecttype) %(objectname)" |
        grep '^\(\(blob\)\|\(tree\)\) '
done

...Johan

--
Johan Herland,
www.herland.net
What is an efficient way to get all blobs / trees that have notes attached?
Hi,

I'm curious whether there's a more efficient way to get a list of blobs /
trees (and their names) that have notes attached than doing this:

1) Get all notes refs I'm interested in (git-for-each-ref).

2) For each notes ref, get the list of notes (git-notes list) and store them
in a hash table that maps object hashes to notes.

3) Recursively list all blobs / trees (git-ls-tree) and look whether an
object's hash is contained in our table to get its notes.

In particular 3) could be expensive for repos with a lot of files as we're
looking at all of them just to see whether they have notes attached.

Regards,
Sebastian