Re: What is an efficient way to get all blobs / trees that have notes attached?

2016-04-04 Thread Johan Herland
On Mon, Apr 4, 2016 at 9:46 AM, Sebastian Schuberth wrote:
> On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:
>>> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
>>> object's hash is contained in our table to get its notes.
>>>
>>> In particular 3) could be expensive for repos with a lot of files as we're
>>> looking at all of them just to see whether they have notes attached.
>>
>> In (3), why would you need to search through _all_ blobs/trees? Would
>> it not be cheaper to simply query the object type of each annotated
>> object from (2)? I.e. something like:
>>
>> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
>> do
>>     echo "--- $notes_ref ---"
>>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>>     do
>>         type=$(git cat-file -t "$annotated_obj")
>>         if test "$type" != "commit"
>>         then
>>             echo "$annotated_obj: $type"
>>         fi
>>     done
>> done
>
> Thanks for the idea. The problem is that I do want to list the notes
> by path of the object they belong to. As a blob could potentially
> belong to more than one path (copies of files in the repo), I do not
> see another way of getting that information other than iterating over
> all blobs and checking what path(s) they belong to.

True; fundamentally what you want is a blob/tree ID -> path(s) mapping,
which is an independent problem, unrelated to the initial notes lookup.

I don't know of a solution faster than the brute-force search you already
sketched. If this lookup is important to your use case, you could consider
building/caching the required mapping when the notes are added in the
first place, but I don't know if that is possible in your scenario...
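
To illustrate the caching idea, here is a minimal sketch. It rests on
assumptions that are not from this thread: notes are always added through a
small wrapper, the annotated object is looked up in HEAD, and the cache file
name (annotated-paths.tsv inside the git directory) and the function names
are hypothetical.

```shell
#!/bin/sh
# Sketch only: the cache file name and all function names are hypothetical.

# Location of the cache file inside the repository's git directory.
cache_file () {
    echo "$(git rev-parse --git-dir)/annotated-paths.tsv"
}

# Add a note to the object found at <path> in HEAD and record the path.
# usage: add_note_for_path <notes-ref> <path> <message>
add_note_for_path () {
    obj=$(git rev-parse --verify "HEAD:$2") || return 1
    git notes --ref="$1" add -m "$3" "$obj" || return 1
    # One "<object-id>TAB<path>" line per annotated path.
    printf '%s\t%s\n' "$obj" "$2" >>"$(cache_file)"
}

# Print every path recorded for an object ID in a cache file.
# usage: cached_paths <cache-file> <object-id>
cached_paths () {
    awk -F '\t' -v id="$2" '$1 == id { print $2 }' "$1"
}
```

A lookup would then be something like cached_paths "$(cache_file)" <object-id>.
Note that such a cache only knows the paths that were passed to the wrapper;
copies or renames made later are not picked up, so it can go stale.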


...Johan

> --
> Sebastian Schuberth

-- 
Johan Herland
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is an efficient way to get all blobs / trees that have notes attached?

2016-04-04 Thread Sebastian Schuberth
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:

>> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
>> object's hash is contained in our table to get its notes.
>>
>> In particular 3) could be expensive for repos with a lot of files as we're
>> looking at all of them just to see whether they have notes attached.
>
> In (3), why would you need to search through _all_ blobs/trees? Would
> it not be cheaper to simply query the object type of each annotated
> object from (2)? I.e. something like:
>
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done

Thanks for the idea. The problem is that I do want to list the notes
by path of the object they belong to. As a blob could potentially
belong to more than one path (copies of files in the repo), I do not
see another way of getting that information other than iterating over
all blobs and checking what path(s) they belong to.

-- 
Sebastian Schuberth


Re: What is an efficient way to get all blobs / trees that have notes attached?

2016-04-01 Thread Johan Herland
On Fri, Apr 1, 2016 at 12:51 PM, Sebastian Schuberth wrote:
> Hi,
>
> I'm curious whether there's a more efficient way to get a list of blobs /
> trees (and their names) that have notes attached than doing this:
>
> 1) Get all notes refs I'm interested in (git-for-each-ref).
>
> 2) For each notes ref, get the list of notes (git-notes list) and store them
> in a hash table that maps object hashes to notes.
>
> 3) Recursively list all blobs / trees (git-ls-tree) and check whether an
> object's hash is contained in our table to get its notes.
>
> In particular 3) could be expensive for repos with a lot of files as we're
> looking at all of them just to see whether they have notes attached.

In (3), why would you need to search through _all_ blobs/trees? Would
it not be cheaper to simply query the object type of each annotated
object from (2)? I.e. something like:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
    do
        type=$(git cat-file -t "$annotated_obj")
        if test "$type" != "commit"
        then
            echo "$annotated_obj: $type"
        fi
    done
done

Can probably be made even faster by using the --batch option to cat-file...


...Johan

-- 
Johan Herland
www.herland.net


Re: What is an efficient way to get all blobs / trees that have notes attached?

2016-04-01 Thread Johan Herland
On Fri, Apr 1, 2016 at 2:16 PM, Johan Herland wrote:
> for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
> do
>     echo "--- $notes_ref ---"
>     for annotated_obj in $(git notes --ref=$notes_ref list | cut -c 41-)
>     do
>         type=$(git cat-file -t "$annotated_obj")
>         if test "$type" != "commit"
>         then
>             echo "$annotated_obj: $type"
>         fi
>     done
> done
>
> Can probably be made even faster by using the --batch option to cat-file...

For example:

for notes_ref in $(git for-each-ref refs/notes | cut -c 49-)
do
    echo "--- $notes_ref ---"
    git notes --ref=$notes_ref list | cut -c 42- |
        git cat-file --batch-check="%(objecttype) %(objectname)" |
        grep -E '^(blob|tree) '
done


...Johan

-- 
Johan Herland
www.herland.net


What is an efficient way to get all blobs / trees that have notes attached?

2016-04-01 Thread Sebastian Schuberth

Hi,

I'm curious whether there's a more efficient way to get a list of blobs 
/ trees (and their names) that have notes attached than doing this:


1) Get all notes refs I'm interested in (git-for-each-ref).

2) For each notes ref, get the list of notes (git-notes list) and store 
them in a hash table that maps object hashes to notes.


3) Recursively list all blobs / trees (git-ls-tree) and check whether an
object's hash is contained in our table to get its notes.


In particular 3) could be expensive for repos with a lot of files as 
we're looking at all of them just to see whether they have notes attached.
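
The three steps above could be sketched roughly as follows. This is only an
illustration under some assumptions: it looks at the tree of HEAD only, the
function names join_notes_with_paths and list_annotated_paths are made up,
and paths are printed the way git-ls-tree prints them (unusual characters
are quoted unless -z is used).

```shell
#!/bin/sh
# Sketch only: the function names and the choice of HEAD are assumptions.

# Steps 2 and 3: join a notes listing with a tree listing.
#   $1: output of "git notes list", lines of "<note-id> <annotated-id>"
#   $2: output of "git ls-tree -r -t", lines of "<mode> <type> <id>TAB<path>"
# Prints "<type> <annotated-id> <note-id> <path>" for every annotated entry.
join_notes_with_paths () {
    awk '
        # First file: remember the note object for each annotated object.
        NR == FNR { note[$2] = $1; next }
        # Second file: the path follows the tab, the metadata precedes it.
        {
            tab = index($0, "\t")
            split(substr($0, 1, tab - 1), meta, " ")
            if (meta[3] in note)
                print meta[2], meta[3], note[meta[3]], substr($0, tab + 1)
        }
    ' "$1" "$2"
}

# Steps 1 to 3 for every ref under refs/notes, against the tree of HEAD.
list_annotated_paths () {
    tmp=$(mktemp -d) || return 1
    git ls-tree -r -t --full-tree HEAD >"$tmp/tree"
    git for-each-ref --format='%(refname)' refs/notes |
    while read -r notes_ref
    do
        echo "--- $notes_ref ---"
        git notes --ref="$notes_ref" list >"$tmp/notes"
        join_notes_with_paths "$tmp/notes" "$tmp/tree"
    done
    rm -rf "$tmp"
}
```

Running list_annotated_paths inside a repository prints one line per
annotated blob / tree and path, so a blob that exists as several copies
shows up once per path. The walk over all tree entries is still there; it
just happens once instead of once per notes ref.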


Regards,
Sebastian


