Hi,

I have been developing a git tool (based on git's internal API) that finds 
all the commits that have changed a given line, for more accurate authorship 
attribution. 

The motivation is my research on binary code authorship, where I use machine 
learning to classify who wrote a piece of code. To produce training data, I 
start with a source code repository that has a known author label for each 
line and then compile the project into a binary. That way I know the 
authorship of the binary code and can apply machine learning techniques to it. 

To get the ground truth of authorship for each line, I started with git-blame. 
But I later found that this is not sufficient, because the last commit may 
only add a comment or change a small part of the line, in which case I 
shouldn't attribute the line to the last author. Of course, who should count 
as the representative author of a line is debatable. So what I would like to 
do is find all the commits that have ever changed a line; then I can try 
different approaches to summarizing these commits to produce my final 
authorship label (or even a tuple). 
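
To make the idea concrete, here is a rough sketch of one possible way to 
summarize a line's history into an authorship tuple. It is only an 
illustration under assumptions, not the tool itself: the function names, the 
"//" comment marker, and the character-based weighting are all hypothetical 
choices. It assumes the per-line history has already been extracted as 
(author, line text) pairs, oldest first. Each author is credited with the 
non-comment characters they introduced, even if a later commit replaced them.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def strip_comment(line, marker="//"):
    """Drop a trailing line comment (naive: ignores markers inside strings)."""
    return line.split(marker, 1)[0].rstrip()


def changed_chars(old, new):
    """Count characters of `new` that do not fall in a block matched in `old`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(new) - matched


def authorship_tuple(history):
    """Summarize a line's history into (author, share) pairs.

    `history` is a list of (author, line_text) pairs, oldest first, one per
    commit that changed the line (hypothetical input format). Shares are
    sorted from largest to smallest; an empty history gives an empty list.
    """
    weights = defaultdict(int)
    previous = ""
    for author, text in history:
        code = strip_comment(text)  # comment-only edits carry no weight
        weights[author] += changed_chars(previous, code)
        previous = code
    total = sum(weights.values())
    if total == 0:
        return []
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [(author, weight / total) for author, weight in ranked]


if __name__ == "__main__":
    history = [
        ("alice", "int total = a + b;"),
        ("bob", "int total = a + b; // sum inputs"),
    ]
    print(authorship_tuple(history))
```

In this example git-blame would report bob as the author of the line, while 
the sketch gives all of the weight to alice, because bob only added a comment. 
Other weightings (for example, counting only the characters that survive in 
the final version) could be swapped in the same way.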

I was wondering whether there have been similar debates over accurate 
authorship in this community before and whether there may be other people 
interested in this work. 

PS: I tried to send this email to the mailing list g...@vger.kernel.org, but 
it was always rejected with:
Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient 
domain. We recommend contacting the other email provider for further 
information about the cause of this error. The error that the other server 
returned was: 550 550 5.7.1 Content-Policy reject msg: The message contains 
HTML subpart, therefore we consider it SPAM or Outlook Virus.  TEXT/PLAIN 
is accepted.! BF:<U 0.500236>; S1753454Ab2IYRLL (state 17).

Thanks

Xiaozhu
