On Thu, Oct 12, 2017 at 06:18:19AM -0700, LnT wrote:

> My Requirement : Export all changed/added/Modified files Between 
> <From_Date>:<To_Date> to a separate folder without changing the parent 
> folder structure.
>                  These files will be given to SONAR Scan.
>      
> I have the Algorithm - Get updated/added/changed files Between 
> <From_Date>:<To_Date>
> 
> 1) Prepare log of changes between <From_Date>:<To_Date>
> 2) Identify all the file names from the log/history
> 3) Get pull/checkout entire repository
> 4) cherry pick all files from the list prepare in Step#2
> 5) Copy all cherry picked files to a separate location for scanning
> 
> But I wish to have more optimal solution to write my script - which will be 
> given to my Jenkins job which accepts <From_Date> && <To_Date>
> My setup : git version 2.12.2.windows.2

I see it can be simplified in several places.

First, there's no need to inspect the Git log: the `git diff` command is
already equipped with everything needed.

This command accepts a "--name-only" command-line option which tells the
command to only output the names of the files changed between the
specified revisions instead of the actual changes.

So, to get what was changed between the "from date" and the "to date",
you do

  $ git diff --name-only 'master@{2017-09-30}' 'master@{2017-10-12}'

which tells Git to compare the states of the repository as recorded on
the branch "master" as of 30 September, 2017 and 12 October, 2017.
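
One caveat, as far as I know: the `branch@{date}` syntax is answered
from the local reflog, that is, from the record of where the branch
pointed in this particular clone at that time, so a freshly made clone
cannot look further back than the moment it was created.  A minimal
sketch of an alternative that goes by commit timestamps instead, using
`git rev-list --before`.  The helper names `commit_at` and
`changed_between` are made up for illustration, and the dates carry an
explicit time of day because Git may otherwise fill in the current time:

```shell
# Hypothetical helpers, not part of Git.

# Print the last commit reachable from revision $1 that was committed
# at or before date $2 (prints nothing if there is no such commit).
commit_at() {
    git rev-list -1 --before="$2" "$1"
}

# List the names of the files that differ between the states of
# branch $1 at dates $2 and $3.
changed_between() {
    from=$(commit_at "$1" "$2") || return 1
    to=$(commit_at "$1" "$3") || return 1
    if [ -z "$from" ] || [ -z "$to" ]; then
        echo "no commit found at or before one of the dates" >&2
        return 1
    fi
    git diff --name-only "$from" "$to"
}

# Usage:
#   changed_between master '2017-09-30 23:59:59' '2017-10-12 23:59:59'
```

Note that this goes by committer dates, which is only an approximation
of "what the branch looked like on that day", but it does not depend on
how old the clone is.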

Second, Git supports so-called "shallow clones": a shallow clone
contains only a minimal amount of history.  This is usually all you
need if you only want the state of the files at a particular snapshot.

Third, there's no need to cherry-pick anything: in Git, cherry-picking
means re-applying a commit, not picking out files.
Maybe that's merely poor wording, but I think I should warn you
just in case.  After you've cloned a repository, you have several ways
to obtain the files you need: merely copy/link them over from a work
tree (if that was a "normal" clone) or take them from the index or take
them directly from a commit of interest; the latter two approaches work
even for bare clones.

Now a word of caution.  The "--name-only" option lists every file which
was "changed", and that also covers deleted files and supposed renames
(Git does not track renames, so it tries to detect them when it
compares the two states).  This means that if a file was deleted
somewhere between those two dates of interest, or is listed under a
name it no longer has, you can't obtain it from the commit at "to date".
Hence you presumably need the "--name-status" option instead, which
prefixes each file name it lists with the type of change done to that
file; these types are listed in the git-diff manual page, but a quick
overview is that 'A', 'D' and 'M' stand for added, deleted and modified,
respectively, and 'Rnnn' stands for a supposed rename with a similarity
score of nnn percent (from 0 to 100).  All states except 'R' are
followed by a single file name, separated from the state by a TAB
character; the 'R' state is followed by the old and the new names of
the file, and all these fields are separated by TAB characters.
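
To make that format concrete, here is a minimal sketch of a filter for
such output.  The function name `select_exportable` is made up; the
sketch assumes file names contain no TAB or newline characters, and it
simply skips every state other than A, M and R:

```shell
# Hypothetical helper: read `git diff --name-status` output on stdin
# and print the names of the files that exist in the newer state.
select_exportable() {
    awk -F '\t' '
        $1 == "A" || $1 == "M" { print $2; next }   # added or modified
        $1 ~ /^R[0-9]*$/       { print $3; next }   # renamed: take the new name
        # everything else (D and any other state) is skipped
    '
}

# Usage:
#   git diff --name-status FROM TO | select_exportable > files.txt
```

Whether deletions should really be skipped silently is a separate
decision; the filter could just as well print them to stderr.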


Considering all of the above, I'd recommend this approach:

 1. Get the list of changed files using

    git diff --name-status 'branch@{from date}' 'branch@{to date}'

 2. Process it with a tool which would collect the names of the files
    in the states A and M, and the "new" names of the files in state R.

    You'd need to think through what to do with possible deletions.
    Maybe ignoring them is just OK.  Or maybe you'd need to fail or
    log a warning -- that depends on the nature of your data.

 3. Given that list of files, deliver them to your CI server.

 4. There, call
 
      git clone --bare --depth 1 --branch branch --single-branch URL

    That would clone just a minimum amount of history for the
    branch named "branch" and set the HEAD ref in the resulting clone
    to point to it.

 5. Iterate over the list of files and obtain them calling

      git cat-file blob HEAD:path/to/a/file >dest/dir/file

    You might need to `mkdir -p dest/dir` for each file first
    to create the directory hierarchy.
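
Step 5 could be sketched roughly like this.  `export_files` is a
hypothetical helper: it expects to be run from inside the clone, reads
one path per line on standard input, and recreates each file's
directory under the destination given as its first argument:

```shell
# export_files DEST [REV]
# Hypothetical helper: copy every path listed on stdin out of revision
# REV (default: HEAD) into DEST, keeping the directory structure.
export_files() {
    dest=$1
    rev=${2:-HEAD}
    while IFS= read -r path; do
        # Create the parent directory of the file before writing it.
        mkdir -p "$dest/$(dirname "$path")" || return 1
        git cat-file blob "$rev:$path" > "$dest/$path" || return 1
    done
}

# Usage (paths here are placeholders):
#   cd /path/to/clone.git && export_files /path/to/scan-dir < files.txt
```

The loop stops at the first path it cannot extract, which is probably
what you want in a CI job; note also that `git cat-file` writes the
blob exactly as stored, without any line-ending conversion.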
