Thomas Rast <> writes:

> Junio C Hamano <> writes:
>> Thomas Rast <> writes:
>>> I like the general idea, too, but I think there is a long way ahead, and
>>> we shouldn't hold up v5 on this.
>> We shouldn't rush, only to keep some deadline, and regret it later
>> that we butchered the index format without thinking things through.
>> When this was added to the GSoC idea page, I already said upfront
>> that this was way too big a topic to be a GSoC project, didn't I?
> Let me spell out my concern.  There are two v5s here:
> * The extent of the GSoC task.
> * The eventual implementation of index-v5 that goes into Git mainline.
> IMHO this thread is mixing up the two.  There indeed must not be any
> rush in the final implementation of index-v5.  However, the GSoC ends in
> less than two weeks, and I have to evaluate Thomas on whatever is
> finished by then.

This is the primary reason why I have recused myself from the Mentor
pool.  My involvement in this thread is mostly about the latter.  It
is not that I do not care about GSoC, but the maintainer works for
what is best for the project, not for the GSoC schedule.

> AFAIK Thomas is now cleaning up the existing code to be in readable
> shape, using your feedback, which is great.  However, the above
> suggestion is such a fuzzily specified task that there is no way to even
> find out what needs to be done within the next two weeks.

Yes, it is the mentor's job to (1) keep an eye on the progress of
the student, (2) avoid giving a task that is too big to chew within
the given timeframe, and (3) help the student learn the skill to
break down large tasks to manageable pieces.

> Perhaps it
> makes sense, at this point, to wrap anything that ended up having _v[25]
> suffixes in an index_ops like Duy did.

Yes, I think that suggestion was welcome input for the mentor and
the student (item (3) above).
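For readers unfamiliar with the idea, here is a minimal C sketch of what wrapping the _v2/_v5 functions in an operations table could look like.  It is only an illustration under stated assumptions: struct index_ops, index_ops_for(), the stub backends and the trimmed-down struct index_state are all hypothetical, and are neither the code Duy posted nor git's actual API.

```c
#include <stddef.h>

struct index_state;

/* Hypothetical vtable: one slot per operation that had _v2/_v5 variants. */
struct index_ops {
	unsigned int version;
	int (*read_index)(struct index_state *istate);
	int (*write_index)(struct index_state *istate);
};

/* Hypothetical, heavily simplified in-core index; not git's real struct. */
struct index_state {
	const struct index_ops *ops;
	int last_op;	/* records which backend ran, for demonstration only */
};

/* Stub backends; real ones would parse or emit their own on-disk layout. */
static int read_index_v2(struct index_state *istate)  { istate->last_op = 20; return 0; }
static int write_index_v2(struct index_state *istate) { istate->last_op = 21; return 0; }
static int read_index_v5(struct index_state *istate)  { istate->last_op = 50; return 0; }
static int write_index_v5(struct index_state *istate) { istate->last_op = 51; return 0; }

static const struct index_ops v2_ops = { 2, read_index_v2, write_index_v2 };
static const struct index_ops v5_ops = { 5, read_index_v5, write_index_v5 };

/* Pick the backend once, from the version found in the file header. */
static const struct index_ops *index_ops_for(unsigned int version)
{
	switch (version) {
	case 2:
		return &v2_ops;
	case 5:
		return &v5_ops;
	default:
		return NULL;	/* unknown or unsupported format */
	}
}

/* Callers use these wrappers and never mention a version suffix. */
static int read_index(struct index_state *istate)
{
	return istate->ops->read_index(istate);
}

static int write_index(struct index_state *istate)
{
	return istate->ops->write_index(istate);
}
```

The design point is that the version check happens in exactly one place (index_ops_for()); everything else calls read_index()/write_index(), so adding or retiring a format touches one table rather than every call site.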

> That's a long way from actually
> following through on the idea, though.

I think that is perfectly fine, both from the point of view of the
project maintainer (who officially does not give a whit about the GSoC
schedule) and from the point of view of somebody who cares about the
health of the development community (and as one part of it, cares
about the GSoC student project).

If the Git GSoC admins initially picked a project that is too large by
mistake, finishing a reasonably sized subpart of it and polishing the
result into good shape would be the best the student can do, and the
grading should be based on the quality of that subtask alone.  It may
not directly help the project without the remainder, but that is not
the student's fault.  But as I am not
part of the Mentor pool, what I wrote in this paragraph is just my

> I think the part you snipped
>>> the loops that iterate over the index [...] either
>>> skip unmerged entries or specifically look for them.  There are subtle
>>> differences between the loops on many points: what do they do when they
>>> hit an unmerged entry?  Or a CE_REMOVED or CE_VALID one?
> is a symptom of the same general problem: the data structures are sound,
> but they are leaking all over the code and now we have lots of
> complexity to do even simple operations like "for each unmerged entry".

I do not think I was arguing against an updated cleaner API, so we
are in agreement.  In fact, I was saying that the calling code
should be ported to such a cleaner API and in-core data structure
first, and only then an optimal on-disk representation of the
in-core data structure can be designed.
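To make "port the callers to a cleaner API first" concrete, here is a hypothetical C sketch of a single iterator that owns the decisions the quoted loops each make slightly differently: removed entries are always skipped, and merged versus unmerged is a parameter rather than an open-coded stage check.  struct entry, ENTRY_REMOVED (standing in for CE_REMOVED), enum entry_filter and for_each_entry() are assumptions for illustration, not git's API, and CE_VALID handling is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical flag, analogous to git's CE_REMOVED; real ce_flags differ. */
#define ENTRY_REMOVED (1u << 0)

struct entry {
	const char *path;
	unsigned int stage;	/* 0 = merged; 1..3 = unmerged (base/ours/theirs) */
	unsigned int flags;
};

struct index_view {
	struct entry *entries;
	size_t nr;
};

/* Which entries a caller wants to visit (hypothetical selector). */
enum entry_filter {
	FILTER_MERGED,
	FILTER_UNMERGED
};

typedef void (*entry_fn)(const struct entry *e, void *data);

/*
 * One place decides what "visit" means: removed entries are always
 * skipped, and merged/unmerged is chosen by the caller.  Returns the
 * number of entries passed to fn.
 */
static size_t for_each_entry(const struct index_view *idx,
			     enum entry_filter filter,
			     entry_fn fn, void *data)
{
	size_t visited = 0;

	for (size_t i = 0; i < idx->nr; i++) {
		const struct entry *e = &idx->entries[i];
		int unmerged = e->stage != 0;

		if (e->flags & ENTRY_REMOVED)
			continue;
		if (unmerged != (filter == FILTER_UNMERGED))
			continue;
		fn(e, data);
		visited++;
	}
	return visited;
}

/* Example callback: remember the path of the last entry visited. */
static void remember_path(const struct entry *e, void *data)
{
	*(const char **)data = e->path;
}
```

A caller that today open-codes a check like "if (!ce_stage(ce)) continue;" would instead call for_each_entry(..., FILTER_UNMERGED, ...).  Once every caller goes through such an interface, the on-disk layout can change underneath without touching them.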

The mistaken title of this GSoC topic was, I think, one of the root
causes of the issues you are seeing.  It said "faster file format",
but a file format is the result of the design of the code that uses
the data, not the other way around.

That, and the fact that the project scope is too large for a summer
student project, as I said at the very beginning.