Re: Gsoc 18 Project | Collect build statistics

Rainer Müller Sat, 12 May 2018 09:21:39 -0700

On 2018-05-12 17:26, Vishnu wrote:
> So where can I see the code that checks the modification time?


https://github.com/macports/macports-base/blob/bb9db185d54c04efd55a3d634518a971bbb530a7/src/port/portindex.tcl#L120-L141

> "The portindex2postgres.sql script merely converts the PortIndex file
> from the custom format based on Tcl lists to SQL. The output will always
> contain SQL statements with the full data for every port."
> 
> So after the PortIndex is updated, does this script run, flush the db,
> and then fill it with the new PortIndex data?
> Or does it just update the db with the modified port data?
> 
> Which one occurs?
> 
> Trying to understand the present mechanism.

https://github.com/macports/macports-infrastructure/blob/2129f0cd0eb80f207d2cc62542b65c197733ac51/jobs/portindex2postgres.tcl#L50-L57

> So after every commit the entire 15 MB PortIndex is processed and the
> data is uploaded to the db.
> I think this is very inefficient.
> It would be better if, after every commit, the buildbot/webhook
> updated the database itself.

It does not matter whether the new information about the port appears on
macports-webapp 10 seconds or a full minute after the commit occurred.
Remember, it can take an hour or longer until the updated Portfile is
actually available over rsync.

This is not a critical path. The data is compressed for the transfer, so
the 15 MB boils down to an upload of only about 1 MB. Any algorithm that
figures out the differences here would likely take longer than just
naively transferring the full data...
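
To get a feeling for why the compressed upload is so much smaller, here is
a rough sketch in Python. The use of gzip and the synthetic payload are
assumptions made only for illustration; this message does not say which
compression the real job uses, and the real PortIndex will compress
differently.

```python
import gzip


def compressed_size(payload: bytes) -> int:
    """Return the size in bytes of payload after gzip compression."""
    return len(gzip.compress(payload))


# Synthetic stand-in for an index file: many lines with a similar structure.
# The field names are made up; this is not the real PortIndex format.
lines = [
    f"port{i} {{version 1.{i % 10} categories devel maintainers nomaintainer}}\n"
    for i in range(20000)
]
payload = "".join(lines).encode("utf-8")

print("raw bytes:       ", len(payload))
print("compressed bytes:", compressed_size(payload))
```

Highly repetitive text like this shrinks considerably, which is why sending
the full data each time costs little.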

I do not see a pressing need for optimization here. The way it is done
at the moment is as robust as it can get to ensure the data is always
up-to-date.

If you really need the differences, they should be computed by comparing
against the actual database. Computing them before the import would
always rely on assumptions about the state of the database instead of
using the actual data.

For example, when feeding the data into the database, first make a
temporary in-memory copy of the old table, then import the data, and
afterwards select the differences by joining the old and new versions of
the table.
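
A minimal sketch of that idea, using Python with an in-memory SQLite
database purely for illustration. The real setup uses PostgreSQL, and the
table name `ports`, its two columns, and the function `import_and_diff` are
hypothetical, not taken from the actual schema or scripts.

```python
import sqlite3


def import_and_diff(conn, new_rows):
    """Replace the contents of `ports` with new_rows and report what changed."""
    cur = conn.cursor()
    # Snapshot the current table; a TEMP table lives only for this connection.
    cur.execute("DROP TABLE IF EXISTS ports_old")
    cur.execute("CREATE TEMP TABLE ports_old AS SELECT name, version FROM ports")
    # Naive full import: throw away the old rows and load everything again.
    cur.execute("DELETE FROM ports")
    cur.executemany("INSERT INTO ports (name, version) VALUES (?, ?)", new_rows)
    # New rows with no old counterpart were added; differing versions changed.
    changed = cur.execute(
        "SELECT n.name, o.version, n.version "
        "FROM ports AS n LEFT JOIN ports_old AS o ON o.name = n.name "
        "WHERE o.name IS NULL OR o.version <> n.version "
        "ORDER BY n.name"
    ).fetchall()
    # Old rows with no new counterpart were removed.
    removed = cur.execute(
        "SELECT o.name "
        "FROM ports_old AS o LEFT JOIN ports AS n ON n.name = o.name "
        "WHERE n.name IS NULL "
        "ORDER BY o.name"
    ).fetchall()
    cur.execute("DROP TABLE ports_old")
    conn.commit()
    return changed, [row[0] for row in removed]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (name TEXT PRIMARY KEY, version TEXT)")
conn.executemany(
    "INSERT INTO ports (name, version) VALUES (?, ?)",
    [("curl", "7.59.0"), ("git", "2.17.0"), ("oldport", "1.0")],
)

changed, removed = import_and_diff(
    conn, [("curl", "7.60.0"), ("git", "2.17.0"), ("newport", "0.1")]
)
print(changed)  # added or updated ports: (name, old version, new version)
print(removed)  # names of ports that disappeared
```

Because the comparison runs against what is actually in the database, the
import itself stays a simple full reload and no assumptions about the
previous state are needed.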

Rainer
