On 2018-05-12 17:26, Vishnu wrote:
> So where can i see that code that checks the modification time?

https://github.com/macports/macports-base/blob/bb9db185d54c04efd55a3d634518a971bbb530a7/src/port/portindex.tcl#L120-L141

> "The portindex2postgres.sql script merely converts the PortIndex file
> from the custom format based on Tcl lists to SQL. The output will always
> contain SQL statements with the full data for every port."
>
> Then after the port index is updated this script runs and flushes the db
> then fills it with new portindex data?
> or just updates the db with the modified port data?
>
> Which one occurs?
>
> Trying to understand the present mechanism.

https://github.com/macports/macports-infrastructure/blob/2129f0cd0eb80f207d2cc62542b65c197733ac51/jobs/portindex2postgres.tcl#L50-L57

> So after every commit the entire 15mb portindex being processed and data
> being uploaded to the db.
> I think this is very inefficient.
> rather it would be best if after every commit the buildbot/ webhook
> updates the database itself.

It does not matter whether the new information about the port appears on
macports-webapp 10 seconds or a full minute after the commit occurred.
Remember, it might even take a full hour or longer until the updated
Portfile is actually available over rsync. This is not a critical path.

The data is compressed for the transfer, so the 15 MB boil down to an
upload size of only about 1 MB. Any algorithm to figure out the
differences here would likely take even longer than just naively
transferring the full data...

I do not see a pressing need for optimization here. The way it is done
at the moment is as robust as it can get to ensure the data is always
up-to-date.

If you really need to get the differences, then that should be done in
comparison with the actual database. Doing it beforehand would always
make assumptions about the state of the database instead of using the
actual data.
For example, when feeding the data into the database, make a temporary
in-memory copy of the old table first, then import the data, and
afterwards select the differences by joining the old and new versions
of the table.

Rainer
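
A minimal sketch of the temporary-copy approach described above, using
Python's sqlite3 module with an in-memory database as a stand-in for
PostgreSQL. The table name `ports`, its two columns and the helper
`import_and_diff` are assumptions made purely for illustration; they are
not taken from portindex2postgres.tcl or the actual macports-webapp
schema.

```python
import sqlite3


def import_and_diff(conn, new_rows):
    """Fully re-import `ports` and report what changed.

    Hypothetical helper: assumes a table ports(name, version).
    Returns (changed, removed), where `changed` lists
    (name, old_version, new_version) for added or updated ports and
    `removed` lists the names that disappeared.
    """
    cur = conn.cursor()

    # 1. Keep a temporary copy of the old table contents.
    cur.execute("DROP TABLE IF EXISTS temp.ports_old")
    cur.execute("CREATE TEMP TABLE ports_old AS SELECT name, version FROM ports")

    # 2. Naive full import: flush the table and load all rows again.
    cur.execute("DELETE FROM ports")
    cur.executemany("INSERT INTO ports (name, version) VALUES (?, ?)", new_rows)

    # 3. Select the differences by joining the old and new versions.
    changed = cur.execute(
        "SELECT n.name, o.version, n.version "
        "FROM ports AS n LEFT JOIN ports_old AS o ON o.name = n.name "
        "WHERE o.name IS NULL OR o.version <> n.version "
        "ORDER BY n.name"
    ).fetchall()
    removed = [
        row[0]
        for row in cur.execute(
            "SELECT o.name "
            "FROM ports_old AS o LEFT JOIN ports AS n ON n.name = o.name "
            "WHERE n.name IS NULL ORDER BY o.name"
        )
    ]

    cur.execute("DROP TABLE ports_old")
    conn.commit()
    return changed, removed


# Example data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (name TEXT PRIMARY KEY, version TEXT)")
conn.executemany(
    "INSERT INTO ports VALUES (?, ?)",
    [("curl", "7.59.0"), ("zlib", "1.2.11"), ("oldport", "1.0")],
)
changed, removed = import_and_diff(
    conn, [("curl", "7.60.0"), ("zlib", "1.2.11"), ("newport", "0.1")]
)
print(changed)
print(removed)
```

Because the comparison runs against what is really stored in the
database, the result does not depend on any guess about its previous
state, and the import itself stays a plain full reload.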
